Bug #14740
cyclics should panic when juggling for too long
Description
Recently, while developing a change to bhyve, we encountered a situation where a misbehaving cyclic (one that was repeatedly rescheduling itself in the past) caused a system to become unresponsive. When cyclic_juggle_one_to attempted to remove the misbehaving cyclic so it could be moved to a different CPU, we fell into its exponentially increasing delay loop:
	/*
	 * Remove the cyclic from the source. As mentioned above, we cannot
	 * block during this operation; if we cannot remove the cyclic
	 * without waiting, we spin for a time shorter than the interval, and
	 * reattempt the (non-blocking) removal. If we continue to fail,
	 * we will exponentially back off (up to half of the interval).
	 * Note that the removal will ultimately succeed -- even if the
	 * cyclic handler is blocked on a resource held by a thread which we
	 * have preempted, priority inheritance assures that the preempted
	 * thread will preempt us and continue to progress.
	 */
	for (delay = NANOSEC / MICROSEC; ; delay <<= 1) {
		/*
		 * Before we begin this operation, disable kernel preemption.
		 */
		kpreempt_disable();
		if (cyclic_remove_here(src, ndx, &when, CY_NOWAIT))
			break;

		/*
		 * The operation failed; enable kernel preemption while
		 * spinning.
		 */
		kpreempt_enable();

		CYC_PTRACE("remove-retry", idp, src);

		if (delay > (cyclic->cy_interval >> 1))
			delay = cyclic->cy_interval >> 1;

		/*
		 * Drop the RW lock to avoid a deadlock with the cyclic
		 * handler (because it can potentially call cyclic_reprogram()).
		 */
		rw_exit(&idp->cyi_lock);
		drv_usecwait((clock_t)(delay / (NANOSEC / MICROSEC)));
		rw_enter(&idp->cyi_lock, RW_WRITER);
	}
While this loop drops and reacquires cyi_lock, cpu_lock is held for the duration. Since the loop never succeeded, contrary to the included comment, every other operation requiring cpu_lock was wedged. While defending against the underlying problem (the programmer error of repeatedly rescheduling the offending cyclic in the past) is not feasible, we could make the system easier to debug when it happens. Since the system is not going to make much other forward progress while cpu_lock is held for such an extended period of time (forever in our example case), panicking the system would provide all the clues necessary in the resulting dump to diagnose the issue.
We propose a threshold of ~30 seconds for the delay, after which we initiate the panic. That should be long enough to steer us clear of any legitimate system behavior that falls prey to load or scheduling issues, without being so long a wait that a developer is tempted to physically reboot the system when they're unable to NMI it after it goes unresponsive.