cyclic_remove_here() blocked with cyp_rpend of UINT32_MAX
We've seen several dumps in which the owner of cpu_lock is waiting for cyp_rpend to go to 0:
> ::stacks -c cyclic_remove_here
THREAD           STATE    SOBJ                COUNT
ffffff19c41957e0 SLEEP    SEMA                    1
                 swtch+0x145
                 sema_p+0x1d9
                 cyclic_remove_here+0x7c
                 cyclic_remove+0x34
                 clock_highres_timer_delete+0x37
                 timer_delete_locked+0x90
                 timer_delete_grabbed+0x37
                 timer_delete+0x41
                 timer_exit+0x29
                 proc_exit+0x20b
                 exit+0x15
                 psig+0x582
                 post_syscall+0x49e
Most suspiciously, the cyp_rpend is -1. More investigation of these dumps is required.
The problem here is manifold. First (and foremost) there is a bug in the
CLOCK_HIGHRES one-shot code in that it uses a mechanism that pre-dates the
cyclic reprogramming to effect the one-shot: it sets the interval such that
the expiration time after the one-shot has fired will be INT64_MAX. The
problem with this is that if the one-shot is re-used, we reset the expiration
time but neglect to adjust this bogus interval; when the one-shot fires, the
new expiration time is assured to wrap. But because this large bogus interval
will be added to the (now negative) expiration time to determine the next
expiration, the ill effect is somewhat limited: it will result only in
spurious firings.
It should be noted that this condition worsens over time: the longer the
system has been up, the more spurious firings it takes to get an expiration
time back into the future.
The number of iterations is:
abs(now / (INT64_MIN + (oneshot_interval - (INT64_MAX - oneshot_interval))))
But oneshot_interval is set to INT64_MAX - initial_expire, making the
above reduce to:
now / (2 * initial_expire)
For the one-shots created earliest in boot on us-sw-1 (created some 192 seconds
after boot), each day of uptime would add 224 spurious firings per firing;
after 45 days, there would be 10,084 spurious firings per one-shot.
This answers one question (why are the cy_pend counts so generally high?)
but it doesn't answer how our count became -1 (or rather, UINT32_MAX). This
is likely explained by the way KVM uses its one-shots: thanks to HVM-711, it
is creating one-shots at a potentially frenetic rate; once a timer
begins to fall behind (that is, the one-shot has an expiration that is less
than the amount of time necessary to get the cy_pend count back to zero), it
will fall further behind. This is a positive feedback loop, and it ends only
when the cy_pend count reaches its max of UINT32_MAX.
Finally, knowing that this pathology can induce a cy_pend count of -1, why
does that lead to the hang as seen? This is because of an ancient (and,
admittedly, obscure) bug in cyclics: the code that checks that the cy_pend
count is non-zero does so by assigning the count to a signed int and then
testing that it's greater than 0. This, of course, is incorrect once the
cy_pend count exceeds INT32_MAX, and it results in softint processing not
completing properly.