Bug #14423
bhyve vm can wedge on shutdown
Description
Ian from the SmartOS community reported sporadic cases of bhyve instances getting stuck while in the shutting-down state.
We started digging in with mdb -k, finding the bhyve process still present in the zone but otherwise unresponsive. A walk of the process's kernel stacks was illuminating:
stack pointer for thread fffffe2352f16800 (bhyve/3 [blk-4:0-1]): fffffcc26dedd390
[ fffffcc26dedd390 _resume_from_idle+0x12b() ]
  fffffcc26dedd3c0 swtch+0x133()
  fffffcc26dedd460 turnstile_block+0x25b(fffffe2352ef1020, 0, fffffe2316fda368, fffffffffbc1c5e0, 0, 0)
  fffffcc26dedd4d0 rw_enter_sleep+0x1a5(fffffe2316fda368, 0)
  fffffcc26dedd550 segvmm_fault+0x70(fffffe232afed248, fffffe2354b72a90, fffffbfd810c5000, 1000, 0, 2)
  fffffcc26dedd640 as_fault+0x29c(fffffe232afed248, fffffe2315933470, fffffbfd810c5000, 1, 0, 2)
  fffffcc26dedd6c0 pagefault+0xda(fffffbfd810c5000, 0, 2, 0)
  fffffcc26dedd7b0 trap+0x1139(fffffcc26dedd7c0, fffffbfd810c5000, 24)
  fffffcc26dedd7c0 cmntrap_pushed+0x3c()
  fffffcc26dedd8d0 bcopy_ck_size+0xea()
  fffffcc26dedd940 uiomove+0xbe(fffffdc3f191c000, 1600, 0, fffffcc26deddc88)
  fffffcc26dedd9e0 dmu_read_uio_dnode+0x11f(fffffe236b903eb0, fffffcc26deddc88, 7600)
  fffffcc26dedda40 dmu_read_uio+0x73(fffffe238da32680, 1, fffffcc26deddc88, 7600)
  fffffcc26deddad0 zvol_read+0x12d(5a00000007, fffffcc26deddc88, fffffe232d6e3330)
  fffffcc26deddb00 cdev_read+0x2a(5a00000007, fffffcc26deddc88, fffffe232d6e3330)
  fffffcc26deddb90 spec_read+0x2ab(fffffe2382752500, fffffcc26deddc88, 0, fffffe232d6e3330, 0)
  fffffcc26deddc30 fop_read+0x111(fffffe2382752500, fffffcc26deddc88, 0, fffffe232d6e3330, 0)
  fffffcc26deddf00 preadv+0x22e(4, cab178, 8, f5eae000, 0)
  fffffcc26deddf10 sys_syscall+0x1a8()

stack pointer for thread fffffe2352dafc20 (bhyve/5 [blk-4:0-3]): fffffcc26dee9390
[ fffffcc26dee9390 _resume_from_idle+0x12b() ]
  fffffcc26dee93c0 swtch+0x133()
  fffffcc26dee9460 turnstile_block+0x25b(0, 0, fffffe2316fda368, fffffffffbc1c5e0, 0, 0)
  fffffcc26dee94d0 rw_enter_sleep+0x1a5(fffffe2316fda368, 0)
  fffffcc26dee9550 segvmm_fault+0x70(fffffe232afed248, fffffe2354b72a90, fffffbfd81207000, 1000, 0, 2)
  fffffcc26dee9640 as_fault+0x29c(fffffe232afed248, fffffe2315933470, fffffbfd81207000, 1, 0, 2)
  fffffcc26dee96c0 pagefault+0xda(fffffbfd81207000, 0, 2, 0)
  fffffcc26dee97b0 trap+0x1139(fffffcc26dee97c0, fffffbfd81207000, b)
  fffffcc26dee97c0 cmntrap_pushed+0x3c()
  fffffcc26dee98d0 bcopy_ck_size+0xea()
  fffffcc26dee9940 uiomove+0xbe(fffffdc38aba4000, 600, 0, fffffcc26dee9c88)
  fffffcc26dee99e0 dmu_read_uio_dnode+0x11f(fffffe236b903eb0, fffffcc26dee9c88, 4600)
  fffffcc26dee9a40 dmu_read_uio+0x73(fffffe238da32680, 1, fffffcc26dee9c88, 4600)
  fffffcc26dee9ad0 zvol_read+0x12d(5a00000007, fffffcc26dee9c88, fffffe232d6e3330)
  fffffcc26dee9b00 cdev_read+0x2a(5a00000007, fffffcc26dee9c88, fffffe232d6e3330)
  fffffcc26dee9b90 spec_read+0x2ab(fffffe2382752500, fffffcc26dee9c88, 0, fffffe232d6e3330, 0)
  fffffcc26dee9c30 fop_read+0x111(fffffe2382752500, fffffcc26dee9c88, 0, fffffe232d6e3330, 0)
  fffffcc26dee9f00 preadv+0x22e(4, caca38, 5, f6458000, 0)
  fffffcc26dee9f10 sys_syscall+0x1a8()

stack pointer for thread fffffe2352d89060 (bhyve/8 [blk-4:0-6]): fffffcc26defbd30
[ fffffcc26defbd30 _resume_from_idle+0x12b() ]
  fffffcc26defbd60 swtch+0x133()
  fffffcc26defbda0 cv_wait+0x68(fffffe235c4331fe, fffffe22d6c56d40)
  fffffcc26defbde0 exitlwps+0x134(0)
  fffffcc26defbe60 psig+0x4fb()
  fffffcc26defbf00 post_syscall+0x75d(4, fffffe238904c780)
  fffffcc26defbf10 0xfffffffffb800d1f()

stack pointer for thread fffffe2385517420 (bhyve/11 [blk-4:0-9]): fffffcc26dd39330
[ fffffcc26dd39330 _resume_from_idle+0x12b() ]
  fffffcc26dd39360 swtch+0x133()
  fffffcc26dd393a0 cv_wait+0x68(fffffe2361533ab0, fffffe2361533aa8)
  fffffcc26dd393e0 vmc_activate+0x4b(fffffe2361533a90)
  fffffcc26dd39450 vmc_hold+0x5b(fffffe2361533a90, 59db000, 3)
  fffffcc26dd394d0 segvmm_fault_space+0xa4(fffffe232afed248, fffffe2354b72a90, fffffbfd85ddb000, 1000)
  fffffcc26dd39550 segvmm_fault+0xfd(fffffe232afed248, fffffe2354b72a90, fffffbfd85ddb000, 1000, 0, 2)
  fffffcc26dd39640 as_fault+0x29c(fffffe232afed248, fffffe2315933470, fffffbfd85ddb000, 1, 0, 2)
  fffffcc26dd396c0 pagefault+0xda(fffffbfd85ddb000, 0, 2, 0)
  fffffcc26dd397b0 trap+0x1139(fffffcc26dd397c0, fffffbfd85ddb000, 23)
  fffffcc26dd397c0 cmntrap_pushed+0x3c()
  fffffcc26dd398d0 bcopy_ck_size+0xea()
  fffffcc26dd39940 uiomove+0xbe(fffffdc34b31ae00, 1200, 0, fffffcc26dd39c88)
  fffffcc26dd399e0 dmu_read_uio_dnode+0x11f(fffffe236b903eb0, fffffcc26dd39c88, 2c00)
  fffffcc26dd39a40 dmu_read_uio+0x73(fffffe238da32680, 1, fffffcc26dd39c88, 2c00)
  fffffcc26dd39ad0 zvol_read+0x12d(5a00000007, fffffcc26dd39c88, fffffe232d6e3330)
  fffffcc26dd39b00 cdev_read+0x2a(5a00000007, fffffcc26dd39c88, fffffe232d6e3330)
  fffffcc26dd39b90 spec_read+0x2ab(fffffe2382752500, fffffcc26dd39c88, 0, fffffe232d6e3330, 0)
  fffffcc26dd39c30 fop_read+0x111(fffffe2382752500, fffffcc26dd39c88, 0, fffffe232d6e3330, 0)
  fffffcc26dd39f00 preadv+0x22e(4, cad278, 3, 62bf56e00, 0)
  fffffcc26dd39f10 sys_syscall+0x1a8()
The final thread, attempting a vmc_activate, is apparently stuck. Inspection of the vmspace_t and the vm_client_t used by that segment (and the stuck thread) shows that the client's state is VCS_IDLE, meaning VCS_HOLD is not set. This is surprising, considering that the associated logic in vmc_activate is waiting only for VCS_HOLD to disappear:
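/*
 * Mark the client active, blocking while VCS_HOLD is set.
 * On success, returns with vmc_lock still held.
 */
static int
vmc_activate(vm_client_t *vmc)
{
	mutex_enter(&vmc->vmc_lock);
	VERIFY0(vmc->vmc_state & VCS_ACTIVE);
	if ((vmc->vmc_state & VCS_ORPHANED) != 0) {
		mutex_exit(&vmc->vmc_lock);
		return (ENXIO);
	}
	/* Sleep until VCS_HOLD is cleared and vmc_cv is woken. */
	while ((vmc->vmc_state & VCS_HOLD) != 0) {
		cv_wait(&vmc->vmc_cv, &vmc->vmc_lock);
	}
	vmc->vmc_state |= VCS_ACTIVE;
	return (0);
}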
How would we have missed that transition? Looking at vmc_space_release highlights the issue:
static void
vmc_space_release(vm_client_t *vmc, bool kick_on_cpu)
{
	mutex_enter(&vmc->vmc_lock);
	VERIFY(vmc->vmc_state & VCS_HOLD);
	/* If requested, poke the client's CPU and wait until it is off-CPU. */
	if (kick_on_cpu && (vmc->vmc_state & VCS_ON_CPU) != 0) {
		poke_cpu(vmc->vmc_cpu_active);
		while ((vmc->vmc_state & VCS_ON_CPU) != 0) {
			cv_wait(&vmc->vmc_cv, &vmc->vmc_lock);
		}
	}
	/*
	 * Because vmc_table_enter() alters vmc_state from a context where
	 * interrupts are disabled, it cannot pay heed to vmc_lock, so clearing
	 * VCS_HOLD must be done atomically here.
	 */
	atomic_and_uint(&vmc->vmc_state, ~VCS_HOLD);
	mutex_exit(&vmc->vmc_lock);
}
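When we clear VCS_HOLD from vmc_state, we don't wake the associated CV (vmc_cv). Since nothing signals or broadcasts vmc_cv, the thread sleeping in vmc_activate is never woken to re-check vmc_state, and it waits forever even though the flag it is waiting on has already been cleared. This obvious oversight needs to be fixed.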
Updated by Patrick Mooney 7 months ago
This specific race was not a common occurrence: only one person has reported it on SmartOS so far, which has made it difficult to reproduce. With that in mind, I ran the normal battery of guest smoke tests, as well as the bhyve-tests suite, to look for regressions. Everything appeared to operate normally.
Updated by Patrick Mooney 7 months ago
Ian reported an intent to test this as well, but noted that because the failure was rare, it would not be immediately apparent whether the fix had addressed it.
Updated by Electric Monk 7 months ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
commit cc7a5a3be88241742ab4e0d7a14a0291f4e32183
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2022-01-24T21:03:34.000Z

    14423 bhyve vm can wedge on shutdown
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Mike Zeller <mike.zeller@joyent.com>
    Reviewed by: Vitaliy Gusev <gusev.vitaliy@gmail.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>