Bug #14423

closed

bhyve vm can wedge on shutdown

Added by Patrick Mooney 7 months ago. Updated 7 months ago.

Status: Closed
Priority: Normal
Category: bhyve
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Ian from the SmartOS community reported sporadic cases of bhyve instances getting stuck while in the shutting-down state.

We started digging in with mdb -k and found the bhyve process still present in the zone, but otherwise unresponsive. A walk of the process's kernel stacks was illuminating:

stack pointer for thread fffffe2352f16800 (bhyve/3 [blk-4:0-1]): fffffcc26dedd390
[ fffffcc26dedd390 _resume_from_idle+0x12b() ]
  fffffcc26dedd3c0 swtch+0x133()
  fffffcc26dedd460 turnstile_block+0x25b(fffffe2352ef1020, 0, fffffe2316fda368, fffffffffbc1c5e0, 0, 0)
  fffffcc26dedd4d0 rw_enter_sleep+0x1a5(fffffe2316fda368, 0)
  fffffcc26dedd550 segvmm_fault+0x70(fffffe232afed248, fffffe2354b72a90, fffffbfd810c5000, 1000, 0, 2)
  fffffcc26dedd640 as_fault+0x29c(fffffe232afed248, fffffe2315933470, fffffbfd810c5000, 1, 0, 2)
  fffffcc26dedd6c0 pagefault+0xda(fffffbfd810c5000, 0, 2, 0)
  fffffcc26dedd7b0 trap+0x1139(fffffcc26dedd7c0, fffffbfd810c5000, 24)
  fffffcc26dedd7c0 cmntrap_pushed+0x3c()
  fffffcc26dedd8d0 bcopy_ck_size+0xea()
  fffffcc26dedd940 uiomove+0xbe(fffffdc3f191c000, 1600, 0, fffffcc26deddc88)
  fffffcc26dedd9e0 dmu_read_uio_dnode+0x11f(fffffe236b903eb0, fffffcc26deddc88, 7600)
  fffffcc26dedda40 dmu_read_uio+0x73(fffffe238da32680, 1, fffffcc26deddc88, 7600)
  fffffcc26deddad0 zvol_read+0x12d(5a00000007, fffffcc26deddc88, fffffe232d6e3330)
  fffffcc26deddb00 cdev_read+0x2a(5a00000007, fffffcc26deddc88, fffffe232d6e3330)
  fffffcc26deddb90 spec_read+0x2ab(fffffe2382752500, fffffcc26deddc88, 0, fffffe232d6e3330, 0)
  fffffcc26deddc30 fop_read+0x111(fffffe2382752500, fffffcc26deddc88, 0, fffffe232d6e3330, 0)
  fffffcc26deddf00 preadv+0x22e(4, cab178, 8, f5eae000, 0)
  fffffcc26deddf10 sys_syscall+0x1a8()
stack pointer for thread fffffe2352dafc20 (bhyve/5 [blk-4:0-3]): fffffcc26dee9390
[ fffffcc26dee9390 _resume_from_idle+0x12b() ]
  fffffcc26dee93c0 swtch+0x133()
  fffffcc26dee9460 turnstile_block+0x25b(0, 0, fffffe2316fda368, fffffffffbc1c5e0, 0, 0)
  fffffcc26dee94d0 rw_enter_sleep+0x1a5(fffffe2316fda368, 0)
  fffffcc26dee9550 segvmm_fault+0x70(fffffe232afed248, fffffe2354b72a90, fffffbfd81207000, 1000, 0, 2)
  fffffcc26dee9640 as_fault+0x29c(fffffe232afed248, fffffe2315933470, fffffbfd81207000, 1, 0, 2)
  fffffcc26dee96c0 pagefault+0xda(fffffbfd81207000, 0, 2, 0)
  fffffcc26dee97b0 trap+0x1139(fffffcc26dee97c0, fffffbfd81207000, b)
  fffffcc26dee97c0 cmntrap_pushed+0x3c()
  fffffcc26dee98d0 bcopy_ck_size+0xea()
  fffffcc26dee9940 uiomove+0xbe(fffffdc38aba4000, 600, 0, fffffcc26dee9c88)
  fffffcc26dee99e0 dmu_read_uio_dnode+0x11f(fffffe236b903eb0, fffffcc26dee9c88, 4600)
  fffffcc26dee9a40 dmu_read_uio+0x73(fffffe238da32680, 1, fffffcc26dee9c88, 4600)
  fffffcc26dee9ad0 zvol_read+0x12d(5a00000007, fffffcc26dee9c88, fffffe232d6e3330)
  fffffcc26dee9b00 cdev_read+0x2a(5a00000007, fffffcc26dee9c88, fffffe232d6e3330)
  fffffcc26dee9b90 spec_read+0x2ab(fffffe2382752500, fffffcc26dee9c88, 0, fffffe232d6e3330, 0)
  fffffcc26dee9c30 fop_read+0x111(fffffe2382752500, fffffcc26dee9c88, 0, fffffe232d6e3330, 0)
  fffffcc26dee9f00 preadv+0x22e(4, caca38, 5, f6458000, 0)
  fffffcc26dee9f10 sys_syscall+0x1a8()
stack pointer for thread fffffe2352d89060 (bhyve/8 [blk-4:0-6]): fffffcc26defbd30
[ fffffcc26defbd30 _resume_from_idle+0x12b() ]
  fffffcc26defbd60 swtch+0x133()
  fffffcc26defbda0 cv_wait+0x68(fffffe235c4331fe, fffffe22d6c56d40)
  fffffcc26defbde0 exitlwps+0x134(0)
  fffffcc26defbe60 psig+0x4fb()
  fffffcc26defbf00 post_syscall+0x75d(4, fffffe238904c780)
  fffffcc26defbf10 0xfffffffffb800d1f()
stack pointer for thread fffffe2385517420 (bhyve/11 [blk-4:0-9]): fffffcc26dd39330
[ fffffcc26dd39330 _resume_from_idle+0x12b() ]
  fffffcc26dd39360 swtch+0x133()
  fffffcc26dd393a0 cv_wait+0x68(fffffe2361533ab0, fffffe2361533aa8)
  fffffcc26dd393e0 vmc_activate+0x4b(fffffe2361533a90)
  fffffcc26dd39450 vmc_hold+0x5b(fffffe2361533a90, 59db000, 3)
  fffffcc26dd394d0 segvmm_fault_space+0xa4(fffffe232afed248, fffffe2354b72a90, fffffbfd85ddb000, 1000)
  fffffcc26dd39550 segvmm_fault+0xfd(fffffe232afed248, fffffe2354b72a90, fffffbfd85ddb000, 1000, 0, 2)
  fffffcc26dd39640 as_fault+0x29c(fffffe232afed248, fffffe2315933470, fffffbfd85ddb000, 1, 0, 2)
  fffffcc26dd396c0 pagefault+0xda(fffffbfd85ddb000, 0, 2, 0)
  fffffcc26dd397b0 trap+0x1139(fffffcc26dd397c0, fffffbfd85ddb000, 23)
  fffffcc26dd397c0 cmntrap_pushed+0x3c()
  fffffcc26dd398d0 bcopy_ck_size+0xea()
  fffffcc26dd39940 uiomove+0xbe(fffffdc34b31ae00, 1200, 0, fffffcc26dd39c88)
  fffffcc26dd399e0 dmu_read_uio_dnode+0x11f(fffffe236b903eb0, fffffcc26dd39c88, 2c00)
  fffffcc26dd39a40 dmu_read_uio+0x73(fffffe238da32680, 1, fffffcc26dd39c88, 2c00)
  fffffcc26dd39ad0 zvol_read+0x12d(5a00000007, fffffcc26dd39c88, fffffe232d6e3330)
  fffffcc26dd39b00 cdev_read+0x2a(5a00000007, fffffcc26dd39c88, fffffe232d6e3330)
  fffffcc26dd39b90 spec_read+0x2ab(fffffe2382752500, fffffcc26dd39c88, 0, fffffe232d6e3330, 0)
  fffffcc26dd39c30 fop_read+0x111(fffffe2382752500, fffffcc26dd39c88, 0, fffffe232d6e3330, 0)
  fffffcc26dd39f00 preadv+0x22e(4, cad278, 3, 62bf56e00, 0)
  fffffcc26dd39f10 sys_syscall+0x1a8()

The final thread, attempting a vmc_activate(), is apparently stuck. Inspection of the vmspace_t and the vm_client_t used by that segment (and the stuck thread) shows that its state is VCS_IDLE. This is surprising, considering that the logic in vmc_activate() is waiting only for VCS_HOLD to be cleared:

static int
vmc_activate(vm_client_t *vmc)
{
        mutex_enter(&vmc->vmc_lock);
        VERIFY0(vmc->vmc_state & VCS_ACTIVE);
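        if ((vmc->vmc_state & VCS_ORPHANED) != 0) {
                mutex_exit(&vmc->vmc_lock);
                return (ENXIO);
        }
        /* Sleep until VCS_HOLD is cleared and vmc_cv is signalled. */
        while ((vmc->vmc_state & VCS_HOLD) != 0) {
                cv_wait(&vmc->vmc_cv, &vmc->vmc_lock);
        }
        vmc->vmc_state |= VCS_ACTIVE;
        /* vmc_lock remains held on success. */
        return (0);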
}
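The loop in vmc_activate() is the usual predicate-wait idiom: it re-checks VCS_HOLD after every wakeup, so it can only make progress if whichever path clears that bit also signals vmc_cv. The following user-space sketch models that contract with POSIX threads. It is an illustration, not the bhyve code: model_client_t, model_init(), model_activate(), and the VCS_* bit values are hypothetical, and a bounded pthread_cond_timedwait() stands in for cv_wait() so that a missing wakeup surfaces as ETIMEDOUT rather than a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel's VCS_* bits; real values may differ. */
#define	VCS_HOLD	(1u << 0)
#define	VCS_ACTIVE	(1u << 1)
#define	VCS_ORPHANED	(1u << 2)

typedef struct model_client {
	pthread_mutex_t	mc_lock;	/* plays the role of vmc_lock */
	pthread_cond_t	mc_cv;		/* plays the role of vmc_cv */
	unsigned int	mc_state;	/* plays the role of vmc_state */
} model_client_t;

static void
model_init(model_client_t *mc, unsigned int state)
{
	pthread_mutex_init(&mc->mc_lock, NULL);
	pthread_cond_init(&mc->mc_cv, NULL);
	mc->mc_state = state;
}

/*
 * Same shape as vmc_activate(): fail with ENXIO if orphaned, otherwise wait
 * for VCS_HOLD to clear and then set VCS_ACTIVE.  The wait is bounded so that
 * "nobody woke us" shows up as ETIMEDOUT instead of a hang, and the lock is
 * always dropped before returning (the quoted kernel routine keeps it on
 * success).
 */
static int
model_activate(model_client_t *mc, long timeout_ms)
{
	struct timespec deadline;
	int err = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&mc->mc_lock);
	if ((mc->mc_state & VCS_ORPHANED) != 0) {
		err = ENXIO;
	} else {
		/* Re-check the predicate after every wakeup, spurious or not. */
		while ((mc->mc_state & VCS_HOLD) != 0 && err == 0) {
			err = pthread_cond_timedwait(&mc->mc_cv,
			    &mc->mc_lock, &deadline);
		}
		if (err == 0)
			mc->mc_state |= VCS_ACTIVE;
	}
	pthread_mutex_unlock(&mc->mc_lock);
	return (err);
}
```

An idle client activates immediately and an orphaned one fails with ENXIO. A client whose hold is never released with a wakeup simply times out, which is the bounded analogue of the thread parked in vmc_activate+0x4b in the stacks above.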

How could the waiting thread have missed that transition? Looking at vmc_space_release() highlights the issue:

static void
vmc_space_release(vm_client_t *vmc, bool kick_on_cpu)
{
        mutex_enter(&vmc->vmc_lock);
        VERIFY(vmc->vmc_state & VCS_HOLD);

        if (kick_on_cpu && (vmc->vmc_state & VCS_ON_CPU) != 0) {
                poke_cpu(vmc->vmc_cpu_active);

                while ((vmc->vmc_state & VCS_ON_CPU) != 0) {
                        cv_wait(&vmc->vmc_cv, &vmc->vmc_lock);
                }
        }

        /*
         * Because vmc_table_enter() alters vmc_state from a context where
         * interrupts are disabled, it cannot pay heed to vmc_lock, so clearing
         * VCS_HOLD must be done atomically here.
         */
        atomic_and_uint(&vmc->vmc_state, ~VCS_HOLD);
        mutex_exit(&vmc->vmc_lock);
}

When vmc_space_release() clears VCS_HOLD from vmc_state, it does not wake the associated CV (vmc_cv), so a thread already blocked in vmc_activate() waits forever, even though the state it is waiting for has been reached. This obvious oversight needs to be fixed.
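To see the missed wakeup in isolation, this sketch models both sides with POSIX threads: a waiter parks behind VCS_HOLD, and a releaser clears the bit either as the quoted vmc_space_release() does (no wakeup) or followed by a broadcast. All names here (model_client_t, model_waiter(), model_release(), run_scenario(), WAIT_MS) are hypothetical, the kernel's atomic_and_uint() is replaced by a plain store under the mutex, and the waiter gives up after WAIT_MS instead of sleeping forever. Broadcasting vmc_cv after clearing VCS_HOLD is one plausible shape for the fix; the actual change in commit cc7a5a3be88241742ab4e0d7a14a0291f4e32183 may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define	VCS_HOLD	(1u << 0)	/* hypothetical stand-in for the kernel bit */
#define	WAIT_MS		500		/* bound replacing cv_wait()'s "forever" */

typedef struct model_client {
	pthread_mutex_t	mc_lock;
	pthread_cond_t	mc_cv;
	unsigned int	mc_state;
	bool		mc_entered;	/* waiter has reached the timed wait */
	int		mc_result;	/* 0 if woken, ETIMEDOUT if never woken */
} model_client_t;

/* Waiter: the vmc_activate() wait loop, bounded to WAIT_MS. */
static void *
model_waiter(void *arg)
{
	model_client_t *mc = arg;
	struct timespec deadline;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += WAIT_MS * 1000000L;
	deadline.tv_sec += deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&mc->mc_lock);
	while ((mc->mc_state & VCS_HOLD) != 0 && err == 0) {
		mc->mc_entered = true;
		err = pthread_cond_timedwait(&mc->mc_cv, &mc->mc_lock,
		    &deadline);
	}
	mc->mc_result = err;
	pthread_mutex_unlock(&mc->mc_lock);
	return (NULL);
}

/*
 * Releaser, modelled on vmc_space_release().  With wake == false it behaves
 * like the quoted code (VCS_HOLD cleared, nobody woken); with wake == true it
 * broadcasts the condition variable after clearing the bit.
 */
static void
model_release(model_client_t *mc, bool wake)
{
	pthread_mutex_lock(&mc->mc_lock);
	mc->mc_state &= ~VCS_HOLD;
	if (wake)
		pthread_cond_broadcast(&mc->mc_cv);
	pthread_mutex_unlock(&mc->mc_lock);
}

/* Park one waiter behind VCS_HOLD, release the hold, return what it saw. */
static int
run_scenario(bool wake)
{
	model_client_t mc = { .mc_state = VCS_HOLD };
	struct timespec pause = { 0, 1000000L };	/* 1 ms */
	pthread_t tid;
	bool entered = false;

	pthread_mutex_init(&mc.mc_lock, NULL);
	pthread_cond_init(&mc.mc_cv, NULL);
	pthread_create(&tid, NULL, model_waiter, &mc);

	/*
	 * The waiter holds mc_lock until it is inside pthread_cond_timedwait(),
	 * so seeing mc_entered under the lock means it has gone to sleep.
	 */
	while (!entered) {
		pthread_mutex_lock(&mc.mc_lock);
		entered = mc.mc_entered;
		pthread_mutex_unlock(&mc.mc_lock);
		if (!entered)
			nanosleep(&pause, NULL);
	}

	model_release(&mc, wake);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&mc.mc_cv);
	pthread_mutex_destroy(&mc.mc_lock);
	return (mc.mc_result);
}
```

Without the broadcast, the waiter only returns because of its artificial deadline; the kernel's unbounded cv_wait() has no such escape, matching a thread stuck in vmc_activate() while the client already reads VCS_IDLE. With the broadcast, the waiter wakes, re-checks VCS_HOLD, and proceeds.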

#1

Updated by Electric Monk 7 months ago

  • Gerrit CR set to 1967

#2

Updated by Patrick Mooney 7 months ago

This specific race is not a common occurrence: only one person has reported it on SmartOS so far, so reproducing it has been difficult. With that in mind, I ran the normal battery of guest smoke tests, as well as the bhyve-tests suite, to look for regressions. Everything appeared to operate normally.

#3

Updated by Patrick Mooney 7 months ago

Ian reported an intent to test this as well, but noted that because the failure was rare, confirmation that the issue is fixed would not be immediately apparent.

#4

Updated by Electric Monk 7 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit cc7a5a3be88241742ab4e0d7a14a0291f4e32183

commit  cc7a5a3be88241742ab4e0d7a14a0291f4e32183
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2022-01-24T21:03:34.000Z

    14423 bhyve vm can wedge on shutdown
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Mike Zeller <mike.zeller@joyent.com>
    Reviewed by: Vitaliy Gusev <gusev.vitaliy@gmail.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>
