Bug #14963

closed

bhyve destroy should be more robust

Added by Patrick Mooney 3 months ago. Updated 2 months ago.

Status: Closed
Priority: Normal
Category: bhyve
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

As pointed out in #14931, destruction of bhyve instances could be more robust, both in being responsive to destruction requests, and at handling edge cases when an instance is destroyed while being referenced by other entities (processes, viona consumers, etc).

A few noted issues:
- Destroy ioctls block on a viona vmm_drv hold which is not promptly released. (The viona fd itself must be closed to release the hold, which may have no reason to occur if the bhyve process is stalled on something else.)
- Instance vCPUs are not forced to exit guest context. Since the destroy request does not take instance-related locks which would otherwise kick the vCPUs out, they may remain blocked in guest context, or in another sync point (like hlt) for an extended period of time.
- The existing logic does not appear to be careful enough around an actor in the GZ holding references to an instance being destroyed as part of zone destruction. While this is a rare edge case, it'd be nice for it to be 100% safe as well.
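
The first two issues above can be pictured with a small user-space model. This is only a sketch, not the illumos vmm code: the `Instance` class, its method names, and the use of a condition variable as the "kick" are all assumptions made for illustration. It shows a destroy request that marks the instance, wakes any parked vCPU, and returns without waiting for outstanding holds, while final clean-up is deferred until the last hold is released.

```python
import threading


class Instance:
    """Toy stand-in for a VM instance (hypothetical, not the real vmm structure)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.cv = threading.Condition(self.lock)
        self.destroying = False
        self.holds = 0       # e.g. a viona-style driver hold
        self.freed = False

    def hold(self):
        with self.lock:
            if self.destroying:
                raise RuntimeError("instance is being destroyed")
            self.holds += 1

    def release(self):
        with self.lock:
            self.holds -= 1
            self._maybe_free()

    def _maybe_free(self):
        # Caller holds self.lock. Clean-up runs once the last hold is gone.
        if self.destroying and self.holds == 0:
            self.freed = True

    def vcpu_run(self):
        """Simulated vCPU parked at a sync point (like hlt) until kicked."""
        with self.cv:
            while not self.destroying:
                self.cv.wait()
        return "exited"

    def destroy(self):
        """Mark the instance, kick vCPUs, and return without waiting on holds."""
        with self.cv:
            self.destroying = True
            self.cv.notify_all()
            self._maybe_free()


def demo():
    inst = Instance("testvm")
    inst.hold()                      # a consumer that has not released yet
    results = []
    vcpu = threading.Thread(target=lambda: results.append(inst.vcpu_run()))
    vcpu.start()
    inst.destroy()                   # returns even though a hold remains
    vcpu.join(timeout=5)
    freed_before_release = inst.freed
    inst.release()                   # last hold dropped, clean-up completes
    return results, freed_before_release, inst.freed
```

In this model `demo()` returns once the simulated vCPU has been kicked out of its wait; the instance is only marked freed after the remaining hold is released.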


Related issues

Related to illumos gate - Bug #14931: viona_ring_rx should be more conscious of drv purge (Closed, Michael Zeller)
Related to illumos gate - Bug #14968: bhyve needs self-destroy ioctl (Closed, Patrick Mooney)

Actions #1

Updated by Patrick Mooney 3 months ago

  • Related to Bug #14931: viona_ring_rx should be more conscious of drv purge added
Actions #2

Updated by Patrick Mooney 3 months ago

  • Description updated (diff)
Actions #3

Updated by Electric Monk 3 months ago

  • Gerrit CR set to 2343
Actions #4

Updated by Patrick Mooney 2 months ago

  • Related to Bug #14968: bhyve needs self-destroy ioctl added
Actions #5

Updated by Patrick Mooney 2 months ago

To account for the new behavior from #14968, the tests around destruction (both self-initiated and otherwise) were expanded:

Test: /opt/bhyve-tests/tests/mevent/vnode_zvol (run as root)      [00:02] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/rdmsr (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/wrmsr (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/triple_fault (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vatpit_freq (run as root)       [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vhpet_freq (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_freq (run as root)       [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_freq_periodic (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_mmio_access (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_msr_access (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vpmtmr_freq (run as root)       [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/lists_delete (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/read_disable (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/read_pause (run as root)      [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/read_requeue (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/vnode_file (run as root)      [00:09] [PASS]
Test: /opt/bhyve-tests/tests/viona/interface_version (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/auto_destruct (run as root)      [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/drv_hold (run as root)           [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/fpu_getset (run as root)         [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/interface_version (run as root)  [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/legacy_destruct (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/mem_devmem (run as root)         [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/mem_partial (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/mem_seg_map (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/self_destruct (run as root)      [00:00] [PASS]

Results Summary
PASS      26

Running Time:   00:00:14
Percent passed: 100.0%

They were retooled to meet the expectation that an instance being destroyed (where its destruction is delayed due to a hold from a vmm_drv or file descriptor) remains visible in /dev/vmm/, but rejects requests to perform actions on the vmm instance itself (such as running the vCPUs).
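
As a rough illustration of that expectation, here is a minimal sketch of the visible-but-rejecting state. It is a model, not the test suite or the kernel: `VmmRegistry` and its methods are hypothetical, and the errno used for rejection (ENXIO) is an assumption, since the value actually returned is not stated here.

```python
import errno


class VmmRegistry:
    """Toy model of the /dev/vmm namespace (hypothetical names throughout)."""

    def __init__(self):
        self._instances = {}

    def create(self, name):
        self._instances[name] = {"destroying": False, "holds": 0}

    def listing(self):
        # Stands in for listing the entries under /dev/vmm/.
        return sorted(self._instances)

    def hold(self, name):
        self._instances[name]["holds"] += 1

    def release(self, name):
        self._instances[name]["holds"] -= 1
        self._reap(name)

    def destroy(self, name):
        self._instances[name]["destroying"] = True
        self._reap(name)

    def run_vcpu(self, name):
        if self._instances[name]["destroying"]:
            # Assumed errno; chosen only for the sake of the example.
            raise OSError(errno.ENXIO, "instance is being destroyed")
        return "running"

    def _reap(self, name):
        # The entry disappears only once it is marked and no holds remain.
        inst = self._instances[name]
        if inst["destroying"] and inst["holds"] == 0:
            del self._instances[name]
```

With one hold outstanding, `destroy()` leaves the name in `listing()` while `run_vcpu()` raises; releasing the hold removes the entry.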

After running the test suite, as well as some "real" guests for smoke-testing, I initiated a dump of the machine in order to check ::findleaks to confirm that necessary clean-up was occurring. The output looked fine.

Actions #6

Updated by Michael Zeller 2 months ago

I did some additional testing on omnios bloody. I set up a bhyve zone to ensure that teardown still works when the bhyve process is contained within a zone.

The zone successfully stopped with the bhyvectl destroy command.

link - zebes ~ $ ptree -z testvm
2608   zsched
  2690   bhyve-testvm -k /etc/bhyve.cfg
link - zebes ~ $ pfexec bhyvectl --vm=testvm --destroy
link - zebes ~ $ ptree -z testvm
link - zebes ~ $ zoneadm list -iv | rg testvm
   - testvm           installed  /zones/testvm                  bhyve    excl
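
The final check above (the zone returning to the installed state) could be scripted along these lines. This is a sketch that only parses text in the column layout shown in the transcript; the `zone_status` helper is hypothetical, and it assumes zone names and paths contain no whitespace.

```python
def zone_status(zoneadm_output, zone):
    """Return the STATUS column for `zone` from `zoneadm list -iv` style text."""
    for line in zoneadm_output.splitlines():
        fields = line.split()
        # Columns as shown above: ID, NAME, STATUS, PATH, BRAND, IP.
        if len(fields) >= 3 and fields[1] == zone:
            return fields[2]
    return None


sample = "   - testvm           installed  /zones/testvm                  bhyve    excl"
status = zone_status(sample, "testvm")
```

Here `status` is the string from the third column of the matching line, or `None` when the zone is not listed.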

Corresponding zone log file

# tail zone.log
 '30:1,xhci,tablet',
 '-o',
 'config.dump=1',
 'testvm']
INFO:root:/usr/sbin/bhyve -U 99ebf266-79ac-67a0-8900-e0ba4044c7c0 -H -B "1,manufacturer=OmniOS,product=OmniOS HVM,version=1.0,serial=99ebf266-79ac-67a0-8900-e0ba4044c7c0,sku=001,family=Virtual Machine" -c 4 -m 2G -l bootrom,/usr/share/bhyve/firmware/BHYVE.fd,/etc/uefivars -s 0,hostbridge,model=i440fx -s 1,lpc -l com1,/dev/zconsole -s 4:0,virtio-blk,/dev/zvol/rdsk/zvme/testvm-disk0 -s 6:0,virtio-net-viona,alpine0 -s 30:0,fbuf,vga=off,unix=/tmp/vm.vnc -s 30:1,xhci,tablet -o config.dump=1 testvm
DEBUG:root:Created temporary file at /zones/testvm/root/etc/bhyve.vigy4jyh
INFO:root:Successfully created /zones/testvm/root/etc/bhyve.cfg
[September 21, 2022 at 03:06:21 PM UTC] BHYVE support z=testvm p=/zones/testvm c=poststate 2:1
[September 21, 2022 at 03:07:27 PM UTC] BHYVE support z=testvm p=/zones/testvm c=prestate 5:4
[September 21, 2022 at 03:07:27 PM UTC] BHYVE support z=testvm p=/zones/testvm c=poststate 5:4

Actions #7

Updated by Electric Monk 2 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 42640e499ab380f595753ffb5c3607d030e5cff3

commit  42640e499ab380f595753ffb5c3607d030e5cff3
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2022-09-26T20:06:16.000Z

    14963 bhyve destroy should be more robust
    14968 bhyve needs self-destroy ioctl
    Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
    Reviewed by: Michael Zeller <mike@mikezeller.net>
    Approved by: Dan McDonald <danmcd@mnx.io>
