bhyve vm_suspend should be more flexible
The bhyve kernel component features vm_suspend in its API: a way to indicate that a VM is to progress towards a shutdown point, either for reset or halt (including those induced by a triple fault). This signals all vCPUs to exit from guest context and to mark a bitfield in struct vm indicating that they have suspended execution. Once all active vCPUs have made a trip through vm_run to discover the requested suspend and set their bit, all the threads return with a VM_EXITCODE_SUSPENDED code. The VM_REINIT operation (used for clearing VM state to run it as if it had been reset) requires that all active vCPUs be in this suspended state before it will execute. If a vCPU thread is in userspace at the time vm_suspend is issued, it must make a trip through vm_run before the reinit can proceed. Any vCPU that attempts a vm_run after it has been successfully suspended (but not yet reinitialized) will receive an error. This makes it somewhat complicated to coordinate vCPU threads with operations attempting to prepare the instance for reset.
It would not take a huge effort to make these interfaces easier to use:
1. When performing a vm_suspend, immediately mark any vCPUs which are not in guest context (or are sleeping in a HLT) as suspended.
2. Allow already-suspended vCPUs to call back into vm_run; if all vCPUs are suspended at that time, VM_EXITCODE_SUSPENDED will be re-emitted for that vCPU.
3. Provide a wait flag for vm_suspend, so the ioctl can block until all active vCPUs are marked as suspended.
This should give userspace the tools to more easily navigate the VM and its vCPU threads into a suspended state for halt or reinitialization.
Updated by Patrick Mooney 9 months ago
My thoughts about how to address the challenges of bhyve VM reinitialization have shifted a bit while testing proposed fixes for this issue. The logic for marking vCPUs suspended as soon as possible (for those not running in guest context) seemed to work well. The wait flag for suspend, however, did not seem as valuable. A simpler approach was to make the VM_REINIT ioctl itself capable of marking straggler vCPUs as suspended. Since VM_REINIT acquires a write-lock on the VM, which involves locking all the vCPUs in the FROZEN state, this after-the-fact suspend marking is safe and easy. Adding a flags field to the VM_REINIT ioctl was the more natural fit.
Updated by Patrick Mooney 8 months ago
I ran all of the usual-suspects guests on a platform featuring this change. Each one successfully booted, and after initiating a reboot from inside the guest, successfully booted again after a reset. The same held true for an externally triggered "hard" reset using bhyvectl to issue the vm_suspend call. Additionally, a propolis instance (running a Linux guest) went through those same boot/reboot and boot/hard-reset trials, navigating them successfully.
Updated by Patrick Mooney 8 months ago
Since bhyve(1) does not use the forced-reinit, and propolis is configured to quiesce its vCPU threads prior to a reinit (which happens to use the FORCE flag), I also wanted to test the behavior of a userspace consumer which is running vCPU threads when the forced reinit occurs. To do this, I ran a normal bhyve instance (sans viona, since the ioport hooks preclude reinit) and issued a manual ioctl(VM_REINIT) with the FORCE flag set. All of the vCPU threads in that instance were promptly booted out to userspace, hitting the (expected) EINVAL when attempting re-entry, since the now-reinitialized VM has no VM_ACTIVATE-ed vCPUs.
Updated by Electric Monk 8 months ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
commit 52fac30e3e977464254b44b1dfb4717fb8d2fbde
Author: Patrick Mooney <firstname.lastname@example.org>
Date: 2021-09-28T16:17:31.000Z

14024 bhyve vm_suspend should be more flexible
Reviewed by: Dan Cross <email@example.com>
Reviewed by: Luqman Aden <firstname.lastname@example.org>
Reviewed by: Joshua M. Clulow <email@example.com>
Approved by: Dan McDonald <firstname.lastname@example.org>