Bug #13309
bhyve movs emulation leaks mem refcnt
100%
Description
Several users reported via IRC that they were seeing bhyve leak guest-sized chunks of memory when rebooting win2019 guests. Checking vmm_list
and vmm_destroy_list
it did not appear to be a case where deferred clean-up/destroy was to blame. One of the users was available to run a test for me, so they recorded the struct vm
state before rebooting the instance (which did seem to leak the guest memory) before initiating a panic to capture a dump. Looking at the vm_object@s which were associated with the now-destroy instance, I could see that their ref-counts were still high. This prevented the memory backing them from being freed. I fired up a win2019 instance locally and found that it leaked memory in the same way. Armed with this information, I traced all @vm_object_reference
and vm_object_deallocate
calls (which is where those refcounts are modified) to see where we might be missing a de-allocate. This lead me to vie_emulate_movs
:
error = vm_copy_setup(vm, vcpuid, paging, srcaddr, opsize, PROT_READ,
copyinfo, nitems(copyinfo), &fault);
if (error == 0) {
if (fault)
goto done; /* Resume guest to handle fault */
/*
* case (2): read from system memory and write to mmio.
*/
vm_copyin(vm, vcpuid, copyinfo, &val, opsize);
vm_copy_teardown(vm, vcpuid, copyinfo, nitems(copyinfo));
error = vie_mmio_write(vie, vm, vcpuid, gpa, val, opsize);
if (error)
goto done;
} else {
/*
* 'vm_copy_setup()' is expected to fail for cases (3) and (4)
* if 'srcaddr' is in the mmio space.
*/
if (vie_get_gla(vie, vm, vcpuid, opsize, vie->addrsize,
PROT_WRITE, VM_REG_GUEST_ES, VM_REG_GUEST_RDI,
&dstaddr) != 0) {
goto done;
}
error = vm_copy_setup(vm, vcpuid, paging, dstaddr, opsize,
PROT_WRITE, copyinfo, nitems(copyinfo), &fault);
if (error == 0) {
if (fault)
goto done; /* Resume guest to handle fault */
/*
* case (3): read from MMIO and write to system memory.
*
* A MMIO read can have side-effects so we
* commit to it only after vm_copy_setup() is
* successful. If a page-fault needs to be
* injected into the guest then it will happen
* before the MMIO read is attempted.
*/
error = vie_mmio_read(vie, vm, vcpuid, gpa, &val,
opsize);
if (error)
goto done;
vm_copyout(vm, vcpuid, &val, copyinfo, opsize);
vm_copy_teardown(vm, vcpuid, copyinfo,
nitems(copyinfo));
The "case 3" path is of specific interest. Here we're doing an MMIO from an emulated device and storing the result in guest memory. The copy-setup (which establishes a hold on the memory) occurs before the MMIO is attempted. If that MMIO address is not handled in-kernel, the vie_mmio_read
call will fail with an error (indicating that fulfillment needs to occur in userspace). It's there that we're leaking the vm_object
reference, since the goto done
skips the vm_copy_teardown
call if the read needs to be processed in userspace.
As a precaution, I looked at the other vm_copy_teardown
calls in the instruction emulation. All of them (save for the one in question) had proper flow control to ensure they were called to conclude the emulation.