Bug #14281: bhyve bungles kstat clean-up
Status: Closed
% Done: 100%
Description
While playing with the bhyve kstats exposed by #13723, I hit a panic:
panic[cpu15]/thread=fffffe59eb8027e0: BAD TRAP: type=e (#pf Page fault) rp=fffffe007ae3d910 addr=fffffe8bf48b83d0

kstat:
#pf Page fault
Bad kernel fault at addr=0xfffffe8bf48b83d0
pid=106800, pc=0xfffffffff7dc85bb, sp=0xfffffe007ae3da00, eflags=0x10286
cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 3406f8<smap,smep,osxsav,xmme,fxsr,pge,mce,pae,pse,de>
cr2: fffffe8bf48b83d0 cr3: 841007000 cr8: 0

	rdi: fffffe8bf4898000	rsi: 0			rdx: 200000
	rcx: 18c		r8:  fffffe007ae3deb0	r9:  0
	rax: fffffe8bf48b8000	rbx: fffffe8bf48b8000	rbp: fffffe007ae3da30
	r10: fffffffffb88512c	r11: 1			r12: fffffe8bf4898110
	r13: 0			r14: 8bfe860		r15: fffffe8bf4898000
	fsb: 0			gsb: fffffe59d3e75000	ds:  4b
	es:  4b			fs:  0			gs:  1c3
	trp: e			err: 0			rip: fffffffff7dc85bb
	cs:  30			rfl: 10286		rsp: fffffe007ae3da00
	ss:  38

fffffe007ae3d810 unix:rewrite_syscall+0 ()
fffffe007ae3d900 unix:trap+12b7 ()
fffffe007ae3d910 unix:_cmntrap+1cd ()
fffffe007ae3da30 vmm:vmm_kstat_update_vcpu+6b ()
fffffe007ae3dc30 kstat:read_kstat_data+f5 ()
fffffe007ae3dc70 kstat:kstat_ioctl+5b ()
fffffe007ae3dcb0 genunix:cdev_ioctl+2b ()
fffffe007ae3dd00 specfs:spec_ioctl+45 ()
fffffe007ae3dd90 genunix:fop_ioctl+5b ()
fffffe007ae3deb0 genunix:ioctl+153 ()
fffffe007ae3df00 unix:brand_sys_syscall32+290 ()

> $C
fffffe007ae3da30 vmm_kstat_update_vcpu+0x6b(fffffe8bf4898000, 0)
fffffe007ae3dc30 read_kstat_data+0xf5(fffffe007ae3ddc8, 8bfe860, 100001)
fffffe007ae3dc70 kstat_ioctl+0x5b(7d00000000, 4b02, 8bfe860, 100001, fffffe59f7ecc2a8, fffffe007ae3ddc8)
fffffe007ae3dcb0 cdev_ioctl+0x2b(7d00000000, 4b02, 8bfe860, 100001, fffffe59f7ecc2a8, fffffe007ae3ddc8)
fffffe007ae3dd00 spec_ioctl+0x45(fffffe59e3abf800, 4b02, 8bfe860, 100001, fffffe59f7ecc2a8, fffffe007ae3ddc8, 0)
fffffe007ae3dd90 fop_ioctl+0x5b(fffffe59e3abf800, 4b02, 8bfe860, 100001, fffffe59f7ecc2a8, fffffe007ae3ddc8, 0)
fffffe007ae3deb0 ioctl+0x153(3, 4b02, 8bfe860)
fffffe007ae3df00 sys_syscall32+0x20f()

> fffffe8bf4898000::print kstat_t
{
    ks_crtime = 0x35943c61d2a83
    ks_next = 0
    ks_kid = 0x1d8d
    ks_module = [ "vmm" ]
    ks_resv = 0
    ks_instance = 0x1d
    ks_name = [ "vcpu0" ]
    ks_type = 0x1
    ks_class = [ "misc" ]
    ks_flags = 0
    ks_data = 0xfffffe8bf4898110
    ks_ndata = 0x7
    ks_data_size = 0x150
    ks_snaptime = 0x35943c61d2a83
    ks_update = vmm_kstat_update_vcpu
    ks_private = 0xfffffe8bf48b8000
    ks_snapshot = default_kstat_snapshot
    ks_lock = 0
}
> fffffe8bf4898000::print kstat_t ks_private | ::whatis
fffffe8bf48b8000 is fffffe8bf48b1000+7000, freed from the heap vmem arena:
            ADDR TYPE            START              END     SIZE THREAD TIMESTAMP
fffffe5e5ba018c8 FREE fffffe8bf48b1000 fffffe8bf53a7000 11493376
The bhyve kstats stash a reference to the struct vm in ks_private, so trying to access that after it's freed is not great. The kstat update is pretty straightforward in that regard:
int
vmm_kstat_update_vcpu(struct kstat *ksp, int rw)
{
	struct vm *vm = ksp->ks_private;
	vmm_vcpu_kstats_t *vvk = ksp->ks_data;
	const int vcpuid = vvk->vvk_vcpu.value.ui32;
	struct vcpu *vcpu = &vm->vcpu[vcpuid];

	ASSERT3U(vcpuid, <, VM_MAXCPU);

	vvk->vvk_time_init.value.ui64 = vcpu->ustate_total[VU_INIT];
	vvk->vvk_time_run.value.ui64 = vcpu->ustate_total[VU_RUN];
	vvk->vvk_time_idle.value.ui64 = vcpu->ustate_total[VU_IDLE];
	vvk->vvk_time_emu_kern.value.ui64 = vcpu->ustate_total[VU_EMU_KERN];
	vvk->vvk_time_emu_user.value.ui64 = vcpu->ustate_total[VU_EMU_USER];
	vvk->vvk_time_sched.value.ui64 = vcpu->ustate_total[VU_SCHED];

	return (0);
}
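For context, here is a rough sketch of how such a per-vCPU kstat could be wired up. This is illustrative only, not the actual vmm_kstat_init() code; the helper name and the stat string names are assumptions. The point is that kstat_create() registers a kstat that remains readable via /dev/kstat, with a raw struct vm pointer stashed in ks_private, until kstat_delete() tears it down.

/*
 * Illustrative sketch only -- not the actual vmm kstat set-up code.  The
 * helper name and the stat string names below are assumptions; what matters
 * is that the kstat stays registered (and readable via /dev/kstat) with a
 * raw struct vm pointer in ks_private until kstat_delete() is called.
 */
static kstat_t *
vmm_vcpu_kstat_create_sketch(struct vm *vm, int instance, int vcpuid)
{
	kstat_t *ksp;
	vmm_vcpu_kstats_t *vvk;
	char name[KSTAT_STRLEN];

	(void) snprintf(name, sizeof (name), "vcpu%d", vcpuid);
	ksp = kstat_create("vmm", instance, name, "misc", KSTAT_TYPE_NAMED,
	    sizeof (vmm_vcpu_kstats_t) / sizeof (kstat_named_t), 0);
	if (ksp == NULL)
		return (NULL);

	vvk = ksp->ks_data;
	kstat_named_init(&vvk->vvk_vcpu, "vcpu", KSTAT_DATA_UINT32);
	vvk->vvk_vcpu.value.ui32 = vcpuid;
	kstat_named_init(&vvk->vvk_time_init, "time_init", KSTAT_DATA_UINT64);
	/* ... the remaining vvk_time_* fields are initialized the same way ... */

	ksp->ks_private = vm;		/* raw pointer, nothing holds the VM live */
	ksp->ks_update = vmm_kstat_update_vcpu;
	kstat_install(ksp);

	return (ksp);
}

Nothing about that registration ties the kstat's lifetime to the VM it points at; the driver has to remove the kstats (vmm_kstat_fini()) before the struct vm goes away.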
Why would that be freed? The VM in question was undergoing recreation, having been just destroyed and then subsequently created. If we look at the logic for VM destruction, it becomes clearer:
static int
vmm_do_vm_destroy_locked(vmm_softc_t *sc, boolean_t clean_zsd,
    boolean_t *hma_release)
{
	...
	if (sc->vmm_is_open) {
		list_insert_tail(&vmm_destroy_list, sc);
		sc->vmm_flags |= VMM_DESTROY;
	} else {
		vm_destroy(sc->vmm_vm);
		vmm_kstat_fini(sc);
		ddi_soft_state_free(vmm_statep, minor);
		id_free(vmm_minors, minor);
		*hma_release = B_TRUE;
	}
	...
}
static int
vmm_close(dev_t dev, int flag, int otyp, cred_t *credp)
{
	...
	/*
	 * If this VM was destroyed while the vmm device was open, then
	 * clean it up now that it is closed.
	 */
	if (sc->vmm_flags & VMM_DESTROY) {
		list_remove(&vmm_destroy_list, sc);
		vm_destroy(sc->vmm_vm);
		ddi_soft_state_free(vmm_statep, minor);
		id_free(vmm_minors, minor);
		hma_release = B_TRUE;
	}
	...
}
So it appears we have two problems here:
1. The vmm_close-after-destroy case (where actual VM destruction is deferred until the device itself is closed) completely misses its needed vmm_kstat_fini call, leaving kstats lingering around with a reference to the now-destroyed VM.
2. The vmm_do_vm_destroy_locked case destroys the VM before cleaning up the associated kstats, leaving a short window for a use-after-free if those stats are accessed before the clean-up completes.
Both of these should be addressed to make the bhyve kstats situation more correct.
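A sketch of the shape such a fix could take, based purely on the snippets above rather than the actual committed change: tear down the kstats via vmm_kstat_fini() before vm_destroy() in the immediate-destroy path, and add the previously missing vmm_kstat_fini() call to the deferred path in vmm_close().

static int
vmm_do_vm_destroy_locked(vmm_softc_t *sc, boolean_t clean_zsd,
    boolean_t *hma_release)
{
	...
	if (sc->vmm_is_open) {
		list_insert_tail(&vmm_destroy_list, sc);
		sc->vmm_flags |= VMM_DESTROY;
	} else {
		/* Remove the kstats before the struct vm they reference. */
		vmm_kstat_fini(sc);
		vm_destroy(sc->vmm_vm);
		ddi_soft_state_free(vmm_statep, minor);
		id_free(vmm_minors, minor);
		*hma_release = B_TRUE;
	}
	...
}

static int
vmm_close(dev_t dev, int flag, int otyp, cred_t *credp)
{
	...
	if (sc->vmm_flags & VMM_DESTROY) {
		list_remove(&vmm_destroy_list, sc);
		/* Previously missing: clean up the kstats in the deferred path too. */
		vmm_kstat_fini(sc);
		vm_destroy(sc->vmm_vm);
		ddi_soft_state_free(vmm_statep, minor);
		id_free(vmm_minors, minor);
		hma_release = B_TRUE;
	}
	...
}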
Related issues
Updated by Patrick Mooney about 2 years ago
- Related to Bug #13723: expose bhyve metrics as kstats added
Updated by Patrick Mooney about 2 years ago
For testing, I constructed a contrived example for case 1 (deferred vm_destroy leaving lingering kstats). On an unpatched machine, I was able to observe those kstats as "present" even after the deferred destruction of the VM completed (seen by tracing the call via dtrace, and then checking for an empty /dev/vmm). When running the proposed patch, there were no lingering kstats after the same set of actions.
The second case (the use-after-free race) was not observed causing problems in the wild. It was checked by following the vmm_kstat_fini and vm_destroy calls via dtrace to confirm that they occur in the proper order, avoiding the potential issue.
In addition to those specific tests, the machine running the test bits was forced to dump, so it could be checked with ::findleaks. It was clean, modulo the "expected" entries.
Updated by Electric Monk about 2 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 6884664d1dc95e1917a534df2163ccd6f5c19b69
commit 6884664d1dc95e1917a534df2163ccd6f5c19b69
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2021-12-02T23:08:22.000Z

    14281 bhyve bungles kstat clean-up
    Reviewed by: Rich Lowe <richlowe@richlowe.net>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Approved by: Dan McDonald <danmcd@joyent.com>