Bug #13712
bhyve allows vmspace mapping inside existing range

Status: Closed (100% done)
Description
During testing of the new bhyve bits in #13674, Jorge experienced some panics due to an assertion failure:
panic message: assertion failed: list_is_empty(&vms->vms_maplist), file: ../../i86pc/io/vmm/vmm_sol_vm.c, line: 128

> $C
fffffe007b641a30 vpanic()
fffffe007b641a80 0xfffffffffba434c5()
fffffe007b641ab0 vmspace_free+0x36(fffffe5a6b7573d0)
fffffe007b641ad0 ept_vmspace_free+0xe(fffffe5a6b7573d0)
fffffe007b641b20 vm_cleanup+0x14f(fffffe5c74cbc000, 1)
fffffe007b641b50 vm_destroy+0x1b(fffffe5c74cbc000)
fffffe007b641bb0 vmm_do_vm_destroy_locked+0xae(fffffe5a60370080, 0, fffffe007b641bdc)
fffffe007b641c00 vmm_do_vm_destroy+0x33(fffffe5a60370080, 0)
fffffe007b641c30 vmm_zsd_destroy+0x52(15, fffffe6373cd0880)
fffffe007b641ca0 zsd_apply_destroy+0x1ee(0, 1, fffffe5a5eecc580, 10)
fffffe007b641cf0 zsd_apply_all_keys+0x4e(fffffffffbb168f0, fffffe5a5eecc580)
fffffe007b641d40 zone_zsd_callbacks+0xc9(fffffe5a5eecc580, 2)
fffffe007b641db0 zone_destroy+0x11b(15)
fffffe007b641f00 zone+0x1cd(1, 15, 0, 30, 0)
fffffe007b641f10 sys_syscall+0x1a8()
The leftover mapping in the list is:
> fffffe5a6b7573d0::print struct vmspace vms_maplist | ::walk list | ::print vmspace_mapping_t
{
    vmsm_node = {
        list_next = 0xfffffe5a6b757448
        list_prev = 0xfffffe5a6b757448
    }
    vmsm_object = 0xfffffe5cc08a56b0
    vmsm_addr = 0xffffe000
    vmsm_len = 0x1000
    vmsm_offset = 0
    vmsm_prot = 0x3
}
Patrick tells me that this is an address in the middle of the (larger) bootrom range, which shouldn't have been allowed.
Tracing with DTrace shows that this range is being mapped by ppt, from here:
bhyve`passthru_msix_addr+0x150
bhyve`passthru_addr+0x95
bhyve`modify_bar_registration+0x170
bhyve`register_bar+0x10
bhyve`update_bar_address+0xdf
bhyve`pci_cfgrw+0x28b
and is succeeding:
15  81183  ppt_map_mmio:entry   ppt_map_mmio(fd=6, gpa=ffffe000, len=1000, hpa=e0900000)
15  82848  vm_map_mmio:entry    vm_map_mmio(gpa=ffffe000, len=1000, hpa=e0900000)
15  82849  vm_map_mmio:return   0
15  81184  ppt_map_mmio:return  0
However, shortly afterwards the memory is unmapped again. This unmap fails, but ppt believes it succeeded since the error is not passed back; it removes the mapping from its local list, leaving the vmspace mapping dangling when the VM is later destroyed. The following trace shows that, when looking for the mapping, vm_mapping_find() returns the larger bootrom mapping that encompasses this BAR allocation. Since vm_map_remove() requires an exact match here, it returns ENOENT, but this error is not propagated back.
15  81161  ppt_unmap_mmio:entry    ppt_unmap_mmio(fd=6, gpa=ffffe000, len=1000)
15  82844  vm_unmap_mmio:entry     vm_unmap_mmio(gpa=ffffe000, len=1000)
15  81865  vm_map_remove:entry     vm_map_remove(start=ffffe000, end=fffff000)
15  82771  vm_mapping_find:entry   vm_mapping_find(addr=ffffe000, size=1000)
15  82772  vm_mapping_find:return  vmspace_mapping_t {
    list_node_t vmsm_node = {
        struct list_node *list_next = 0xfffffe5a656425e8
        struct list_node *list_prev = 0xfffffe58e557e880
    }
    vm_object_t vmsm_object = 0xfffffe5a5d94b618
    uintptr_t vmsm_addr = 0xfff00000
    size_t vmsm_len = 0x100000
    off_t vmsm_offset = 0xf00000
    uint_t vmsm_prot = 0x5
}
15  81866  vm_map_remove:return    2
15  82845  vm_unmap_mmio:return    0
15  81162  ppt_unmap_mmio:return   0
There are two issues here. The first is that the initial BAR registration should have failed since the addresses are already mapped for the bootrom. The second is that the error from unmap (ENOENT) was not passed up.
The first problem appears to be in the vm_mapping_gap() function, which checks whether a new address range overlaps an existing one: it fails to detect the overlap when the new range is entirely contained within an existing one.
Related issues