Bug #13712 - bhyve allows vmspace mapping inside existing range
Status: Closed (100% done)
Description
During testing of the new bhyve bits in #13674, Jorge experienced some panics due to an assertion failure:
panic message: assertion failed: list_is_empty(&vms->vms_maplist), file: ../../i86pc/io/vmm/vmm_sol_vm.c, line: 128

> $C
fffffe007b641a30 vpanic()
fffffe007b641a80 0xfffffffffba434c5()
fffffe007b641ab0 vmspace_free+0x36(fffffe5a6b7573d0)
fffffe007b641ad0 ept_vmspace_free+0xe(fffffe5a6b7573d0)
fffffe007b641b20 vm_cleanup+0x14f(fffffe5c74cbc000, 1)
fffffe007b641b50 vm_destroy+0x1b(fffffe5c74cbc000)
fffffe007b641bb0 vmm_do_vm_destroy_locked+0xae(fffffe5a60370080, 0, fffffe007b641bdc)
fffffe007b641c00 vmm_do_vm_destroy+0x33(fffffe5a60370080, 0)
fffffe007b641c30 vmm_zsd_destroy+0x52(15, fffffe6373cd0880)
fffffe007b641ca0 zsd_apply_destroy+0x1ee(0, 1, fffffe5a5eecc580, 10)
fffffe007b641cf0 zsd_apply_all_keys+0x4e(fffffffffbb168f0, fffffe5a5eecc580)
fffffe007b641d40 zone_zsd_callbacks+0xc9(fffffe5a5eecc580, 2)
fffffe007b641db0 zone_destroy+0x11b(15)
fffffe007b641f00 zone+0x1cd(1, 15, 0, 30, 0)
fffffe007b641f10 sys_syscall+0x1a8()
The leftover mapping in the list is:
> fffffe5a6b7573d0::print struct vmspace vms_maplist | ::walk list | ::print vmspace_mapping_t
{
    vmsm_node = {
        list_next = 0xfffffe5a6b757448
        list_prev = 0xfffffe5a6b757448
    }
    vmsm_object = 0xfffffe5cc08a56b0
    vmsm_addr = 0xffffe000
    vmsm_len = 0x1000
    vmsm_offset = 0
    vmsm_prot = 0x3
}
Patrick tells me that this is an address in the middle of the (larger) bootrom range, which shouldn't have been allowed.
dtracing things shows that this range is being allocated by ppt, from here:
bhyve`passthru_msix_addr+0x150
bhyve`passthru_addr+0x95
bhyve`modify_bar_registration+0x170
bhyve`register_bar+0x10
bhyve`update_bar_address+0xdf
bhyve`pci_cfgrw+0x28b
and is succeeding:
15  81183  ppt_map_mmio:entry   ppt_map_mmio(fd=6, gpa=ffffe000, len=1000, hpa=e0900000)
15  82848  vm_map_mmio:entry    vm_map_mmio(gpa=ffffe000, len=1000, hpa=e0900000)
15  82849  vm_map_mmio:return   0
15  81184  ppt_map_mmio:return  0
However, shortly afterwards, the memory is unmapped again. This unmap fails, but ppt believes it succeeded since the error is not passed back; it removes the mapping from its local list, which leaves the vmspace mapping dangling when the VM is later destroyed. The following trace shows that, when looking for the mapping, vm_mapping_find() returns the larger bootrom mapping that encompasses this BAR allocation. Since vm_map_remove() requires an exact match here, it returns ENOENT, but this error is not propagated back.
15  81161  ppt_unmap_mmio:entry   ppt_unmap_mmio(fd=6, gpa=ffffe000, len=1000)
15  82844  vm_unmap_mmio:entry    vm_unmap_mmio(gpa=ffffe000, len=1000)
15  81865  vm_map_remove:entry    vm_map_remove(start=ffffe000, end=fffff000)
15  82771  vm_mapping_find:entry  vm_mapping_find(addr=ffffe000, size=1000)
15  82772  vm_mapping_find:return vmspace_mapping_t {
    list_node_t vmsm_node = {
        struct list_node *list_next = 0xfffffe5a656425e8
        struct list_node *list_prev = 0xfffffe58e557e880
    }
    vm_object_t vmsm_object = 0xfffffe5a5d94b618
    uintptr_t vmsm_addr = 0xfff00000
    size_t vmsm_len = 0x100000
    off_t vmsm_offset = 0xf00000
    uint_t vmsm_prot = 0x5
}
15  81866  vm_map_remove:return   2
15  82845  vm_unmap_mmio:return   0
15  81162  ppt_unmap_mmio:return  0
There are two issues here. The first is that the initial BAR registration should have failed since the addresses are already mapped for the bootrom. The second is that the error from unmap (ENOENT) was not passed up.
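The second issue is a plain error-propagation bug. As a sketch only (the names below are hypothetical stand-ins, not the actual illumos driver code), the failure mode is a caller that discards the callee's return value and tears down its local state regardless:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel unmap routine: here it always
 * reports that no exact mapping was found, as in the trace above. */
int
fake_vm_unmap_mmio(uint64_t gpa, size_t len)
{
	(void) gpa;
	(void) len;
	return (ENOENT);
}

/* Pattern resembling the bug: the callee's error is dropped, so the
 * caller reports success and removes the entry from its local list
 * even though the vmspace mapping was never removed. */
int
ppt_unmap_mmio_buggy(uint64_t gpa, size_t len)
{
	(void) fake_vm_unmap_mmio(gpa, len);
	/* ...local bookkeeping torn down unconditionally... */
	return (0);
}

/* Fixed pattern: propagate the error and skip the teardown on failure. */
int
ppt_unmap_mmio_fixed(uint64_t gpa, size_t len)
{
	int err = fake_vm_unmap_mmio(gpa, len);

	if (err != 0)
		return (err);
	/* ...remove the mapping from the local list only on success... */
	return (0);
}
```

With the fixed pattern, the ENOENT from the failed unmap would have surfaced to ppt instead of being silently swallowed (the actual fix for this half of the problem is tracked as #13713).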
The first problem appears to be in the vm_mapping_gap() function, which checks whether a new address range overlaps with an existing one. It fails to detect an overlap when the new range is entirely contained within an existing one.
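A minimal sketch of this class of bug (illustrative only, not the illumos source): an overlap test written against the wrong pairs of endpoints catches ranges that straddle either edge of an existing mapping, but misses a range fully contained inside it, such as the 0xffffe000/0x1000 BAR inside the 0xfff00000/0x100000 bootrom mapping.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t start;	/* guest-physical base of existing mapping */
	uint64_t len;	/* length of existing mapping */
} range_t;

/* Buggy form: each clause compares the new range against only one
 * endpoint of the existing mapping, so a new range that straddles an
 * edge is caught, but one lying entirely inside the mapping satisfies
 * neither clause and is reported as non-overlapping. */
int
overlaps_buggy(const range_t *ex, uint64_t addr, uint64_t size)
{
	uint64_t end = addr + size;
	uint64_t ex_end = ex->start + ex->len;

	return ((addr <= ex->start && end > ex->start) ||
	    (addr < ex_end && end >= ex_end));
}

/* Correct form: two half-open intervals [addr, end) and
 * [ex->start, ex_end) intersect iff each starts before the other ends.
 * This also covers full containment. */
int
overlaps_fixed(const range_t *ex, uint64_t addr, uint64_t size)
{
	uint64_t end = addr + size;
	uint64_t ex_end = ex->start + ex->len;

	return (addr < ex_end && end > ex->start);
}
```

For the bootrom case from the trace, the buggy check returns 0 (no overlap, so the BAR mapping is allowed), while the correct check returns 1 and the allocation would be rejected.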
Related issues
Updated by Andy Fiddaman over 2 years ago
- Related to Bug #13713: vm_unmap_mmio() should return non zero on error added
Updated by Andy Fiddaman over 2 years ago
- Blocks Feature #13674: bhyve upstream sync 2021 March added
Updated by Andy Fiddaman over 2 years ago
Jorge has re-tested his patched image, incorporating the additional patch from the attached Gerrit change and confirmed that it no longer panics, and that the passthrough devices are still working.
dtrace shows that the BAR allocation is now failing (with ENOMEM), as expected:
8  81180  vm_map_find:entry     vm_map_find(addr=ffffe000, len=1000)
              vmm`vmm_mmio_alloc+0xe2
              vmm`vm_map_mmio+0x14
              ppt`ppt_map_mmio+0x101
8  82269  vm_mapping_gap:entry  vm_mapping_gap(addr=ffffe000, size=1000)
8  82270  vm_mapping_gap:return 0
8  81181  vm_map_find:return    12
The passed-through device has a successful BAR mapping at a different location (pciconf in a FreeBSD guest):
xhci0@pci0:0:8:0: class=0x0c0330 card=0x00151912 chip=0x00151912 rev=0x02 hdr=0x00
    vendor     = 'Renesas Technology Corp.'
    device     = 'uPD720202 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Prefetchable Memory, range 64, base 0xc0004000, size 8192, enabled
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[70] = MSI supports 2 messages, 64 bit enabled with 1 message
    cap 11[90] = MSI-X supports 8 messages
                 Table in map 0x10[0x1000], PBA in map 0x10[0x1080]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(128) link x1(x1)
                 speed 5.0(5.0) ASPM disabled(L0s/L1) ClockPM disabled
and dtrace also shows that this allocation was successful:
6  81180  vm_map_find:entry     vm_map_find(addr=c0004000, len=1000)
6  82269  vm_mapping_gap:entry  vm_mapping_gap(addr=c0004000, size=1000)
6  82270  vm_mapping_gap:return 1
6  81181  vm_map_find:return    0
Updated by Jorge Schrauwen over 2 years ago
Booted a fresh PI with CR 1383, 1411, and 1412 applied
root@fermion:~ # pciconf -lvb
hostb0@pci0:0:0:0: class=0x060000 card=0x00000000 chip=0x12378086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '440FX - 82441FX PMC [Natoma]'
    class      = bridge
    subclass   = HOST-PCI
virtio_pci0@pci0:0:5:0: class=0x010000 card=0x00021af4 chip=0x10011af4 rev=0x00 hdr=0x00
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio block device'
    class      = mass storage
    subclass   = SCSI
    bar   [10] = type I/O Port, range 32, base 0x2000, size 128, enabled
    bar   [14] = type Memory, range 32, base 0xc0000000, size 8192, enabled
virtio_pci1@pci0:0:6:0: class=0x020000 card=0x00011af4 chip=0x10001af4 rev=0x00 hdr=0x00
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio network device'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0x2080, size 32, enabled
    bar   [14] = type Memory, range 32, base 0xc0002000, size 8192, enabled
xhci0@pci0:0:8:0: class=0x0c0330 card=0x00151912 chip=0x00151912 rev=0x02 hdr=0x00
    vendor     = 'Renesas Technology Corp.'
    device     = 'uPD720202 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Prefetchable Memory, range 64, base 0xc0004000, size 8192, enabled
ixl0@pci0:0:8:1: class=0x020000 card=0x37d215d9 chip=0x37d28086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X722 for 10GBASE-T'
    class      = network
    subclass   = ethernet
    bar   [10] = type Prefetchable Memory, range 64, base 0xc1000000, size 16777216, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base 0xc2000000, size 32768, enabled
isab0@pci0:0:31:0: class=0x060100 card=0x00000000 chip=0x70008086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82371SB PIIX3 ISA [Natoma/Triton II]'
    class      = bridge
    subclass   = PCI-ISA
One of the 4 separate USB controllers is passed to a Windows 10 VM (NVMe based) and is working fine; an i40e NIC plus a second USB controller are passed to a FreeBSD VM.
The test USB dongle showed up and was functional in both, as was the i40e NIC.
A quick check confirmed that NVMe, vioblk, and viona were also working as expected.
Updated by Andy Fiddaman over 2 years ago
This change has also been tested, somewhat less extensively, applied directly to gate (that is, without the other commits that were in the branch tested by Jorge). I booted various VMs running OmniOS, SmartOS, Windows, and FreeBSD, including some with configured pass-through devices. All tests were successful.
Updated by Electric Monk over 2 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 9558d0b12b2242f8f19a3526ff0656c48b28f657

commit 9558d0b12b2242f8f19a3526ff0656c48b28f657
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date:   2021-04-19T09:09:17.000Z

    13712 bhyve allows vmspace mapping inside existing range
    Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
    Reviewed by: Jorge Schrauwen <sjorge@blackdot.be>
    Approved by: Dan McDonald <danmcd@joyent.com>