Project

General

Profile

Actions

Bug #13712

closed

bhyve allows vmspace mapping inside existing range

Added by Andy Fiddaman about 1 month ago. Updated 26 days ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
bhyve
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

During testing of the new bhyve bits in #13674, Jorge experienced some panics due to an assertion failure:

panic message: assertion failed: list_is_empty(&vms->vms_maplist), file: ../../i86pc/io/vmm/vmm_sol_vm.c, line: 128

> $C
fffffe007b641a30 vpanic()
fffffe007b641a80 0xfffffffffba434c5()
fffffe007b641ab0 vmspace_free+0x36(fffffe5a6b7573d0)
fffffe007b641ad0 ept_vmspace_free+0xe(fffffe5a6b7573d0)
fffffe007b641b20 vm_cleanup+0x14f(fffffe5c74cbc000, 1)
fffffe007b641b50 vm_destroy+0x1b(fffffe5c74cbc000)
fffffe007b641bb0 vmm_do_vm_destroy_locked+0xae(fffffe5a60370080, 0, fffffe007b641bdc)
fffffe007b641c00 vmm_do_vm_destroy+0x33(fffffe5a60370080, 0)
fffffe007b641c30 vmm_zsd_destroy+0x52(15, fffffe6373cd0880)
fffffe007b641ca0 zsd_apply_destroy+0x1ee(0, 1, fffffe5a5eecc580, 10)
fffffe007b641cf0 zsd_apply_all_keys+0x4e(fffffffffbb168f0, fffffe5a5eecc580)
fffffe007b641d40 zone_zsd_callbacks+0xc9(fffffe5a5eecc580, 2)
fffffe007b641db0 zone_destroy+0x11b(15)
fffffe007b641f00 zone+0x1cd(1, 15, 0, 30, 0)
fffffe007b641f10 sys_syscall+0x1a8()

The leftover mapping in the list is:

> fffffe5a6b7573d0::print struct vmspace vms_maplist | ::walk list | ::print vmspace_mapping_t
{
    vmsm_node = {
        list_next = 0xfffffe5a6b757448
        list_prev = 0xfffffe5a6b757448
    }
    vmsm_object = 0xfffffe5cc08a56b0
    vmsm_addr = 0xffffe000
    vmsm_len = 0x1000
    vmsm_offset = 0
    vmsm_prot = 0x3
}

Patrick tells me that this is an address in the middle of the (larger) bootrom range, which shouldn't have been allowed.

dtracing things shows that this range is being allocated by ppt, from here:

              bhyve`passthru_msix_addr+0x150
              bhyve`passthru_addr+0x95
              bhyve`modify_bar_registration+0x170
              bhyve`register_bar+0x10
              bhyve`update_bar_address+0xdf
              bhyve`pci_cfgrw+0x28b

and is succeeding:

 15  81183               ppt_map_mmio:entry ppt_map_mmio(fd=6, gpa=ffffe000, len=1000, hpa=e0900000)

 15  82848                vm_map_mmio:entry vm_map_mmio(gpa=ffffe000, len=1000, hpa=e0900000)

 15  82849               vm_map_mmio:return                 0
 15  81184              ppt_map_mmio:return                 0

However, shortly afterwards, the memory is unmapped again. This fails but ppt thinks it has been successful since the error is not passed back. ppt removes the mapping from its local list and this causes it to be left dangling when the VM is later destroyed. You can see from the following trace that, when looking for the mapping, vm_mapping_find() is returning the larger bootrom mapping that encompasses this BAR allocation. Since vm_map_remove() requires an exact match here, it returns ENOENT, but this error is not propagated back.

 
 15  81161             ppt_unmap_mmio:entry ppt_unmap_mmio(fd=6, gpa=ffffe000, len=1000)
 15  82844              vm_unmap_mmio:entry vm_unmap_mmio(gpa=ffffe000, len=1000)
 15  81865              vm_map_remove:entry vm_map_remove(start=ffffe000, end=fffff000)
 15  82771            vm_mapping_find:entry vm_mapping_find(addr=ffffe000, size=1000)

 15  82772           vm_mapping_find:return vmspace_mapping_t {
    list_node_t vmsm_node = {
        struct list_node *list_next = 0xfffffe5a656425e8
        struct list_node *list_prev = 0xfffffe58e557e880
    }
    vm_object_t vmsm_object = 0xfffffe5a5d94b618
    uintptr_t vmsm_addr = 0xfff00000
    size_t vmsm_len = 0x100000
    off_t vmsm_offset = 0xf00000
    uint_t vmsm_prot = 0x5
}
 15  81866             vm_map_remove:return                 2
 15  82845             vm_unmap_mmio:return                 0
 15  81162            ppt_unmap_mmio:return                 0

There are two issues here. The first is that the initial BAR registration should have failed since the addresses are already mapped for the bootrom. The second is that the error from unmap (ENOENT) was not passed up.

The first problem appears to be in the vm_mapping_gap() function that checks if a new address range overlaps with an existing one. It fails in the case that the new range is entirely contained within an existing one.


Related issues

Related to illumos gate - Bug #13713: vm_unmap_mmio() should return non zero on errorClosedAndy Fiddaman

Actions
Blocks illumos gate - Feature #13674: bhyve upstream sync 2021 MarchClosedAndy Fiddaman

Actions
Actions #1

Updated by Andy Fiddaman about 1 month ago

  • Related to Bug #13713: vm_unmap_mmio() should return non zero on error added
Actions #2

Updated by Andy Fiddaman about 1 month ago

Actions #3

Updated by Electric Monk about 1 month ago

  • Gerrit CR set to 1411
Actions #4

Updated by Andy Fiddaman about 1 month ago

Jorge has re-tested his patched image, incorporating the additional patch from the attached Gerrit change and confirmed that it no longer panics, and that the passthrough devices are still working.

dtrace shows that the BAR allocation is now failing (with ENOMEM), as expected:

  8  81180                vm_map_find:entry vm_map_find(addr=ffffe000, len=1000)

              vmm`vmm_mmio_alloc+0xe2
              vmm`vm_map_mmio+0x14
              ppt`ppt_map_mmio+0x101

  8  82269             vm_mapping_gap:entry vm_mapping_gap(addr=ffffe000, size=1000)

  8  82270            vm_mapping_gap:return                 0
  8  81181               vm_map_find:return                12

The passed-through device has a successful BAR mapping at a different location (pciconf in a FreeBSD guest):

xhci0@pci0:0:8:0:       class=0x0c0330 card=0x00151912 chip=0x00151912 rev=0x02 hdr=0x00
    vendor     = 'Renesas Technology Corp.'
    device     = 'uPD720202 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Prefetchable Memory, range 64, base 0xc0004000, size 8192, enabled
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 05[70] = MSI supports 2 messages, 64 bit enabled with 1 message
    cap 11[90] = MSI-X supports 8 messages
                 Table in map 0x10[0x1000], PBA in map 0x10[0x1080]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(128)
                 link x1(x1) speed 5.0(5.0) ASPM disabled(L0s/L1) ClockPM disabled

and dtrace also shows that this allocation was successful:

  6  81180                vm_map_find:entry vm_map_find(addr=c0004000, len=1000)
  6  82269             vm_mapping_gap:entry vm_mapping_gap(addr=c0004000, size=1000)
  6  82270            vm_mapping_gap:return                 1
  6  81181               vm_map_find:return                 0
Actions #5

Updated by Jorge Schrauwen about 1 month ago

Booted a fresh PI with CR 1383, 1411, and 1412 applied

root@fermion:~ # pciconf -lvb
hostb0@pci0:0:0:0:      class=0x060000 card=0x00000000 chip=0x12378086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '440FX - 82441FX PMC [Natoma]'
    class      = bridge
    subclass   = HOST-PCI
virtio_pci0@pci0:0:5:0: class=0x010000 card=0x00021af4 chip=0x10011af4 rev=0x00 hdr=0x00
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio block device'
    class      = mass storage
    subclass   = SCSI
    bar   [10] = type I/O Port, range 32, base 0x2000, size 128, enabled
    bar   [14] = type Memory, range 32, base 0xc0000000, size 8192, enabled
virtio_pci1@pci0:0:6:0: class=0x020000 card=0x00011af4 chip=0x10001af4 rev=0x00 hdr=0x00
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio network device'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0x2080, size 32, enabled
    bar   [14] = type Memory, range 32, base 0xc0002000, size 8192, enabled
xhci0@pci0:0:8:0:       class=0x0c0330 card=0x00151912 chip=0x00151912 rev=0x02 hdr=0x00
    vendor     = 'Renesas Technology Corp.'
    device     = 'uPD720202 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Prefetchable Memory, range 64, base 0xc0004000, size 8192, enabled
ixl0@pci0:0:8:1:        class=0x020000 card=0x37d215d9 chip=0x37d28086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection X722 for 10GBASE-T'
    class      = network
    subclass   = ethernet
    bar   [10] = type Prefetchable Memory, range 64, base 0xc1000000, size 16777216, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base 0xc2000000, size 32768, enabled
isab0@pci0:0:31:0:      class=0x060100 card=0x00000000 chip=0x70008086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82371SB PIIX3 ISA [Natoma/Triton II]'
    class      = bridge
    subclass   = PCI-ISA

One of the 4 seperate USB controller is passed to a Win 10 VM (NVMe based) and is working fine, a i40e NIC + 2nd USB controller is passed to a freebsd VM

Test USB dongle showed up and was functional in both, as was the i40e nic.
Did a quick check if NVMe, vioblk, and viona were also working as expected with was the case.

Actions #6

Updated by Andy Fiddaman 29 days ago

This change has also been slightly less extensively tested applied directly to gate (that is, without the other commits that were in the branch tested by Jorge). Booted various VMs running OmniOS, SmartOS, Windows, FreeBSD including some with configured pass-through devices. All tests were successful.

Actions #7

Updated by Electric Monk 26 days ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 9558d0b12b2242f8f19a3526ff0656c48b28f657

commit  9558d0b12b2242f8f19a3526ff0656c48b28f657
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date:   2021-04-19T09:09:17.000Z

    13712 bhyve allows vmspace mapping inside existing range
    Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
    Reviewed by: Jorge Schrauwen <sjorge@blackdot.be>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF