Project

General

Profile

Actions

Bug #14603

closed

bhyve passthrough mishandles combined CMD/STATUS reads

Added by Andy Fiddaman 5 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
bhyve
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

An OmniOS user reported that booting a bhyve guest with an XHCI controller passed-through to it failed to boot with the updated uefi-edk2-stable-202111 bootrom, printing errors and dropping into a loop:

XhcReadCapReg: Pci Io read error - Invalid Parameter at 0
XhcReadCapReg: Pci Io read error - Invalid Parameter at 4
XhcReadCapReg: Pci Io read error - Invalid Parameter at 8
XhcReadCapReg: Pci Io read error - Invalid Parameter at 16
XhcReadCapReg: Pci Io read error - Invalid Parameter at 20
XhcReadCapReg: Pci Io read error - Invalid Parameter at 24
XhcReadOpReg: Pci Io Read error - Unsupported at 8
XhcReadExtCapReg: Pci Io Read error - Unsupported at 0
XhcReadExtCapReg: Pci Io Read error - Unsupported at 1020
XhcReadExtCapReg: Pci Io Read error - Unsupported at 2040
XhcReadExtCapReg: Pci Io Read error - Unsupported at 3060
XhcReadExtCapReg: Pci Io Read error - Unsupported at 4080
...

The user bisected the bootrom between the working stable-202102 version and this one and found that the problem was introduced with:

commit 1e0c441c922d92dae736f1b47df05373490045ae
Date:   Mon Jul 5 13:08:41 2021 +0200
     OvmfPkg/Bhyve: add USB support

So this is not necessarily a bug in the bootrom, just something exposed by adding the USB drivers to the UEFI environment.

Adding some debug lines and watching the ROM's DEBUG output shows this:


Bhyve's pci_emul says:

    Assigned BAR[1] Mem32 c0000000 + 2000
    Assigned BAR[0]    IO 2000 + 20
    Assigned BAR[0] Mem64 c0010000 + 10000        <-- XHCI device
    pci.0.31.7:
           IO=2000 - 2040
        MEM32=c0000000 - c0200000
        MEM64=800000000 - 820000000
    (bhyve deliberately adds slop to the end to give the guest flexibility)

Bootrom says:

    RootBridge: PciRoot(0x0)
      Support/Attr: 10003 / 10003
        DmaAbove4G: No
    NoExtConfSpace: Yes
         AllocAttr: 0 ()
               Bus: 0 - 0 Translation=0
                Io: 2000 - 201F Translation=0
               Mem: C0000000 - C0001FFF Translation=0
        MemAbove4G: FFFFFFFFFFFFFFFF - 0 Translation=0
              PMem: FFFFFFFFFFFFFFFF - 0 Translation=0
       PMemAbove4G: FFFFFFFFFFFFFFFF - 0 Translation=0

    PciBus: Discovered PCI @ [00|00|00]

    PciBus: Discovered PCI @ [00|01|00]

    PciBus: Discovered PCI @ [00|07|00]
       BAR[0]: Type = Io16; Alignment = 0x1F; Length = 0x20; Offset = 0x10
            Baseaddress = 0x2000
       BAR[1]: Type = Mem32; Alignment = 0x1FFF; Length = 0x2000; Offset = 0x14
            Baseaddress = 0xC0000000

    PciBus: Discovered PCI @ [00|09|00]
       BAR[0]: Type =  Mem64; Alignment = 0xFFFF; Length = 0x10000;
            Offset = 0x10; Baseaddress = 0xC0010000

The fundamental problem here is that the memory range assigned to the PCI Root Bridge does not encompass the memory assigned to the XHCI device
causing accesses to the XHCI BAR to fail boundary checks.

The memory range is constructed, in the absence of generated and loaded ACPI tables, by scanning the BARs of the enumerated devices.
Adding some more debug statements there:

ParseBars(0/0/0) - command 4
ParseBars(0/1/0) - command 4
ParseBars(0/7/0) - command 7
IO
   --  2000 - 201F
MEM
   --  C0010000 - C0011FFF
ParseBars(0/9/0) - command 290

That's an odd command register value...

# /usr/lib/pci/pcieadm show-cfgspace -d ppt1 header0.command header0.status
Device ppt1 -- Type 0 Header
  Command: 0x6
  Status: 0x290

it just happens to match the status register.

When a guest does a combined read on the command and status register for a passthrough device, the returned value is in the wrong order with the command and status registers swapped.
This was fixed upstream in FreeBSD as part of https://github.com/freebsd/freebsd-src/commit/2136849868a0875eade90a4ce8bfd5df054a74bb
but the specific fix was missed during the May 2020 bhyve sync. I've checked and the rest of the commit was pulled into illumos.

Actions #1

Updated by Electric Monk 5 months ago

  • Gerrit CR set to 2087
Actions #2

Updated by Andy Fiddaman 3 months ago

This has been in OmniOS bloody since March and tested with igb network cards and xhci devices passed through to guests and booted with a UEFI bootrom that has xhci drivers. The original bug reporter has also tested this and confirmed that the problem is resolved.

Actions #3

Updated by Electric Monk 3 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit b518543be8042a5a0dda9b983f71c4a99d74ad99

commit  b518543be8042a5a0dda9b983f71c4a99d74ad99
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date:   2022-05-12T21:22:00.000Z

    14603 bhyve passthrough mishandles combined CMD/STATUS reads
    Reviewed by: Jorge Schrauwen <registration@blackdot.be>
    Reviewed by: Jason King <jason.brian.king+illumos@gmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@mnx.io>

Actions

Also available in: Atom PDF