Bug #13325

bhyve misinterprets MSI redir hint

Added by Patrick Mooney 9 days ago. Updated 7 days ago.

Status: New
Priority: Normal
Category: bhyve
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: bhyve
Gerrit CR:

Description

While developing an alternative bhyve userspace which uses a standard OVMF ROM to boot its guest, I observed (Alpine) Linux failing to react to message signaled interrupts. These came from an MSI-X enabled virtio block device that the system was booting from. The device in question had been configured to send an MSI to address 0xfee01004. That decodes to fixed-type, logical-mode, without a redirection hint, sent to a destination bitmask of 0b00000001 (CPU 0, essentially). I had initially misinterpreted this as a physical-mode interrupt sent to the wrong APIC ID. After a lot of digging around, I found that Linux defaults to flat (rather than clustered) logical mode for systems with <= 8 CPUs. The c-bhyve runtime was not seeing this behavior because the bhyve-specific UEFI ROM has FORCE_APIC_PHYSICAL_DESTINATION_MODE set in the flags section of its provided FADT ACPI data. This led me back to lapic_intr_msi to see why the logical-flat addressing was not causing the interrupt to be properly delivered.
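To make the decoding of that address concrete, here is a minimal sketch in C. The macro values follow the MSI address layout in the Intel SDM (destination ID in bits 19:12, Redirection Hint in bit 3, Destination Mode in bit 2) and are defined locally for the example. The struct and function names (msi_addr_fields, msi_addr_decode) are hypothetical and do not come from bhyve. The "fixed-type" part of the decode comes from the message data rather than the address, so it is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local definitions for this sketch; bit positions per the Intel SDM. */
#define MSI_X86_ADDR_MASK 0xfff00000u
#define MSI_X86_ADDR_BASE 0xfee00000u
#define MSI_X86_ADDR_RH   0x00000008u /* Redirection Hint (bit 3) */
#define MSI_X86_ADDR_LOG  0x00000004u /* Destination Mode (bit 2), 1 = logical */

/* Hypothetical container for the decoded address fields. */
struct msi_addr_fields {
        uint8_t dest;    /* destination ID, address bits 19:12 */
        bool    rh;      /* redirection hint set? */
        bool    logical; /* logical (true) or physical (false) mode */
};

/* Decode the x86-specific fields of an MSI address (hypothetical helper). */
static struct msi_addr_fields
msi_addr_decode(uint64_t addr)
{
        struct msi_addr_fields f;

        /* MSI addresses must fall in the 0xfeeXXXXX window. */
        assert((addr & MSI_X86_ADDR_MASK) == MSI_X86_ADDR_BASE);

        f.dest = (uint8_t)((addr >> 12) & 0xff);
        f.rh = (addr & MSI_X86_ADDR_RH) != 0;
        f.logical = (addr & MSI_X86_ADDR_LOG) != 0;
        return (f);
}
```

Calling msi_addr_decode(0xfee01004) yields dest = 0x01, rh = false, logical = true, which matches the reading above: logical mode, no redirection hint, destination bitmask 0b00000001.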

Its logic for determining the mode (logical or physical) of the MSI was immediately suspect:

        /*
         * Extract the x86-specific fields from the MSI addr/msg
         * params according to the Intel Arch spec, Vol3 Ch 10.
         *
         * The PCI specification does not support level triggered
         * MSI/MSI-X so ignore trigger level in 'msg'.
         *
         * The 'dest' is interpreted as a logical APIC ID if both
         * the Redirection Hint and Destination Mode are '1' and
         * physical otherwise.
         */
        dest = (addr >> 12) & 0xff;
        phys = ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
            (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
        delmode = msg & APIC_DELMODE_MASK;
        vec = msg & 0xff;

Even the comment mentions that logical mode is "enabled" only if the redirection hint is set. The Intel manuals are not exactly straightforward on the subject. They describe some restrictions on broadcast-targeted interrupts when the redirection-hint bit is set. Their description of behavior when the hint bit is absent is a little confusing though:

If RH is 0, then the DM bit is ignored and the message is sent ahead independent of whether the physical or logical destination mode is used.

That suggests the destination mode (logical vs physical) is ignored without the redirection hint, but Linux is clearly counting on logical delivery without the RH bit. Some commentary I found on LKML added useful context:

If RH is 0, then the DM bit is ignored and the message is sent ahead
independent of whether the physical or logical destination mode is
used.

The PCI device simply writes the message data to that address, it does not
even know what the individual bits mean. It's a write of data to address.

The write gets then directed to the APIC bus or the Processor System Bus
depending on the CPU by a translation unit. The translated message which
goes on the bus to which the APIC are connected contains the DM bit
which is always evaluated by the local APICs for matching.

While there is probably more logic we can attach to the redirect-hint bit, we should probably decouple it from the physical/logical mode determination for now.
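As a rough illustration of that decoupling, the sketch below compares the existing predicate with one that looks only at the Destination Mode bit. The macro values are defined locally, following the Intel SDM bit layout, and both function names are hypothetical. This is an illustration of the idea only and is not necessarily the change that will land.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local definitions for this sketch; bit positions per the Intel SDM. */
#define MSI_X86_ADDR_RH  0x00000008u /* Redirection Hint (bit 3) */
#define MSI_X86_ADDR_LOG 0x00000004u /* Destination Mode (bit 2), 1 = logical */

/*
 * Existing behavior (hypothetical wrapper around the quoted expression):
 * physical unless BOTH the redirection hint and destination mode are set.
 */
static bool
msi_is_phys_coupled(uint64_t addr)
{
        return ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
            (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
}

/*
 * Decoupled behavior (hypothetical): the destination mode bit alone
 * selects logical vs physical; the redirection hint is not consulted.
 */
static bool
msi_is_phys_decoupled(uint64_t addr)
{
        return ((addr & MSI_X86_ADDR_LOG) == 0);
}
```

For the address from this report, 0xfee01004 (DM set, RH clear), the coupled check reports physical mode while the decoupled check reports logical mode. The two agree for every other RH/DM combination.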

#1

Updated by Patrick Mooney 9 days ago

I fired up a test platform which disregards the redir-hint bit for MSIs, choosing the destination mode (logical or physical) solely from the DM bit in the address. With the change in place, logical-mode MSIs were delivered as expected from the virtio device.
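For readers unfamiliar with flat logical mode, the following sketch shows why a destination of 0b00000001 lands on CPU 0. In the flat model the 8-bit destination is treated as a bitmask and ANDed with each local APIC's logical ID. The sketch assumes CPU n is given logical ID (1 << n), which is the usual arrangement for flat mode on systems with at most 8 CPUs. The function names are hypothetical and not taken from bhyve.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption for this sketch: CPU n has logical APIC ID (1 << n).
 * Only meaningful for n < 8, since the flat model has 8 ID bits.
 */
static uint8_t
flat_logical_id(unsigned int cpu)
{
        return ((uint8_t)(1u << cpu));
}

/*
 * Flat model matching: the APIC accepts the message if the destination
 * bitmask shares at least one set bit with its logical ID.
 */
static bool
flat_logical_match(uint8_t dest, uint8_t logical_id)
{
        return ((dest & logical_id) != 0);
}
```

With dest = 0x01, only the APIC whose logical ID is 0x01 (CPU 0 under the assumption above) matches. A destination of 0x03 would match both CPU 0 and CPU 1.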

#2

Updated by Patrick Mooney 9 days ago

  • Tags changed from bh to bhyve

#3

Updated by Patrick Mooney 9 days ago

  • Description updated (diff)

#4

Updated by Electric Monk 9 days ago

  • Gerrit CR set to 1070

#5

Updated by Patrick Mooney 7 days ago

  • Description updated (diff)
