Bug #13325

bhyve misinterprets MSI redir hint

Added by Patrick Mooney 5 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Category: bhyve
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: bhyve
Gerrit CR:

Description

While developing an alternative bhyve userspace that uses a standard OVMF ROM to boot its guest, I observed (Alpine) Linux failing to react to message-signaled interrupts. They came from an MSI-X-enabled virtio block device that the system was booting from. The device had been configured to send an MSI to address fee01004. That decodes to a fixed-type, logical-mode interrupt without a redirection hint, sent to a destination bitmask of 0b00000001 (essentially CPU 0). I had initially misinterpreted this as a physical-mode interrupt sent to the wrong APIC ID.

After a lot of digging around, I found that Linux defaults to flat (rather than clustered) logical mode on systems with <= 8 CPUs. The c-bhyve runtime was not seeing this behavior because the bhyve-specific UEFI ROM sets FORCE_APIC_PHYSICAL_DESTINATION_MODE in the flags field of the FADT it provides. This led me back to lapic_intr_msi to see why the logical-flat addressing was not causing the interrupt to be properly delivered.

Its logic for determining the mode (logical or physical) of the MSI was immediately suspect:

        /*
         * Extract the x86-specific fields from the MSI addr/msg
         * params according to the Intel Arch spec, Vol3 Ch 10.
         *
         * The PCI specification does not support level triggered
         * MSI/MSI-X so ignore trigger level in 'msg'.
         *
         * The 'dest' is interpreted as a logical APIC ID if both
         * the Redirection Hint and Destination Mode are '1' and
         * physical otherwise.
         */
        dest = (addr >> 12) & 0xff;
        phys = ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
            (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
        delmode = msg & APIC_DELMODE_MASK;
        vec = msg & 0xff;

Even the comment states that logical mode is "enabled" only if the redirection hint is set. The Intel manuals are not exactly straightforward on the subject. They describe some restrictions on broadcast-targeted interrupts when the redirection-hint bit is set, but their description of behavior when the hint bit is clear is a little confusing:

If RH is 0, then the DM bit is ignored and the message is sent ahead independent of whether the physical or logical destination mode is used.

That suggests the destination mode (logical vs. physical) is ignored without the redirection hint, but Linux is clearly counting on logical delivery without the RH bit. Some commentary I found on LKML added useful context:

The PCI device simply writes the message data to that address, it does not
even know what the individual bits mean. It's a write of data to address.

The write gets then directed to the APIC bus or the Processor System Bus
depending on the CPU by a translation unit. The translated message which
goes on the bus to which the APIC are connected contains the DM bit
which is always evaluated by the local APICs for matching.

While there is probably more logic we could attach to the redirect-hint bit, for now we should decouple it from the physical/logical mode determination.

#1

Updated by Patrick Mooney 5 months ago

I fired up a test platform that disregards the redirection-hint bit for MSIs, choosing the destination mode solely from the DM bit in the address. With the change in place, logical-mode MSIs from the virtio device were delivered as expected.

#2

Updated by Patrick Mooney 5 months ago

  • Tags changed from bh to bhyve

#3

Updated by Patrick Mooney 5 months ago

  • Description updated (diff)

#4

Updated by Electric Monk 5 months ago

  • Gerrit CR set to 1070

#5

Updated by Patrick Mooney 5 months ago

  • Description updated (diff)

#6

Updated by Patrick Mooney 4 months ago

The primary test case for this was an alternate bhyve userspace using the OVMF bootrom. Unlike the bhyve-specific ROMs, that ROM does not set the ACPI flag requiring the subsequently booted OS to use physical destination mode for its APIC operations. Without that flag, Linux uses logical destination mode for MSIs on systems with <= 8 CPUs. For some reason, it also sets the redirect hint on MSI-X entries, resulting in the errant behavior (without the patch). With the fix in place, interrupts are properly delivered (using logical destination mode).

To confirm that c-bhyve was not adversely affected by the change, I booted the normal suite of guests for smoke-testing. Unsurprisingly, they all functioned normally. (They clearly weren't using logical destinations and redirect hints, since they would have been similarly sad prior to the fix.)

#7

Updated by Electric Monk 4 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit fe17aa88307d9cacf6677bbbe955585b11920199

commit  fe17aa88307d9cacf6677bbbe955585b11920199
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2020-12-11T02:34:54.000Z

    13325 bhyve misinterprets MSI redir hint
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Approved by: Dan McDonald <danmcd@joyent.com>
