bhyve misinterprets MSI redir hint
While developing an alternative bhyve userspace which uses a standard OVMF ROM to boot its guest, I observed (Alpine) Linux failing to react to message-signaled interrupts. They came from the MSI-X-enabled virtio block device that the system was booting from. The device in question had been configured to send an MSI to address 0xfee01004. That decodes to fixed delivery, logical destination mode, no redirection hint, and a destination bitmask of 0b00000001 (essentially CPU 0). I had initially misinterpreted this as a physical-mode interrupt sent to the wrong APIC ID. After a lot of digging, I found that Linux defaults to flat (rather than clustered) logical mode on systems with <= 8 CPUs. The c-bhyve runtime was not seeing this behavior because the bhyve-specific UEFI ROM sets FORCE_APIC_PHYSICAL_DESTINATION_MODE in the flags field of the FADT in its provided ACPI data. This led me back to lapic_intr_msi to see why logical-flat addressing was not causing the interrupt to be delivered properly.
The logic in lapic_intr_msi for determining the mode (logical or physical) of the MSI was immediately suspect:
```c
	/*
	 * Extract the x86-specific fields from the MSI addr/msg
	 * params according to the Intel Arch spec, Vol3 Ch 10.
	 *
	 * The PCI specification does not support level triggered
	 * MSI/MSI-X so ignore trigger level in 'msg'.
	 *
	 * The 'dest' is interpreted as a logical APIC ID if both
	 * the Redirection Hint and Destination Mode are '1' and
	 * physical otherwise.
	 */
	dest = (addr >> 12) & 0xff;
	phys = ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
	    (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
	delmode = msg & APIC_DELMODE_MASK;
	vec = msg & 0xff;
```
Even the comment mentions that logical mode is "enabled" only if the redirection hint is set. The Intel manuals are not exactly straightforward on the subject. They describe some restrictions on broadcast-targeted interrupts when the redirection-hint bit is set. Their description of behavior when the hint bit is absent is a little confusing though:
If RH is 0, then the DM bit is ignored and the message is sent ahead independent of whether the physical or logical destination mode is used.
That suggests the destination mode (logical vs. physical) is ignored without the redirection hint, but Linux is clearly counting on logical delivery without the RH bit. Some commentary I found on LKML added useful context:
> If RH is 0, then the DM bit is ignored and the message is sent ahead
> independent of whether the physical or logical destination mode is
> used.
The PCI device simply writes the message data to that address, it does not
even know what the individual bits mean. It's a write of data to address.
The write gets then directed to the APIC bus or the Processor System Bus
depending on the CPU by a translation unit. The translated message which
goes on the bus to which the APIC are connected contains the DM bit
which is always evaluated by the local APICs for matching.
While there is probably more logic to attach to the redirection-hint bit, we should decouple it from the physical/logical mode determination for now.
Updated by Patrick Mooney 11 months ago
The primary test case for this was an alternate bhyve userspace using the OVMF bootrom. Unlike the bhyve-specific ROMs, OVMF does not set the ACPI flag which requires the subsequently booted OS to use physical destination mode for its APIC operations. Without that flag, Linux uses logical destination mode for MSIs on systems with <= 8 CPUs. It also leaves the redirection hint clear on its MSI-X entries, which resulted in the errant behavior (without the patch). With the fix in place, interrupts are properly delivered using logical destination mode.
To confirm that c-bhyve was not adversely affected by the change, I booted the normal suite of guests for smoke testing. Unsurprisingly, they all functioned normally. (They were clearly not using logical destinations without redirection hints, since they would have been similarly sad prior to the fix.)
Updated by Electric Monk 11 months ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit fe17aa88307d9cacf6677bbbe955585b11920199
Author: Patrick Mooney <firstname.lastname@example.org>
Date: 2020-12-11T02:34:54.000Z

    13325 bhyve misinterprets MSI redir hint
    Reviewed by: Robert Mustacchi <email@example.com>
    Approved by: Dan McDonald <firstname.lastname@example.org>