bhyve misinterprets MSI redir hint
While developing an alternative bhyve userspace which uses a standard OVMF ROM to boot its guest, I observed (Alpine) Linux failing to react to message-signaled interrupts. These came from an MSI-X enabled virtio block device that the system was booting from.
The device in question had been configured to send an MSI to address fee01004. That decodes to fixed-type, logical-mode, without a redirection hint, sent to a destination bitmask of 0b00000001 (CPU 0, essentially). I had initially misinterpreted this as a physical-mode interrupt sent to the wrong APIC ID. After a lot of digging around, I found that Linux defaults to flat (rather than clustered) logical mode for systems with <= 8 CPUs. The c-bhyve runtime was not seeing this behavior because the bhyve-specific UEFI ROM has FORCE_APIC_PHYSICAL_DESTINATION_MODE set in the flags section of its provided FADT ACPI data. This led me back to lapic_intr_msi to see why the logical-flat addressing was not causing the interrupt to be properly delivered.
Its logic for determining the mode (logical or physical) of the MSI was immediately suspect:
    /*
     * Extract the x86-specific fields from the MSI addr/msg
     * params according to the Intel Arch spec, Vol3 Ch 10.
     *
     * The PCI specification does not support level triggered
     * MSI/MSI-X so ignore trigger level in 'msg'.
     *
     * The 'dest' is interpreted as a logical APIC ID if both
     * the Redirection Hint and Destination Mode are '1' and
     * physical otherwise.
     */
    dest = (addr >> 12) & 0xff;
    phys = ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
        (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
    delmode = msg & APIC_DELMODE_MASK;
    vec = msg & 0xff;
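To make the effect concrete for the address in this report, here is a minimal standalone sketch that applies the same expressions to 0xfee01004. It assumes MSI_X86_ADDR_RH is bit 3 and MSI_X86_ADDR_LOG is bit 2 of the MSI address, following the Intel layout. The struct and function names are hypothetical and are not part of bhyve.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: RH is bit 3 and DM is bit 2 of the MSI address. */
#define MSI_X86_ADDR_RH		0x00000008u
#define MSI_X86_ADDR_LOG	0x00000004u

/* Hypothetical helper type, for illustration only. */
struct msi_addr_fields {
	uint8_t	dest;		/* destination ID, address bits 19:12 */
	bool	rh;		/* redirection hint bit */
	bool	dm_logical;	/* destination mode: true means logical */
	bool	phys_current;	/* result of the existing check */
};

static struct msi_addr_fields
decode_msi_addr(uint64_t addr)
{
	struct msi_addr_fields f;

	f.dest = (addr >> 12) & 0xff;
	f.rh = (addr & MSI_X86_ADDR_RH) != 0;
	f.dm_logical = (addr & MSI_X86_ADDR_LOG) != 0;
	/* Same expression as the existing lapic_intr_msi logic. */
	f.phys_current = ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
	    (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
	return (f);
}
```

For 0xfee01004 this yields a destination of 0x01 with RH clear and DM set, yet the existing check still classifies the message as physical, because it requires both bits.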
Even the comment mentions that logical mode is "enabled" only if the redirection hint is set. The Intel manuals are not exactly straightforward on the subject. They describe some restrictions on broadcast-targeted interrupts when the redirection-hint bit is set. Their description of behavior when the hint bit is absent is a little confusing though:
If RH is 0, then the DM bit is ignored and the message is sent ahead independent of whether the physical or logical destination mode is used.
That suggests the destination mode (logical vs physical) is ignored without the redirection hint, but Linux is clearly counting on logical delivery without the RH bit. Some commentary I found on LKML added useful context:
If RH is 0, then the DM bit is ignored and the message is sent ahead
independent of whether the physical or logical destination mode is used.
The PCI device simply writes the message data to that address, it does not
even know what the individual bits mean. It's a write of data to address.
The write gets then directed to the APIC bus or the Processor System Bus
depending on the CPU by a translation unit. The translated message which
goes on the bus to which the APIC are connected contains the DM bit
which is always evaluated by the local APICs for matching.
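As a rough illustration of what that matching means, the sketch below contrasts the two destination modes for an 8-bit xAPIC destination field. It assumes the flat logical model, in which the destination is a bitmask compared against each APIC's 8-bit logical ID. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Physical mode: 'dest' names a single APIC ID, with 0xff acting as
 * broadcast in xAPIC mode.
 */
static bool
match_physical(uint8_t dest, uint8_t apic_id)
{
	return (dest == 0xff || dest == apic_id);
}

/*
 * Logical mode, flat model: 'dest' is a bitmask, and an APIC accepts
 * the message if any bit overlaps its 8-bit logical ID.
 */
static bool
match_logical_flat(uint8_t dest, uint8_t logical_id)
{
	return ((dest & logical_id) != 0);
}
```

Assuming CPU n has APIC ID n and a flat logical ID of 1 << n, a destination of 0x01 selects CPU 0 in logical mode but the CPU with APIC ID 1 in physical mode. That is consistent with the interrupt appearing to go to the wrong APIC ID.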
While there is probably more logic we can attach to the redirect-hint bit, we should probably decouple it from the physical/logical mode determination for now.
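One possible shape for that decoupling, as a sketch rather than the actual patch: decide physical vs logical from the Destination Mode bit alone and leave the redirection hint out of it. The MSI_X86_ADDR_LOG value is assumed to be bit 2 of the address, and the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: DM is bit 2 of the MSI address. */
#define MSI_X86_ADDR_LOG	0x00000004u

/*
 * Hypothetical helper: the message is physical only when the
 * Destination Mode bit is clear, regardless of the RH bit.
 */
static bool
msi_addr_is_physical(uint64_t addr)
{
	return ((addr & MSI_X86_ADDR_LOG) == 0);
}
```

With this check, 0xfee01004 is treated as logical, so the destination 0x01 would be interpreted as a bitmask rather than an APIC ID.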