Bug #13325

Updated by Patrick Mooney about 1 year ago

While developing an alternative bhyve userspace that uses a standard OVMF ROM to boot its guest, I observed (Alpine) Linux failing to react to a message-signaled interrupt. The MSI came from an MSI-X-enabled virtio block device that the system was booting from. The device had been configured to send an MSI to address @fee01004@. That decodes to a fixed-type, logical-mode interrupt without a redirection hint, sent to the destination bitmask @0b00000001@ (essentially CPU 0). I had initially misinterpreted this as a physical-mode interrupt sent to the wrong APIC ID. After a lot of digging, I found that Linux defaults to flat (rather than clustered) logical mode on systems with <= 8 CPUs. The c-bhyve runtime had not exhibited this behavior because the bhyve-specific UEFI ROM sets @FORCE_APIC_PHYSICAL_DESTINATION_MODE@ in the flags field of the @FADT@ ACPI table it provides. This led me back to @lapic_intr_msi@ to see why logical-flat addressing was not causing the interrupt to be delivered properly.
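
As a minimal sketch of that decoding, the snippet below pulls the fields out of @fee01004@, assuming the MSI address layout from the Intel SDM (bits 31:20 fixed at @0xFEE@, bits 19:12 the destination ID, bit 3 the redirection hint, bit 2 the destination mode). The @MSI_ADDR_*@ constants, @struct msi_addr_fields@ and @decode_msi_addr@ are hypothetical names for illustration, not bhyve code. The delivery mode ("fixed") lives in the message data, so it is not decoded here.

<pre><code class="c">
#include <stdint.h>
#include <stdio.h>

/* Hypothetical constants following the Intel SDM MSI address format. */
#define MSI_ADDR_BASE_MASK	0xfff00000u
#define MSI_ADDR_BASE		0xfee00000u
#define MSI_ADDR_RH		0x00000008u	/* redirection hint, bit 3 */
#define MSI_ADDR_DM_LOG		0x00000004u	/* destination mode, bit 2 */

struct msi_addr_fields {
	int valid;	/* address falls in the 0xFEExxxxx window */
	uint8_t dest;	/* destination ID: APIC ID or logical bitmask */
	int rh;		/* redirection hint */
	int logical;	/* destination mode: 1 = logical, 0 = physical */
};

static struct msi_addr_fields
decode_msi_addr(uint64_t addr)
{
	struct msi_addr_fields f;

	f.valid = (addr & MSI_ADDR_BASE_MASK) == MSI_ADDR_BASE;
	f.dest = (uint8_t)((addr >> 12) & 0xff);
	f.rh = (addr & MSI_ADDR_RH) != 0;
	f.logical = (addr & MSI_ADDR_DM_LOG) != 0;
	return (f);
}

int
main(void)
{
	struct msi_addr_fields f = decode_msi_addr(0xfee01004);

	/* Prints: valid=1 dest=0x01 rh=0 logical=1 */
	printf("valid=%d dest=0x%02x rh=%d logical=%d\n",
	    f.valid, (unsigned)f.dest, f.rh, f.logical);
	return (0);
}
</code></pre>

The address has DM set and RH clear, which is exactly the combination the current @lapic_intr_msi@ logic treats as physical.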

Its logic for determining the mode (logical or physical) of the MSI was immediately suspect:
<pre><code class="c">
	/*
	 * Extract the x86-specific fields from the MSI addr/msg
	 * params according to the Intel Arch spec, Vol3 Ch 10.
	 * The PCI specification does not support level triggered
	 * MSI/MSI-X so ignore trigger level in 'msg'.
	 * The 'dest' is interpreted as a logical APIC ID if both
	 * the Redirection Hint and Destination Mode are '1' and
	 * physical otherwise.
	 */
	dest = (addr >> 12) & 0xff;
	phys = ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
	    (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
	delmode = msg & APIC_DELMODE_MASK;
	vec = msg & 0xff;
</code></pre>

The comment itself states that the destination is treated as logical only if the redirection hint is also set. The Intel manuals are not exactly straightforward on the subject. They describe some restrictions on broadcast-targeted interrupts when the redirection-hint bit is set, but their description of the behavior when the hint bit is clear is somewhat confusing:

> If RH is 0, then the DM bit is ignored and the message is sent ahead independent of whether the physical or logical destination mode is used.

That suggests the destination mode (logical vs. physical) is ignored when the redirection hint is clear, yet Linux clearly relies on logical delivery without the RH bit. Some commentary I found on LKML added useful context:

 > > If RH is 0, then the DM bit is ignored and the message is sent ahead 
 > > independent of whether the physical or logical destination mode is 
 > > used. 
 > The PCI device simply writes the message data to that address, it does not 
 > even know what the individual bits mean. It's a write of data to address. 
 > The write gets then directed to the APIC bus or the Processor System Bus 
 > depending on the CPU by a translation unit. The translated message which 
 > goes on the bus to which the APIC(s) are connected contains the DM bit 
 > which is always evaluated by the local APICs for matching. 
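
To illustrate the matching that quote describes, here is a minimal sketch of how an xAPIC in the flat logical model could decide whether to accept an interrupt: in logical mode the 8-bit destination is a bitmask compared against the APIC's logical ID, and in physical mode it is compared against the APIC ID. The helper @apic_dest_matches@ is hypothetical, cluster mode and x2APIC are not modelled, and the logical IDs assume the guest assigns @1 << cpu@ as Linux's flat mode appears to do for up to 8 CPUs.

<pre><code class="c">
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: does a local APIC with the given APIC ID and
 * flat-model logical ID accept an interrupt with 8-bit destination
 * 'dest' in the given destination mode?
 */
static bool
apic_dest_matches(uint8_t apic_id, uint8_t logical_id, uint8_t dest,
    bool logical)
{
	if (logical) {
		/* Flat logical model: any overlapping bit is a match. */
		return ((dest & logical_id) != 0);
	}
	/* Physical mode: exact APIC ID, or 0xff for broadcast. */
	return (dest == 0xff || dest == apic_id);
}

int
main(void)
{
	/* Two vCPUs: APIC IDs 0 and 1, flat logical IDs 0x01 and 0x02. */
	const uint8_t apic_id[2] = { 0, 1 };
	const uint8_t logical_id[2] = { 0x01, 0x02 };
	const uint8_t dest = 0x01;	/* destination field of 0xfee01004 */

	for (int cpu = 0; cpu < 2; cpu++) {
		printf("cpu%d: logical=%d physical=%d\n", cpu,
		    apic_dest_matches(apic_id[cpu], logical_id[cpu], dest, true),
		    apic_dest_matches(apic_id[cpu], logical_id[cpu], dest,
		    false));
	}
	return (0);
}
</code></pre>

With DM honored, destination @0x01@ lands on CPU 0; read as physical, the same byte selects APIC ID 1, which is the "wrong APIC ID" misreading described above.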

There is probably more logic we could attach to the redirection-hint bit, but for now we should decouple it from the physical/logical mode determination so that the DM bit alone selects the destination mode.
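
A minimal sketch of that decoupling, comparing the current test against one that looks only at DM. It assumes @MSI_X86_ADDR_RH@ and @MSI_X86_ADDR_LOG@ carry the SDM bit positions (bit 3 and bit 2); the two helper functions are hypothetical and this is not the actual patch.

<pre><code class="c">
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed values, per the Intel SDM MSI address format. */
#define MSI_X86_ADDR_RH		0x00000008u	/* redirection hint, bit 3 */
#define MSI_X86_ADDR_LOG	0x00000004u	/* destination mode, bit 2 */

/* Current behavior: physical unless both RH and DM are set. */
static bool
msi_is_phys_current(uint64_t addr)
{
	return ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) !=
	    (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG));
}

/* Proposed behavior: the DM bit alone selects physical vs. logical. */
static bool
msi_is_phys_proposed(uint64_t addr)
{
	return ((addr & MSI_X86_ADDR_LOG) == 0);
}

int
main(void)
{
	const uint64_t addrs[] = {
		0xfee01000,	/* DM=0, RH=0 */
		0xfee01004,	/* DM=1, RH=0: the Linux flat-mode case */
		0xfee01008,	/* DM=0, RH=1 */
		0xfee0100c,	/* DM=1, RH=1 */
	};

	for (size_t i = 0; i < sizeof (addrs) / sizeof (addrs[0]); i++) {
		printf("%#llx: current=%s proposed=%s\n",
		    (unsigned long long)addrs[i],
		    msi_is_phys_current(addrs[i]) ? "phys" : "logical",
		    msi_is_phys_proposed(addrs[i]) ? "phys" : "logical");
	}
	return (0);
}
</code></pre>

Only the DM=1, RH=0 case changes classification; any extra RH semantics (such as lowest-priority redirection) would be layered on separately.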