Bug #13108

mlxcx fails to attach on system using pcplusmp after 12988

Added by Andy Fiddaman almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: High
Assignee: -
Category: driver - device drivers
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

After updating to a driver which included the bits from 12988, an OmniOS user reported that the mlxcx driver was no longer attaching:

Sep  3 13:38:17 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 1 irq 0x2d vector 0x64 ioapic 0xff intin 0xff is bound to cpu 42
Sep  3 13:38:17 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 1 irq 0x2e vector 0x65 ioapic 0xff intin 0xff is bound to cpu 43
Sep  3 13:38:17 err mlxcx: [ID 439861 kern.warning] WARNING: mlxcx1: Failed to set interrupt priority to 7 for async interrupt vector
Sep  3 13:38:17 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 0 irq 0x2d vector 0x64 ioapic 0xff intin 0xff is bound to cpu 44
Sep  3 13:38:17 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 0 irq 0x2e vector 0x65 ioapic 0xff intin 0xff is bound to cpu 45
Sep  3 13:38:17 err mlxcx: [ID 439861 kern.warning] WARNING: mlxcx0: Failed to set interrupt priority to 7 for async interrupt vector
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 0 irq 0x2d vector 0x64 ioapic 0xff intin 0xff is bound to cpu 46
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 0 irq 0x2e vector 0x65 ioapic 0xff intin 0xff is bound to cpu 47
Sep  3 13:38:18 err mlxcx: [ID 439861 kern.warning] WARNING: mlxcx0: Failed to set interrupt priority to 7 for async interrupt vector
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 1 irq 0x2d vector 0x64 ioapic 0xff intin 0xff is bound to cpu 0
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 1 irq 0x2e vector 0x65 ioapic 0xff intin 0xff is bound to cpu 1
Sep  3 13:38:18 err mlxcx: [ID 439861 kern.warning] WARNING: mlxcx1: Failed to set interrupt priority to 7 for async interrupt vector
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 0 irq 0x2d vector 0x64 ioapic 0xff intin 0xff is bound to cpu 2
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 0 irq 0x2e vector 0x65 ioapic 0xff intin 0xff is bound to cpu 3
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 1 irq 0x2f vector 0x66 ioapic 0xff intin 0xff is bound to cpu 4
Sep  3 13:38:18 err pcplusmp: [ID 805372 kern.info] pcplusmp: pciex15b3,1013 (mlxcx) instance 1 irq 0x30 vector 0x67 ioapic 0xff intin 0xff is bound to cpu 5
Sep  3 13:38:18 err mlxcx: [ID 439861 kern.warning] WARNING: mlxcx0: Failed to set interrupt priority to 7 for async interrupt vector
Sep  3 13:38:18 err mlxcx: [ID 439861 kern.warning] WARNING: mlxcx1: Failed to set interrupt priority to 7 for async interrupt vector

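Tracing the calls under ddi_intr_set_pri() with DTrace shows that apic_navail_vector() is returning 0. apic_alloc_msix_vectors() therefore allocates nothing, and the failure (4294967295, which is -1, DDI_FAILURE, printed as an unsigned 32-bit value) propagates back up the call chain: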

 21    -> i_ddi_intr_ops
 21      -> pcieb_intr_ops
 21        -> pcieb_plat_intr_ops
 21          -> i_ddi_intr_ops
 21            -> npe_intr_ops
 21              -> pci_common_intr_ops
 21                -> pci_intx_get_ispec
 21                  -> ddi_get_parent_data
 21                  <- ddi_get_parent_data      -2449073009408
 21                <- pci_intx_get_ispec         -2449073009360
 21                -> apic_intr_ops
 21                  -> apic_alloc_msix_vectors
 21                    -> apic_navail_vector
 21                    <- apic_navail_vector                  0
 21                  <- apic_alloc_msix_vectors                 0
 21                <- apic_intr_ops                  4294967295
 21              <- pci_common_intr_ops              4294967295
 21            <- npe_intr_ops                       4294967295
 21          <- i_ddi_intr_ops                       4294967295
 21        <- pcieb_plat_intr_ops                    4294967295
 21      <- pcieb_intr_ops                           4294967295
 21    <- i_ddi_intr_ops                             4294967295
 21  <- ddi_intr_set_pri                      Return 4294967295 @ 6e

Interrupt allocation for this system (when booted to the previous kernel) shows:

root@err:~# mdb -ke ::interrupts
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
9    0x81 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
16   0x32 4   PCI    Edg MSI    8   1     -         pcieb_intr_handler
17   0x85 7   PCI    Edg MSI    9   1     -         pcieb_intr_handler
18   0x33 4   PCI    Edg MSI    10  1     -         pcieb_intr_handler
19   0x86 7   PCI    Edg MSI    11  1     -         pcieb_intr_handler
20   0x34 4   PCI    Edg MSI    12  1     -         pcieb_intr_handler
21   0x87 7   PCI    Edg MSI    13  1     -         pcieb_intr_handler
22   0x35 4   PCI    Edg MSI    14  1     -         pcieb_intr_handler
23   0x88 7   PCI    Edg MSI    15  1     -         pcieb_intr_handler
24   0x36 4   PCI    Edg MSI    16  1     -         pcieb_intr_handler
25   0x89 7   PCI    Edg MSI    17  1     -         pcieb_intr_handler
26   0x8a 9   PCI    Edg MSI-X  18  1     -         xhci_intr
27   0x8b 9   PCI    Edg MSI-X  19  1     -         xhci_intr
28   0x37 4   PCI    Edg MSI    20  1     -         pcieb_intr_handler
29   0x8c 7   PCI    Edg MSI    21  1     -         pcieb_intr_handler
30   0x38 4   PCI    Edg MSI    22  1     -         pcieb_intr_handler
31   0x8d 7   PCI    Edg MSI    23  1     -         pcieb_intr_handler
32   0x20 2          Edg IPI    all 1     -         cmi_cmci_trap
33   0x39 4   PCI    Edg MSI    24  1     -         pcieb_intr_handler
34   0x8e 7   PCI    Edg MSI    25  1     -         pcieb_intr_handler
35   0x3a 4   PCI    Edg MSI    26  1     -         pcieb_intr_handler
36   0x8f 7   PCI    Edg MSI    27  1     -         pcieb_intr_handler
37   0x3b 4   PCI    Edg MSI    28  1     -         pcieb_intr_handler
38   0x60 6   PCI    Edg MSI-X  29  1     -         mlxcx_intr_async
39   0x61 6   PCI    Edg MSI-X  30  1     -         mlxcx_intr_n
40   0x62 6   PCI    Edg MSI-X  31  1     -         mlxcx_intr_async
41   0x63 6   PCI    Edg MSI-X  32  1     -         mlxcx_intr_n
42   0x41 5   PCI    Edg MSI    33  1     -         mptsas_intr
43   0x64 6   PCI    Edg MSI-X  34  1     -         igb_intr_tx_other
44   0x65 6   PCI    Edg MSI-X  35  1     -         igb_intr_rx
45   0x66 6   PCI    Edg MSI-X  36  1     -         igb_intr_tx_other
46   0x67 6   PCI    Edg MSI-X  37  1     -         igb_intr_rx
47   0x42 5   PCI    Edg MSI    40  1     -         ahci_intr
48   0x43 5   PCI    Edg MSI    41  1     -         ahci_intr
160  0xa0 0          Edg IPI    all 0     -         pir_ipi
161  0xa1 0          Edg IPI    all 0     -         poke_cpu
208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
209  0xd1 14         Edg IPI    all 1     -         cbe_fire
210  0xd3 14         Edg IPI    all 1     -         cbe_fire
240  0xe0 15         Edg IPI    all 1     -         xc_serv
241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
248  0x82 7   PCI    Edg MSI    2   1     -         pcieb_intr_handler
249  0x40 5   PCI    Edg MSI    3   1     -         mptsas_intr
250  0x30 4   PCI    Edg MSI    4   1     -         pcieb_intr_handler
251  0x83 7   PCI    Edg MSI    5   1     -         pcieb_intr_handler
252  0x31 4   PCI    Edg MSI    6   1     -         pcieb_intr_handler
253  0x84 7   PCI    Edg MSI    7   1     -         pcieb_intr_handler
root@err:~# mdb -ke ::interrupts | awk '{print $3}' | sort | uniq -c
   2 0
   3 14
   2 15
   1 2
  12 4
   4 5
   8 6
  12 7
   3 9

There are 12 entries at IPL 7.
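
The tally above can also be broken down by vector block. On x86 the local APIC priority class of a vector is its upper four bits, so each class covers 16 vectors. In the listing, the IPL 7 entries and the three IPL 9 entries all sit in the 0x80-0x8f block, which by my count leaves only 0x80 unused. Whether pcplusmp treats that block as the pool for IPL 7 requests is an assumption here, not something this report states. The sketch below is a hypothetical helper (the name `tally` and the sample rows are illustrative, with the rows copied from the listing) that counts entries per IPL and per 16-vector block:

```python
from collections import Counter

# A few rows copied from the ::interrupts listing above (not the full table).
SAMPLE = """\
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
9    0x81 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
17   0x85 7   PCI    Edg MSI    9   1     -         pcieb_intr_handler
26   0x8a 9   PCI    Edg MSI-X  18  1     -         xhci_intr
32   0x20 2          Edg IPI    all 1     -         cmi_cmci_trap
38   0x60 6   PCI    Edg MSI-X  29  1     -         mlxcx_intr_async
"""


def tally(text):
    """Count ::interrupts rows per IPL and per 16-vector block (vector >> 4)."""
    by_ipl = Counter()
    by_block = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, shell prompts, and anything else that is not a data row.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        vector = int(fields[1], 16)   # "Vect" column, e.g. 0x85
        by_ipl[int(fields[2])] += 1   # "IPL" column
        by_block[vector >> 4] += 1    # upper four bits select the block
    return by_ipl, by_block


if __name__ == "__main__":
    by_ipl, by_block = tally(SAMPLE)
    for block, count in sorted(by_block.items()):
        print(f"vectors 0x{block:x}0-0x{block:x}f: {count} in use")
```

To run it against a live system, the output of `mdb -ke ::interrupts` could be passed to `tally()` in place of `SAMPLE`.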

As a workaround, setting apix:apix_hw_chk_enable=0 in /etc/system allows the system to boot with the mlxcx driver attached.


Related issues

Related to illumos gate - Bug #12988: potential hang in mlxcx when async and ring vectors end up on same CPU (Closed, Paul Winder)
