Bug #6725
closed
PCI(-X) Express Fatal Error (0x40)
Description
Last week I hit this error many times. Unfortunately, I could not save a dump from this panic,
but I took a photo of the screen showing mdb; see the attachment.
The system is the latest SmartOS, booted from platform-20160304T005100Z.tgz.
I checked the illumos and SmartOS code at this point, and it is identical.
I have also attached the prtconf and prtdiag output from my server; I hope it helps.
The location of the panic:
uts/common/io/pciex/pcieb.c:1428
I also tried setting pcie_disable_scan to 1 via mdb, but the system then just stopped most I/O operations:
pcie_disable_scan/D
pcie_disable_scan:
pcie_disable_scan: 0
pcie_disable_scan/W 1
pcie_disable_scan: 0 = 0x1
$q
[root@node01 ~]# pcitool /pci@0,0 -i all
0x3,0x30: pcieb 1 /pci@0,0/pci8086,2f0a@3,2
0x5,0x40: ahci 0 /pci@0,0/pci1849,8d62@11,4
0x6,0x41: mpt_sas 0 /pci@0,0/pci8086,2f02@1/pci1849,97@0
0x7,0x42: ahci 1 /pci@0,0/pci1849,8d02@1f,2
0x8,0x60: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0
0x9,0x61: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0
0xa,0x62: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1
0xb,0x63: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1
0x2,0x82: pcieb 0 /pci@0,0/pci8086,2f02@1
0x4,0x83: ehci 1 /pci@0,0/pci1849,8d26@1d
0x4,0x83: ehci 0 /pci@0,0/pci1849,8d2d@1a
[root@node01 ~]# echo '::interrupts -d' | mdb -k
IRQ  Vect IPL Bus Trg Type  CPU Share APIC/INT# Driver Name(s)
9    0x81 9   PCI Lvl Fixed 1   1     0x0/0x9   acpi_wrapper_isr
18   0x83 9   PCI Lvl Fixed 4   2     0x0/0x12  ehci#1, ehci#0
32   0x20 2       Edg IPI   all 1     -         cmi_cmci_trap
72   0x82 7   PCI Edg MSI   2   1     -         pcieb#0
73   0x30 4   PCI Edg MSI   3   1     -         pcieb#1
74   0x40 5   PCI Edg MSI   5   1     -         ahci#0
75   0x41 5   PCI Edg MSI   6   1     -         mpt_sas#0
76   0x42 5   PCI Edg MSI   7   1     -         ahci#1
77   0x60 6   PCI Edg MSI-X 8   1     -         igb#0
78   0x61 6   PCI Edg MSI-X 9   1     -         igb#0
79   0x62 6   PCI Edg MSI-X 10  1     -         igb#1
80   0x63 6   PCI Edg MSI-X 11  1     -         igb#1
160  0xa0 0       Edg IPI   all 0     -         poke_cpu
208  0xd0 14      Edg IPI   all 1     -         kcpc_hw_overflow_intr
209  0xd1 14      Edg IPI   all 1     -         cbe_fire
210  0xd3 14      Edg IPI   all 1     -         cbe_fire
240  0xe0 15      Edg IPI   all 1     -         xc_serv
241  0xe1 15      Edg IPI   all 1     -         apic_error_intr
And the last message:
SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
EVENT-TIME: Sat Mar 5 16:54:44 UTC 2016
PLATFORM: Aquarius-Srv-T50-D26, CSN: -, HOSTNAME: node01
SOURCE: eft, REV: 1.16
EVENT-ID: 284e6f76-d071-cf6e-fa4b-d45200d32687
DESC: A problem was detected for a PCIEX device.
Refer to http://illumos.org/msg/PCIEX-8000-0A for more information.
AUTO-RESPONSE: One or more device instances may be disabled
IMPACT: Loss of services provided by the device instances associated with this fault
REC-ACTION: Schedule a repair procedure to replace the affected device. Use fmadm faulty to identify the device or contact Sun for support.
By the way, zpool does not work, and I can't run fmadm faulty.
A copy of the photo is also available here: http://witch.tambov.ru/~denis/panic.jpg
Updated by Denis Kozadaev over 7 years ago
OK, guys, it was a pure hardware error in the storage system:
the LSI controller lost all of its disks, and this controller is the only device on this bus.
So you can reject this issue and/or describe it in some FAQ.
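The point about the controller being the only device on its bus can be checked against the pcitool listing in the description. Here is a minimal Python sketch, assuming each listing line has the form "numbers: driver instance path" as it does in this report. The helper name devices_behind is hypothetical, and it only sees devices that have interrupts registered in that listing.

```python
# Lines copied from the `pcitool /pci@0,0 -i all` output in the description.
PCITOOL_LINES = [
    "0x3,0x30: pcieb 1 /pci@0,0/pci8086,2f0a@3,2",
    "0x5,0x40: ahci 0 /pci@0,0/pci1849,8d62@11,4",
    "0x6,0x41: mpt_sas 0 /pci@0,0/pci8086,2f02@1/pci1849,97@0",
    "0x7,0x42: ahci 1 /pci@0,0/pci1849,8d02@1f,2",
    "0x8,0x60: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0",
    "0x9,0x61: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0",
    "0xa,0x62: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1",
    "0xb,0x63: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1",
    "0x2,0x82: pcieb 0 /pci@0,0/pci8086,2f02@1",
    "0x4,0x83: ehci 1 /pci@0,0/pci1849,8d26@1d",
    "0x4,0x83: ehci 0 /pci@0,0/pci1849,8d2d@1a",
]


def devices_behind(bridge_path, lines):
    """Return sorted (driver, instance, path) tuples for devices whose
    device path lies below bridge_path (hypothetical helper)."""
    found = set()
    for line in lines:
        # Fields: "<numbers>:", driver name, instance number, device path.
        _, driver, instance, path = line.split()
        if path.startswith(bridge_path + "/"):
            found.add((driver, int(instance), path))
    return sorted(found)


# pcieb 0 is the bridge at /pci@0,0/pci8086,2f02@1.
for entry in devices_behind("/pci@0,0/pci8086,2f02@1", PCITOOL_LINES):
    print(entry)
```

For the bridge pcieb 0, the only entry found is mpt_sas 0, which matches the statement above. The same call with the path of pcieb 1 lists the two igb instances.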
Updated by Garrett D'Amore over 7 years ago
- Status changed from New to Closed
Hard fault, not a bug, per submitter.