Bug #6725

closed

PCI(-X) Express Fatal Error (0x40)

Added by Denis Kozadaev about 5 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Start date: 2016-03-05
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Last week I got this error many times. Unfortunately, I could not save a dump from this panic,
but I took a photo of the screen showing the mdb output; see the attachment.

The system is the latest SmartOS, booted from platform-20160304T005100Z.tgz.
I checked the illumos and SmartOS code at the panic location and it is identical.
I have also attached the prtconf and prtdiag output from my server; I hope it helps.

The panic location:
uts/common/io/pciex/pcieb.c:1428

I also tried setting pcie_disable_scan to 1 via mdb, but then the system just stopped most I/O operations:

pcie_disable_scan/D

pcie_disable_scan:
pcie_disable_scan: 0

pcie_disable_scan/W 1

pcie_disable_scan: 0 = 0x1

$q

[root@node01 ~]# pcitool /pci@0,0 -i all
0x3,0x30: pcieb 1 /pci@0,0/pci8086,2f0a@3,2
0x5,0x40: ahci 0 /pci@0,0/pci1849,8d62@11,4
0x6,0x41: mpt_sas 0 /pci@0,0/pci8086,2f02@1/pci1849,97@0
0x7,0x42: ahci 1 /pci@0,0/pci1849,8d02@1f,2
0x8,0x60: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0
0x9,0x61: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0
0xa,0x62: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1
0xb,0x63: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1
0x2,0x82: pcieb 0 /pci@0,0/pci8086,2f02@1
0x4,0x83: ehci 1 /pci@0,0/pci1849,8d26@1d
0x4,0x83: ehci 0 /pci@0,0/pci1849,8d2d@1a

[root@node01 ~]# echo '::interrupts -d' | mdb -k
IRQ  Vect IPL Bus Trg Type  CPU Share APIC/INT# Driver Name(s)
9    0x81 9   PCI Lvl Fixed 1   1     0x0/0x9   acpi_wrapper_isr
18   0x83 9   PCI Lvl Fixed 4   2     0x0/0x12  ehci#1, ehci#0
32   0x20 2       Edg IPI   all 1     -         cmi_cmci_trap
72   0x82 7   PCI Edg MSI   2   1     -         pcieb#0
73   0x30 4   PCI Edg MSI   3   1     -         pcieb#1
74   0x40 5   PCI Edg MSI   5   1     -         ahci#0
75   0x41 5   PCI Edg MSI   6   1     -         mpt_sas#0
76   0x42 5   PCI Edg MSI   7   1     -         ahci#1
77   0x60 6   PCI Edg MSI-X 8   1     -         igb#0
78   0x61 6   PCI Edg MSI-X 9   1     -         igb#0
79   0x62 6   PCI Edg MSI-X 10  1     -         igb#1
80   0x63 6   PCI Edg MSI-X 11  1     -         igb#1
160  0xa0 0       Edg IPI   all 0     -         poke_cpu
208  0xd0 14      Edg IPI   all 1     -         kcpc_hw_overflow_intr
209  0xd1 14      Edg IPI   all 1     -         cbe_fire
210  0xd3 14      Edg IPI   all 1     -         cbe_fire
240  0xe0 15      Edg IPI   all 1     -         xc_serv
241  0xe1 15      Edg IPI   all 1     -         apic_error_intr

And the last message:
SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
EVENT-TIME: Sat Mar 5 16:54:44 UTC 2016
PLATFORM: Aquarius-Srv-T50-D26, CSN: -, HOSTNAME: node01
SOURCE: eft, REV: 1.16
EVENT-ID: 284e6f76-d071-cf6e-fa4b-d45200d32687
DESC: A problem was detected for a PCIEX device.
Refer to http://illumos.org/msg/PCIEX-8000-0A for more information.
AUTO-RESPONSE: One or more device instances may be disabled
IMPACT: Loss of services provided by the device instances associated with this fault
REC-ACTION: Schedule a repair procedure to replace the affected device. Use fmadm faulty to identify the device or contact Sun for support.

By the way, zpool does not work, and I cannot run fmadm faulty.
A copy of the photo is also available here: http://witch.tambov.ru/~denis/panic.jpg


Files

prtconf.log (415 KB) Denis Kozadaev, 2016-03-05 05:16 PM
prtdiag.log (1.27 KB) Denis Kozadaev, 2016-03-05 05:17 PM
Actions #1

Updated by Denis Kozadaev about 5 years ago

OK, guys, it was a pure hardware error in the storage system.
It occurred when the LSI controller lost all of its disks; this controller is the only device on its bus.
So you can reject this issue and/or describe it in some FAQ.
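
The "only device on this bus" observation can be cross-checked against the pcitool output in the description. The following is a minimal Python sketch, not part of the original report. It assumes each pcitool line has the shape `<cpu>,<vector>: <driver> <instance> <device path>` (the first field appears to match the CPU column of `::interrupts -d`), and it treats a device as sitting behind a bridge when its path extends the bridge's path. The function names are hypothetical. Note that pcitool lists only devices that own interrupts, so this confirms the claim only for those devices.

```python
import re

# Lines copied from the `pcitool /pci@0,0 -i all` output in the description.
PCITOOL_OUTPUT = """\
0x3,0x30: pcieb 1 /pci@0,0/pci8086,2f0a@3,2
0x5,0x40: ahci 0 /pci@0,0/pci1849,8d62@11,4
0x6,0x41: mpt_sas 0 /pci@0,0/pci8086,2f02@1/pci1849,97@0
0x7,0x42: ahci 1 /pci@0,0/pci1849,8d02@1f,2
0x8,0x60: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0
0x9,0x61: igb 0 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0
0xa,0x62: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1
0xb,0x63: igb 1 /pci@0,0/pci8086,2f0a@3,2/pci1849,1521@0,1
0x2,0x82: pcieb 0 /pci@0,0/pci8086,2f02@1
0x4,0x83: ehci 1 /pci@0,0/pci1849,8d26@1d
0x4,0x83: ehci 0 /pci@0,0/pci1849,8d2d@1a
"""

# Assumed line shape: "<cpu>,<vector>: <driver> <instance> <device path>"
LINE_RE = re.compile(
    r"^(0x[0-9a-f]+),(0x[0-9a-f]+):\s+(\S+)\s+(\d+)\s+(/\S+)$"
)


def parse_pcitool(text):
    """Parse pcitool interrupt lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not an interrupt line
        cpu, vector, driver, instance, path = match.groups()
        rows.append({
            "cpu": int(cpu, 16),
            "vector": int(vector, 16),
            "driver": driver,
            "instance": int(instance),
            "path": path,
        })
    return rows


def devices_behind(rows, bridge_driver, bridge_instance):
    """Return sorted (driver, instance) pairs whose path extends the bridge path."""
    bridge_paths = {
        r["path"] for r in rows
        if r["driver"] == bridge_driver and r["instance"] == bridge_instance
    }
    found = set()
    for r in rows:
        for bridge_path in bridge_paths:
            if r["path"].startswith(bridge_path + "/"):
                found.add((r["driver"], r["instance"]))
    return sorted(found)


if __name__ == "__main__":
    rows = parse_pcitool(PCITOOL_OUTPUT)
    for instance in (0, 1):
        print("pcieb#%d:" % instance, devices_behind(rows, "pcieb", instance))
```

With the data above, mpt_sas#0 is the only interrupt-owning device listed under pcieb#0, while both igb instances sit under pcieb#1.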

Actions #2

Updated by Garrett D'Amore about 5 years ago

  • Status changed from New to Closed

Hard fault, not a bug, per submitter.
