Bug #9706

/hipster: disconnected (pci-e) network card causes OI crash

Added by Predrag Zečević 11 months ago. Updated 11 months ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version: -
Start date: 2018-08-07
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Hi all,

I am not sure whether this is an illumos bug, but since I cannot open an issue there, I am reporting it here.
Two years ago I assembled this PC, but because the Intel Ethernet card was not supported, I installed a PCI-e card (which was supported at that time). Here is an excerpt from scanpci -v:

pci bus 0x0005 cardnum 0x00 function 0x00: vendor 0x10ec device 0x8168
 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
 CardVendor 0x10ec card 0x8168 (Realtek Semiconductor Co., Ltd., RTL8111/8168 PCI Express Gigabit Ethernet controller)
  STATUS    0x0010  COMMAND 0x0047
  CLASS     0x02 0x00 0x00  REVISION 0x02
  BIST      0x00  HEADER 0x00  LATENCY 0x00  CACHE 0x00
  BASE0     0x0000c000 SIZE 256  I/O
  BASE2     0x00000000df100000 SIZE 4096  MEM64
  BASE4     0x00000000d0000000 SIZE 65536  MEM64 PREFETCHABLE
  MAX_LAT   0x00  MIN_GNT 0x00  INT_PIN 0x01  INT_LINE 0x0b

Intel (onboard) network card:
pci bus 0x0000 cardnum 0x1f function 0x06: vendor 0x8086 device 0x15b8
 Intel Corporation Ethernet Connection (2) I219-V
 CardVendor 0x1849 card 0x15b8 (ASRock Incorporation, Card unknown)
  STATUS    0x0010  COMMAND 0x0146
  CLASS     0x02 0x00 0x00  REVISION 0x31
  BIST      0x00  HEADER 0x00  LATENCY 0x00  CACHE 0x00
  BASE0     0xdf300000 SIZE 131072  MEM
  MAX_LAT   0x00  MIN_GNT 0x00  INT_PIN 0x01  INT_LINE 0x0b

When I plug a cable into the e1000g (Intel) card, the idle rge card crashes the system:
$ grep pci8086,a11b /etc/path_to_inst
"/pci@0,0/pci8086,a11b@1d,3" 3 "pcieb" 
"/pci@0,0/pci8086,a11b@1d,3/pci10ec,8168@0" 0 "rge" 

The card ID was found in the /var/adm/messages file:
[2018-08-07 08:44:21] solarix genunix: [ID 843051 kern.info] NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major#012
[2018-08-07 08:44:21] solarix unix: [ID 836849 kern.notice] #012#015panic[cpu0]/thread=ffffff000f2cbc40: 
[2018-08-07 08:44:21] solarix genunix: [ID 647700 kern.notice] pcieb-3: PCI(-X) Express Fatal Error. (0x43)
[2018-08-07 08:44:21] solarix unix: [ID 100000 kern.notice] #012
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f2cbb80 pcieb:pcieb_intr_handler+1c9 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f2cbbf0 apix:apix_dispatch_pending_autovect+101 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f2cbc20 apix:apix_dispatch_pending_hardint+34 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205a20 unix:switch_sp_and_call+13 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205a80 apix:apix_do_interrupt+359 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205a90 unix:cmnint+ba ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205b80 unix:i86_mwait+d ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205bc0 unix:cpu_idle_mwait+109 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205bf0 unix:cpu_acpi_idle+81 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205c00 unix:cpu_idle_adaptive+13 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205c20 unix:idle+a7 ()
[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f205c30 unix:thread_start+8 ()
[2018-08-07 08:44:21] solarix unix: [ID 100000 kern.notice] 
[2018-08-07 08:44:21] solarix genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
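
To make the stack easier to read or compare, the frames can be pulled out of the syslog lines. This is only a minimal sketch: the regular expression and the helper name `extract_frames` are assumptions made for illustration, based on the line layout shown in the log above, and not part of any illumos tool.

```python
import re

# Assumed layout of a panic stack line, as seen in the log above:
#   "<16-hex-digit frame pointer> <module>:<function>+<hex offset> ()"
FRAME_RE = re.compile(r"\b([0-9a-f]{16}) (\w+):(\w+)\+([0-9a-f]+) \(\)")

def extract_frames(lines):
    """Return (module, function, offset) tuples for lines that look like stack frames."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            _frame_ptr, module, func, offset = m.groups()
            frames.append((module, func, int(offset, 16)))
    return frames

# Sample lines copied from the log above.
sample = [
    "[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f2cbb80 pcieb:pcieb_intr_handler+1c9 ()",
    "[2018-08-07 08:44:21] solarix genunix: [ID 655072 kern.notice] ffffff000f2cbbf0 apix:apix_dispatch_pending_autovect+101 ()",
    "[2018-08-07 08:44:21] solarix unix: [ID 100000 kern.notice] #012",
]
frames = extract_frames(sample)
for module, func, offset in frames:
    print(f"{module}:{func}+{offset:#x}")
```

The first frame printed is the innermost one, here the pcieb interrupt handler.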

I can provide the dump file if required (134 MB compressed).

So far, I have tested:
a) connected rge (idle e1000g) - no problem
b) connected e1000g (idle rge) - system crashes

I know I can remove the rge card, or connect both cards, but that is not a solution.

Please advise if you need more information.
Regards.

History

#1

Updated by Predrag Zečević 11 months ago

Actually, I am not sure whether I have identified the problem correctly:

$ grep -E "2018-08-07.*pci8086,a11b" /var/adm/messages 
[2018-08-07 08:44:21] solarix npe: [ID 236367 kern.info] PCI Express-device: pci8086,a11b@1d,3, pcieb3
[2018-08-07 08:44:21] solarix genunix: [ID 936769 kern.info] pcieb3 is /pci@0,0/pci8086,a11b@1d,3
[2018-08-07 08:44:21] solarix npe: [ID 236367 kern.info] PCI Express-device: pci8086,a11b@1d,3, pcieb3
[2018-08-07 08:44:21] solarix genunix: [ID 936769 kern.info] pcieb3 is /pci@0,0/pci8086,a11b@1d,3

I guess that pcieb3 = pcieb-3?
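
One way to cross-check this guess is to look up the driver and instance number in /etc/path_to_inst and see which devices sit below that bridge. The following is a minimal sketch, assuming that "pcieb-3" in the panic message means driver pcieb, instance 3, and that each entry has the form `"<physical path>" <instance> "<driver>"` as in the grep output earlier in this report. The helper names are hypothetical.

```python
import re

# Assumed entry format: "<physical path>" <instance> "<driver>"
ENTRY_RE = re.compile(r'^"([^"]+)"\s+(\d+)\s+"([^"]+)"')

def parse_path_to_inst(lines):
    """Parse path_to_inst-style lines into (path, instance, driver) tuples."""
    entries = []
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if m:
            entries.append((m.group(1), int(m.group(2)), m.group(3)))
    return entries

def devices_below(entries, driver, instance):
    """Find the path of driver/instance and every entry located underneath it."""
    parent = next((p for p, i, d in entries if d == driver and i == instance), None)
    if parent is None:
        return None, []
    children = [(d, i, p) for p, i, d in entries if p.startswith(parent + "/")]
    return parent, children

# Sample lines copied from the grep output in the description.
sample = [
    '"/pci@0,0/pci8086,a11b@1d,3" 3 "pcieb" ',
    '"/pci@0,0/pci8086,a11b@1d,3/pci10ec,8168@0" 0 "rge" ',
]
parent, children = devices_below(parse_path_to_inst(sample), "pcieb", 3)
print(parent)
for drv, inst, path in children:
    print(f"  {drv}{inst} at {path}")
```

With the two lines from this system, the only device below pcieb instance 3 is rge0, which would be consistent with the idle rge card sitting behind the bridge that reported the fatal error.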
