Bug #7781

Hipster boot hangs on AMD Sun Fire X4200 M2

Added by Carsten Grzemba almost 3 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Drivers
Target version: -
Start date: 2017-01-18
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

With the latest Hipster, boot hangs on a Sun Fire X4200 M2.
Booting with kmdb shows that it seems to hang while loading the rts driver:

load 'socketmod/rts' id 157 loaded 0xfffffffff7d3aae0/0xffffffffc0199bc0 size 1184/488
installing rts, module id 157.
[3]> ::status
debugging live kernel (64-bit) on (not set)
operating system: 5.11 illumos-f956687 (i86pc)
image uuid: (not set)
CPU-specific support: AMD
DTrace state: inactive
stopped on: debugger entry trap
[3]> ::where
kmdb: invalid command '::where': unknown dcmd name
[3]> 0::switch
[0]> $c
xc_serv+0x109(0, 0)
av_dispatch_autovect+0x91(f0)
dispatch_hilevel+0x20(f0, 0)
switch_sp_and_call+0x13()
do_interrupt+0x120(ffffff000f405ad0, 0)
_interrupt+0xba()
mach_cpu_idle+6()
cpu_idle+0x11a()
cpu_idle_adaptive+0x13()
idle+0xa7()
thread_start+8()
[0]> 3::switch
[3]> $c
kmdb_enter+0xb()
debug_enter+0x59(0)
abort_sequence_enter+0x41(0)
asy`async_rxint+0xdf(ffffff03d4a267e8, f9)
asy`asyintr+0x187(ffffff03d4a267e8)
av_dispatch_autovect+0x91(4)
dispatch_hilevel+0x20(4, 0)
switch_sp_and_call+0x13()
do_interrupt+0x120(ffffff000f7bbad0, 3)
_interrupt+0xba()
mach_cpu_idle+6()
cpu_idle+0x11a()
cpu_idle_adaptive+0x13()
idle+0xa7()
thread_start+8()
[3]> 1::switch
[1]> $c
xc_serv+0x109(0, 0)
av_dispatch_autovect+0x91(f0)
dispatch_hilevel+0x20(f0, 0)
switch_sp_and_call+0x13()
do_interrupt+0x120(ffffff000f6c2650, 1)
_interrupt+0xba()
ddi_get_soft_state+1()
ddi_ctlops+0x47(ffffff03d5efc7f8, ffffff03d5efc7f8, 1a, ffffff000f6c2820, 0)
i_ddi_caut_getput_ctlops+0x4e(ffffff03dbf8a780, ffffff000f6c28ac, 1, 4, 1, 0)
i_ddi_caut_get32+0x31(ffffff03dbf8a780, 1)
pci_config_get32+0x32(ffffff03dbf8a780, 1)
pci_xcap_locate+0x74(ffffff03dbf8a780, 3, ffffff000f6c295e)
pcieb`pcieb_id_props+0x9b(ffffff03dc44e3d8)
pcieb`pcieb_attach+0x138(ffffff03d5efc7f8, 0)
devi_attach+0x92(ffffff03d5efc7f8, 0)
attach_node+0xa7(ffffff03d5efc7f8)
i_ndi_config_node+0x7d(ffffff03d5efc7f8, 6, 0)
i_ddi_attachchild+0x48(ffffff03d5efc7f8)
i_ddi_attach_node_hierarchy+0x6c(ffffff03d5efc7f8)
attach_driver_nodes+0xb2(4c)
ddi_hold_installed_driver+0x130(4c)
attach_drivers+0x4c()
thread_start+8()
[1]> 2::switch
[2]> $c
xc_serv+0x109(0, 0)
av_dispatch_autovect+0x91(f0)
dispatch_hilevel+0x20(f0, 0)
switch_sp_and_call+0x13()
do_interrupt+0x120(ffffff000f740ad0, 2)
_interrupt+0xba()
mach_cpu_idle+6()
cpu_idle+0x11a()
cpu_idle_adaptive+0x13()
idle+0xa7()
thread_start+8()
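
To make the four stacks above easier to compare, here is a minimal Python sketch that groups the `$c` frames by CPU and reports which CPUs are not sitting in the idle loop. It assumes the transcript has the same shape as the one pasted here (a `[N]> $c` prompt followed by one frame per line); the function names and the shortened sample are illustrative only and not part of kmdb.

```python
import re

# "[N]> command" prompt lines as they appear in the kmdb transcript.
PROMPT = re.compile(r"^\[(\d+)\]>\s*(.*)$")
# A stack frame such as "pcieb`pcieb_attach+0x138(...)" or "mach_cpu_idle+6()".
FRAME = re.compile(r"^([\w`.]+)\+(?:0x)?[0-9a-f]+\(")


def stacks_by_cpu(transcript):
    """Return {cpu: [function, ...]} for every '[N]> $c' section."""
    stacks = {}
    current = None
    for raw in transcript.splitlines():
        line = raw.strip()
        prompt = PROMPT.match(line)
        if prompt:
            # "$c" starts a stack for CPU N; any other command ends the section.
            current = int(prompt.group(1)) if prompt.group(2) == "$c" else None
            if current is not None:
                stacks[current] = []
            continue
        frame = FRAME.match(line)
        if current is not None and frame:
            stacks[current].append(frame.group(1))
    return stacks


def busy_cpus(stacks):
    """Keep only the CPUs whose stack does not contain the idle() frame."""
    return {cpu: frames for cpu, frames in stacks.items() if "idle" not in frames}


# Shortened excerpt of the transcript above (CPU 0 and CPU 1 only).
SAMPLE = """\
[0]> $c
xc_serv+0x109(0, 0)
mach_cpu_idle+6()
idle+0xa7()
thread_start+8()
[0]> 1::switch
[1]> $c
xc_serv+0x109(0, 0)
pci_config_get32+0x32(ffffff03dbf8a780, 1)
pcieb`pcieb_attach+0x138(ffffff03d5efc7f8, 0)
attach_drivers+0x4c()
thread_start+8()
"""

if __name__ == "__main__":
    for cpu, frames in sorted(busy_cpus(stacks_by_cpu(SAMPLE)).items()):
        print(cpu, frames[1:3])
```

On the full transcript this kind of filter would leave only CPU 1, whose thread started in attach_drivers and was interrupted inside pcieb`pcieb_attach; CPU 3 is idle apart from the serial break that entered the debugger.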

History

#1

Updated by Carsten Grzemba almost 3 years ago

The stack shows that it is hanging in pcieb, so I replaced the pcieb driver with the last working binary. Then pcieb hung in npe functions, so I replaced npe as well and reordered the entries in name_to_major so that npe is loaded before pcieb.
Then it boots ;)
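
For anyone who wants to check the ordering on their own system, a small Python sketch follows. It only reports whether one driver is listed before another in a name_to_major-style file (one `name major` pair per line); whether the position in the file really influences attach order is the hypothesis of this comment, not something the sketch proves. The sample content is made up for illustration: the pcieb major 76 matches the 0x4c seen in the stack above, the npe number is hypothetical.

```python
def driver_positions(text, names):
    """Return {name: (line_index, major)} for each requested driver found."""
    found = {}
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        # Skip blank lines, comment lines, and anything that is not "name major".
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        name, major = fields[0], fields[1]
        if name in names and name not in found and major.isdigit():
            found[name] = (index, int(major))
    return found


def listed_before(text, first, second):
    """True if `first` appears on an earlier line than `second`, None if either is missing."""
    positions = driver_positions(text, {first, second})
    if first not in positions or second not in positions:
        return None
    return positions[first][0] < positions[second][0]


# Illustrative content only; the npe major number is hypothetical.
SAMPLE = """\
pcieb 76
npe 183
"""

if __name__ == "__main__":
    print(listed_before(SAMPLE, "npe", "pcieb"))  # False: pcieb is listed first
```

To run it against a real system, read the text from /etc/name_to_major (or the same file under the mounted boot environment) and pass it in place of SAMPLE.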

#2

Updated by Alexander Pyhalov almost 3 years ago

Just to check: does "latest" mean 2016.10 or 'latest with January illumos bits'?

#3

Updated by Carsten Grzemba almost 3 years ago

What is installed is:

consolidation/osnet/osnet-incorporation (openindiana.org) 0.5.11-2017.0.0.16122 i--

#4

Updated by Carsten Grzemba almost 3 years ago

I updated again to and all works well.
So could the root cause be just the driver ordering in name_to_major?

#5

Updated by Carsten Grzemba about 2 years ago

I got the boot problem on the Sun X4200 again and checked my historical BEs. The last working npe driver is from:
# pkg -R /b info system/kernel/platform
    Name: system/kernel/platform
    Summary: Core Solaris Kernel Architecture
    Description: core kernel software for a specific hardware platform group
    Category: System/Core
    State: Installed
    Publisher: openindiana.org
    Version: 0.5.11
    Branch: 2015.0.2.15619
    Packaging Date: Sun Apr 17 14:13:31 2016
    Size: 11.41 MB
    FMRI: pkg::20160417T141331Z
# md5sum npe
    39c996e4b31187df7df1737051052adc npe
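
Comparing the checksum across several mounted boot environments can be scripted. The following is a rough Python sketch, assuming each BE is mounted under its own root (like /b above) and that the driver lives at platform/i86pc/kernel/drv/amd64/npe below that root; that path and the function names are assumptions for illustration, so adjust them to the actual layout.

```python
import hashlib
from pathlib import Path

# Assumed location of the npe driver relative to a BE root; adjust as needed.
DRIVER_RELPATH = "platform/i86pc/kernel/drv/amd64/npe"


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_by_root(roots, relpath=DRIVER_RELPATH):
    """Map each BE root to the driver's MD5, or None if the file is absent."""
    result = {}
    for root in roots:
        candidate = Path(root) / relpath
        result[str(root)] = md5_of(candidate) if candidate.is_file() else None
    return result


if __name__ == "__main__":
    # "/b" is the mount point used above; add further mounted BEs as needed.
    for root, checksum in checksums_by_root(["/b"]).items():
        print(root, checksum)
```

Roots that report the same digest as the known-good 39c996e4b31187df7df1737051052adc carry the same npe binary.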
#6

Updated by Carsten Grzemba about 2 years ago

Swapping the drivers has not really changed anything. Deeper investigation shows that it hangs or loops while attaching the PCI-PCI bridge for PCIEX Slot 3, where an LSI22320SLE is installed.

[1]> ::prtconf
--8<--
    ffffff03d60c1808 pciex_root_complex, instance #1 (driver name: npe)
        ffffff03d60c1560 pci10de,5e (driver not attached)
        ffffff03d60c12b8 pci10de,cb84 (driver not attached)
        ffffff03d60c1010 pci10de,57, instance #1 (driver not attached)
        ffffff03d60ffd48 pciexclass,060400 (driver not attached)
        ffffff03d60ffaa0 pciexclass,060400 (driver not attached)
        ffffff03d60ff7f8 pciexclass,060400, instance #0 (driver not attached)
            ffffff03d60fa008 pciexclass,060400 (driver not attached)
                ffffff03d60f6d58 pci1000,30 (driver not attached)
                ffffff03d60f6ab0 pci1000,30 (driver not attached)
--8<--

[1]> ffffff03d60ff7f8::devinfo -s
DEVINFO            MAJ           REFCNT NODENAME             NODESTATE
                  INST         CIRCULAR BINDNAME             STATE
                                 THREAD                      FLAGS          
ffffff03d60ff7f8    76                1 pci10de,5d@d         DS_PROBED
                     0                0 pciexclass,060400    <S_ATTACHING,S_NEED_RESET>
                                      0                      <>
