Bug #12393

kernel panics on X4470 M1 while trying to load cpudrv for cpu #9 when booting OI-hipster-text-20191106.usb

Added by Olaf Bohlen 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: kernel
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Hard
Tags:
Gerrit CR:

Description

OK, I think I hit a bug here:

Loading unix...
Loading /platform/i86pc/amd64/boot_archive...
Loading /platform/i86pc/amd64/boot_archive.hash...
Booting...
module /platform/i86pc/kernel/amd64/unix: text at [0xfffffffffb800000, 0xfffffffffb982bf9] data at 0xfffffffffbc00000
module /kernel/amd64/genunix: text at [0xfffffffffbcaa840, 0xfffffffffbf3bcf7] data at 0xfffffffffbf3bd00
Loading kmdb...
module /kernel/misc/amd64/kmdbmod: text at [0xfffffffffb982c00, 0xfffffffffba3e7c7] data at 0xfffffffffbfaf1c0
module /kernel/misc/amd64/ctf: text at [0xfffffffffba3e7e0, 0xfffffffffba4bc9f] data at 0xfffffffffbfe87a0

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> :c
SunOS Release 5.11 Version illumos-87d7b64204 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
x86_feature: lgpg
x86_feature: tsc
x86_feature: msr
x86_feature: mtrr
x86_feature: pge
x86_feature: de
x86_feature: cmov
x86_feature: mmx
x86_feature: mca
x86_feature: pae
x86_feature: cv8
x86_feature: pat
x86_feature: sep
x86_feature: sse
x86_feature: sse2
x86_feature: htt
x86_feature: asysc
x86_feature: nx
x86_feature: sse3
x86_feature: cx16
x86_feature: cmp
x86_feature: tscp
x86_feature: mwait
x86_feature: cpuid
x86_feature: ssse3
x86_feature: sse4_1
x86_feature: sse4_2
x86_feature: clfsh
x86_feature: 64
x86_feature: vmx
x86_feature: x2apic
x86_feature: ibrs
x86_feature: ibpb
x86_feature: stibp
x86_feature: ssbd
x86_feature: flush_cmd
x86_feature: core_thermal
mem = 201317220K (0x2fff6d9000)
Using default device instance data
SMBIOS v2.6 loaded (10504 bytes)initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 0 strand 0
root nexus = i86pc
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
Reading Intel IOMMU boot options
npe0 at root: space 0 offset 0
npe0 is /pci@0,0
PCI Express-device: isa@1f, isa0
pseudo-device: acpippm0
acpippm0 is /pseudo/acpippm@0
pseudo-device: ppm0
ppm0 is /pseudo/ppm@0
ramdisk0 at root
ramdisk0 is /ramdisk
root on /ramdisk:a fstype ufs
acpinex0 at root
acpinex0 is /fw
acpinex: cpu@1, cpudrv0
/fw/cpu@1 (cpudrv0) online
pseudo-device: dld0
dld0 is /pseudo/dld@0
ISA-device: asy0
asy0 is /pci@0,0/isa@1f/asy@1,3f8
PCI Express-device: pci8086,3410@9, pcieb4
pcieb4 is /pci@0,0/pci8086,3410@9
PCI Express-device: pci8086,3a40@1c, pcieb5
pcieb5 is /pci@0,0/pci8086,3a40@1c
PCI Express-device: pci8086,3a48@1c,4, pcieb6
pcieb6 is /pci@0,0/pci8086,3a48@1c,4
PCI Express-device: pci8086,244e@1e, pci_pci0
pci_pci0 is /pci@0,0/pci8086,244e@1e
npe1 at root: space 78 offset 0
npe1 is /pci@78,0
PCI Express-device: pci8086,3420@0, pcieb7
pcieb7 is /pci@78,0/pci8086,3420@0
PCI Express-device: pci8086,340c@5, pcieb10
pcieb10 is /pci@78,0/pci8086,340c@5
PCI Express-device: pci108e,4848@1a,7, ehci0
ehci0 is /pci@0,0/pci108e,4848@1a,7
PCI Express-device: pci108e,4848@1d,7, ehci1
ehci1 is /pci@0,0/pci108e,4848@1d,7
PCI Express-device: pci108e,4848@1a, uhci0
uhci0 is /pci@0,0/pci108e,4848@1a
PCI Express-device: pci108e,4848@1a,1, uhci1
uhci1 is /pci@0,0/pci108e,4848@1a,1
PCI Express-device: pci108e,4848@1a,2, uhci2
uhci2 is /pci@0,0/pci108e,4848@1a,2
PCI Express-device: pci108e,4848@1d, uhci3
uhci3 is /pci@0,0/pci108e,4848@1d
PCI Express-device: pci108e,4848@1d,1, uhci4
uhci4 is /pci@0,0/pci108e,4848@1d,1
PCI Express-device: pci108e,4848@1d,2, uhci5
uhci5 is /pci@0,0/pci108e,4848@1d,2
cpu0: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu0: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
KPTI enabled (PCID not supported, INVPCID not supported)
initialized cpu module 'cpu.generic' on chip 0 core 1 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 1 strand 0
acpinex: cpu@2, cpudrv1
/fw/cpu@2 (cpudrv1) online
cpu1: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu1: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu1 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 2 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 2 strand 0
acpinex: cpu@3, cpudrv2
pseudo-device: audio0
/fw/cpu@3 (cpudrv2) online
audio0 is /pseudo/audio@0
cpu2: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu2: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu2 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 3 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 3 strand 0
acpinex: cpu@4, cpudrv3
/fw/cpu@4 (cpudrv3) online
cpu3: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu3: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu3 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 8 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 8 strand 0
acpinex: cpu@5, cpudrv4
/fw/cpu@5 (cpudrv4) online
cpu4: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu4: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu4 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 9 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 9 strand 0
USB 2.0 device (usb46b,ff01) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd0 at bus address 2
acpinex: cpu@6, cpudrv5
USB 2.0 device (usb4b4,6560) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd1 at bus address 2
/fw/cpu@6 (cpudrv5) online
American Megatrends Inc. Generic Hub
cpu5: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
USB 1.10 device (usb46b,ff10) operating at low speed (USB 1.x) on USB 1.10 root hub: device@1, usb_mid0 at bus address 2
cpu5: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 10 strand 0
cpu5 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 10 strand 0
hubd1 is /pci@0,0/pci108e,4848@1d,7/hub@3
acpinex: cpu@7, cpudrv6
hubd0 is /pci@0,0/pci108e,4848@1a,7/hub@3
/fw/cpu@7 (cpudrv6) online
American Megatrends Inc. Virtual Keyboard and Mouse
cpu6: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
/pci@0,0/pci108e,4848@1d,7/hub@3 (hubd1) online
cpu6: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 11 strand 0
cpu6 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 11 strand 0
/pci@0,0/pci108e,4848@1a,7/hub@3 (hubd0) online
acpinex: cpu@8, cpudrv7
usb_mid0 is /pci@0,0/pci108e,4848@1a,2/device@1
/fw/cpu@8 (cpudrv7) online
USB 2.0 device (usb90c,1000) operating at hi speed (USB 2.x) on USB 2.0 external hub: storage@2, scsa2usb0 at bus address 3
cpu7: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
/pci@0,0/pci108e,4848@1a,2/device@1 (usb_mid0) online
cpu7: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.g
panic[cpu2]/thread=ffffff01708e6c20: BAD TRAP: type=e (#pf Page fault) rp=ffffff01708e6790 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0xffffff01708e6880, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0  cr3: e800000  cr8: 0

        rdi: ffffff4a3422d990 rsi:                0 rdx:                0
        rcx:                3  r8:          2060504  r9: ffffff01708e6820
        rax:                0 rbx: ffffff4a3422d990 rbp:                0
        r10: fffffffffb872f34 r11: ffffff4a614c3200 r12:                0
        r13: ffffff4a57054e18 r14: fffffffff7cb5600 r15: ffffff01708e6900
        fsb:        200000000 gsb: ffffff4a612d8000  ds:               38
         es:               38  fs:                0  gs:                0
        trp:                e err:               10 rip:                0
         cs:               30 rfl:            10286 rsp: ffffff01708e6880
         ss:               38

Warning - stack not written to the dump buffer
ffffff01708e6690 unix:die+c6 ()
ffffff01708e6780 unix:trap+11b1 ()
ffffff01708e6790 unix:cmntrap+e6 ()

panic: entering debugger (no dump device, continue to reboot)
Loaded modules: [ scsi_vhci mac uppc neti ufs s1394 apix uhci hook ip usba xhci specfs pcplusmp cpu.generic arp sockfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq   %rax,%rdi
[2]> 
[2]> ::cpuregs
%rax = 0x0000000000000202                 %r9  = 0x0000000000000000 
%rbx = 0xfffffffffb9633d0                 %r10 = 0xbaddcafebaddcafe 
%rcx = 0x000000000000000a                 %r11 = 0x0000000000000000 
%rdx = 0x00000000000003f8                 %r12 = 0x000000000000000e 
%rsi = 0x00000000000003f8                 %r13 = 0x0000000000000000 
%rdi = 0xfffffffffbf3be30     nullwrapper %r14 = 0xffffff01708e65a8 
%r8  = 0x0000000000000000                 %r15 = 0xfffffffffbc45080 p0

%rip = 0xfffffffffb87dccb kmdb_enter+0xb
%rbp = 0xfffffffffbc8e300
%rsp = 0xfffffffffbc8e300
%rflags = 0x00000002
  id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0
  status=<of,df,if,tf,sf,zf,af,pf,cf>

%cs = 0x0030    %ds = 0x0038    %es = 0x0038    %fs = 0x0000
%gs = 0x0000    %gsbase = 0xffffff4a612d8000    %kgsbase = 0x200000000
%trapno = 0x14  %err = 0x0      %cr2 = 0x0      %cr3 = 0xe800000

::prtconf shows:

11:34 < Agnar> ffffff4a570907f8 cpu, instance #7 (driver name: cpudrv)
11:34 < Agnar> ffffff4a57090550 cpu, instance #8 (driver not attached)

Instance #8 is still unattached, so I think it panics while loading cpudrv for that CPU.

I tried booting from the currently available Solaris 11.4 USB image, and that works, so it does not seem to be a hardware issue per se.

The system has four sockets populated with Xeon X7550 CPUs @ 2 GHz, 8 cores each, HT enabled.
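
As a cross-check on the CPU identification, the boot log above prints "GenuineIntel 206E6 family 6 model 46 step 6" for every CPU. The following is a minimal sketch, not taken from the illumos sources, of how that signature decodes under the usual Intel CPUID leaf 1 field layout. The function names are made up for illustration, and the figure of two hardware threads per core with HT enabled is an assumption.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf 1 EAX value into family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs the OS should eventually bring online."""
    return sockets * cores_per_socket * threads_per_core


print(decode_cpuid_signature(0x206E6))  # {'family': 6, 'model': 46, 'stepping': 6}
# Assumption: HT means 2 threads per core on this CPU.
print(logical_cpus(4, 8, 2))  # 64
```

Under these assumptions the machine should expose 64 logical CPUs, so the panic after cpu7 comes online happens very early in CPU bring-up.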

I haven't found anything comparable in the issue tracker and just want to document the bug here on the slight chance that someone sees a solution for this.

History

#1

Updated by Olaf Bohlen 3 months ago

15:58 <@rmustacc> Agnar: The useful thing here will be to print the stack trace again to get the args to trap and printing the struct regs so we can figure out where we came from. ::cpuinfo might also help.

#2

Updated by Olaf Bohlen 3 months ago

I hope this is what was requested:

Loading unix...
Loading /platform/i86pc/amd64/boot_archive...
Loading /platform/i86pc/amd64/boot_archive.hash...
Booting...
module /platform/i86pc/kernel/amd64/unix: text at [0xfffffffffb800000, 0xfffffffffb982bf9] data at 0xfffffffffbc00000
module /kernel/amd64/genunix: text at [0xfffffffffbcaa840, 0xfffffffffbf3bcf7] data at 0xfffffffffbf3bd00
Loading kmdb...
module /kernel/misc/amd64/kmdbmod: text at [0xfffffffffb982c00, 0xfffffffffba3e7c7] data at 0xfffffffffbfaf1c0
module /kernel/misc/amd64/ctf: text at [0xfffffffffba3e7e0, 0xfffffffffba4bc9f] data at 0xfffffffffbfe87a0

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> :c
SunOS Release 5.11 Version illumos-87d7b64204 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
x86_feature: lgpg
x86_feature: tsc
x86_feature: msr
x86_feature: mtrr
x86_feature: pge
x86_feature: de
x86_feature: cmov
x86_feature: mmx
x86_feature: mca
x86_feature: pae
x86_feature: cv8
x86_feature: pat
x86_feature: sep
x86_feature: sse
x86_feature: sse2
x86_feature: htt
x86_feature: asysc
x86_feature: nx
x86_feature: sse3
x86_feature: cx16
x86_feature: cmp
x86_feature: tscp
x86_feature: mwait
x86_feature: cpuid
x86_feature: ssse3
x86_feature: sse4_1
x86_feature: sse4_2
x86_feature: clfsh
x86_feature: 64
x86_feature: vmx
x86_feature: x2apic
x86_feature: ibrs
x86_feature: ibpb
x86_feature: stibp
x86_feature: ssbd
x86_feature: flush_cmd
x86_feature: core_thermal
mem = 201317220K (0x2fff6d9000)
Using default device instance data
SMBIOS v2.6 loaded (10504 bytes)initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 0 strand 0
root nexus = i86pc
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
Reading Intel IOMMU boot options
npe0 at root: space 0 offset 0
npe0 is /pci@0,0
PCI Express-device: isa@1f, isa0
pseudo-device: acpippm0
acpippm0 is /pseudo/acpippm@0
pseudo-device: ppm0
ppm0 is /pseudo/ppm@0
ramdisk0 at root
ramdisk0 is /ramdisk
root on /ramdisk:a fstype ufs
acpinex0 at root
acpinex0 is /fw
acpinex: cpu@1, cpudrv0
/fw/cpu@1 (cpudrv0) online
pseudo-device: dld0
dld0 is /pseudo/dld@0
ISA-device: asy0
asy0 is /pci@0,0/isa@1f/asy@1,3f8
PCI Express-device: pci8086,3410@9, pcieb4
pcieb4 is /pci@0,0/pci8086,3410@9
PCI Express-device: pci8086,3a40@1c, pcieb5
pcieb5 is /pci@0,0/pci8086,3a40@1c
PCI Express-device: pci8086,3a48@1c,4, pcieb6
pcieb6 is /pci@0,0/pci8086,3a48@1c,4
PCI Express-device: pci8086,244e@1e, pci_pci0
pci_pci0 is /pci@0,0/pci8086,244e@1e
npe1 at root: space 78 offset 0
npe1 is /pci@78,0
PCI Express-device: pci8086,3420@0, pcieb7
pcieb7 is /pci@78,0/pci8086,3420@0
PCI Express-device: pci8086,340c@5, pcieb10
pcieb10 is /pci@78,0/pci8086,340c@5
PCI Express-device: pci108e,4848@1a,7, ehci0
ehci0 is /pci@0,0/pci108e,4848@1a,7
PCI Express-device: pci108e,4848@1d,7, ehci1
ehci1 is /pci@0,0/pci108e,4848@1d,7
PCI Express-device: pci108e,4848@1a, uhci0
uhci0 is /pci@0,0/pci108e,4848@1a
PCI Express-device: pci108e,4848@1a,1, uhci1
uhci1 is /pci@0,0/pci108e,4848@1a,1
PCI Express-device: pci108e,4848@1a,2, uhci2
uhci2 is /pci@0,0/pci108e,4848@1a,2
PCI Express-device: pci108e,4848@1d, uhci3
uhci3 is /pci@0,0/pci108e,4848@1d
PCI Express-device: pci108e,4848@1d,1, uhci4
uhci4 is /pci@0,0/pci108e,4848@1d,1
PCI Express-device: pci108e,4848@1d,2, uhci5
uhci5 is /pci@0,0/pci108e,4848@1d,2
cpu0: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu0: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
KPTI enabled (PCID not supported, INVPCID not supported)
initialized cpu module 'cpu.generic' on chip 0 core 1 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 1 strand 0
acpinex: cpu@2, cpudrv1
/fw/cpu@2 (cpudrv1) online
cpu1: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu1: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu1 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 2 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 2 strand 0
acpinex: cpu@3, cpudrv2
pseudo-device: audio0
/fw/cpu@3 (cpudrv2) online
audio0 is /pseudo/audio@0
cpu2: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu2: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu2 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 3 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 3 strand 0
acpinex: cpu@4, cpudrv3
/fw/cpu@4 (cpudrv3) online
cpu3: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu3: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu3 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 8 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 8 strand 0
acpinex: cpu@5, cpudrv4
/fw/cpu@5 (cpudrv4) online
cpu4: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu4: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu4 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 9 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 9 strand 0
acpinex: cpu@6, cpudrv5
/fw/cpu@6 (cpudrv5) online
cpu5: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu5: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu5 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 10 strand 0
USB 1.10 device (usb46b,ff10) operating at low speed (USB 1.x) on USB 1.10 root hub: device@1, usb_mid0 at bus address 2
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 10 strand 0
USB 2.0 device (usb46b,ff01) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd1 at bus address 2
acpinex: cpu@7, cpudrv6
USB 2.0 device (usb4b4,6560) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd0 at bus address 2
/fw/cpu@7 (cpudrv6) online
American Megatrends Inc. Virtual Keyboard and Mouse
cpu6: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
WARNING: Device /fw/cpu@5 failed to power up.
cpu6: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 11 strand 0
cpu6 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 11 strand 0
American Megatrends Inc. Gen
epanic[cpu2]/thread=ffffff01708e6c20: BAD TRAP: type=e (#pf Page fault) rp=ffffff01708e6790 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0xffffff01708e6880, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0  cr3: e800000  cr8: 0

        rdi: ffffff4a3422d990 rsi:                0 rdx:                0
        rcx:                3  r8:          2060504  r9: ffffff01708e6820
        rax:                0 rbx: ffffff4a3422d990 rbp:                0
        r10: fffffffffb872f34 r11: ffffff4a612cb240 r12:                0
        r13: ffffff4a57054e18 r14: fffffffff7cb5600 r15: ffffff01708e6900
        fsb:        200000000 gsb: ffffff4a612dc000  ds:               38
         es:               38  fs:                0  gs:                0
        trp:                e err:               10 rip:                0
         cs:               30 rfl:            10286 rsp: ffffff01708e6880
         ss:               38

Warning - stack not written to the dump buffer
ffffff01708e6690 unix:die+c6 ()
ffffff01708e6780 unix:trap+11b1 ()
ffffff01708e6790 unix:cmntrap+e6 ()

panic: entering debugger (no dump device, continue to reboot)
Loaded modules: [ scsi_vhci mac uppc neti ufs s1394 apix uhci hook ip usba xhci specfs pcplusmp cpu.generic arp sockfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq   %rax,%rdi
[2]> ::cpuinfo
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc48000  1f    0    0  96   no    no t-34   fffffffffbc46040 
  1 ffffff4a612e4000  1f    0    0  60   no    no t-3    ffffff01703cdc20 
  2 fffffffffbc4dda0  1b    0    0  60   no    no t-34   ffffff01708e6c20 
  3 ffffff4a612d8000  1f    0    0  -1   no    no t-1    ffffff0170ac4c20 (idle)
  4 ffffff4a614cc000  1f    0    0  -1   no    no t-1    ffffff0170d62c20 (idle)
  5 ffffff4a614c6000  1f    0    0  -1   no    no t-29   ffffff017051bc20 (idle)
  6 ffffff4a6164c000  1f    0    0  -1   no    no t-16   ffffff0170cd2c20 (idle)
  7 ffffff4a614d4000  1f    0    0  -1   no    no t-3    ffffff0170ab2c20 (idle)
[2]> ::cpuregs
%rax = 0x0000000000000202                 %r9  = 0x0000000000000000 
%rbx = 0xfffffffffb9633d0                 %r10 = 0xbaddcafebaddcafe 
%rcx = 0x000000000000000a                 %r11 = 0x0000000000000000 
%rdx = 0x00000000000003f8                 %r12 = 0x000000000000000e 
%rsi = 0x00000000000003f8                 %r13 = 0x0000000000000000 
%rdi = 0xfffffffffbf3be30     nullwrapper %r14 = 0xffffff01708e65a8 
%r8  = 0x0000000000000000                 %r15 = 0xfffffffffbc45080 p0

%rip = 0xfffffffffb87dccb kmdb_enter+0xb
%rbp = 0xfffffffffbc8e300
%rsp = 0xfffffffffbc8e300
%rflags = 0x00000002
  id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0
  status=<of,df,if,tf,sf,zf,af,pf,cf>

%cs = 0x0030    %ds = 0x0038    %es = 0x0038    %fs = 0x0000
%gs = 0x0000    %gsbase = 0xffffff4a612dc000    %kgsbase = 0x200000000
%trapno = 0x14  %err = 0x0      %cr2 = 0x0      %cr3 = 0xe800000
[2]> $C
fffffffffbc8e300 kmdb_enter+0xb()
fffffffffbc8e330 debug_enter+0x75(fffffffffb9633d0)
fffffffffbc8e410 panicsys+0x606(fffffffffb961a60, ffffff01708e65a8, fffffffffbc8e420, 1)
ffffff01708e6590 vpanic+0x15c()
ffffff01708e6600 0xfffffffffb8a0371()
ffffff01708e6690 die+0xc6(e, ffffff01708e6790, 0, 2)
ffffff01708e6780 trap+0x11b1(ffffff01708e6790, 0, 2)
ffffff01708e6790 cmntrap_pushed+0x3c()
[2]> ::print struct regs ffffff01708e6790
kmdb: failed to find member ffffff01708e6790 of struct regs: no such member of structure or union
[2]> ::print struct regs 0xffffff01708e6790
kmdb: failed to find member 0xffffff01708e6790 of struct regs: no such member of structure or union
[2]> 0xffffff01708e6790::print struct regs 
{
    r_savfp = 0
    r_savpc = 0
    r_rdi = 0xffffff4a3422d990
    r_rsi = 0
    r_rdx = 0
    r_rcx = 0x3
    r_r8 = 0x2060504
    r_r9 = 0xffffff01708e6820
    r_rax = 0
    r_rbx = 0xffffff4a3422d990
    r_rbp = 0
    r_r10 = 0xfffffffffb872f34
    r_r11 = 0xffffff4a612cb240
    r_r12 = 0
    r_r13 = 0xffffff4a57054e18
    r_r14 = 0xfffffffff7cb5600
    r_r15 = 0xffffff01708e6900
    __r_fsbase = 0x4000000048
    __r_gsbase = 0x100000000
    r_ds = 0x38
    r_es = 0x38
    r_fs = 0
    r_gs = 0                          
    r_trapno = 0xe
    r_err = 0x10
    r_rip = 0
    r_cs = 0x30
    r_rfl = 0x10286
    r_rsp = 0xffffff01708e6880
    r_ss = 0x38
}
[2]> 
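
The struct regs dump above already narrows down the kind of fault: r_trapno = 0xe is a page fault, r_err = 0x10, and r_rip = 0. Here is a small sketch of how the error code can be read, assuming the standard x86 page-fault error code layout (bit 0 present, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch). The helper names are hypothetical and not part of mdb or illumos.

```python
def decode_pf_error(err: int) -> dict:
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "present": bool(err & 0x01),            # 0 = page not present
        "write": bool(err & 0x02),              # 0 = read access
        "user": bool(err & 0x04),               # 0 = supervisor (kernel) mode
        "reserved_bit": bool(err & 0x08),       # reserved bit set in a page table entry
        "instruction_fetch": bool(err & 0x10),  # fault happened while fetching code
    }


def looks_like_null_call(err: int, rip: int, cr2: int) -> bool:
    """Heuristic: kernel-mode instruction fetch from address 0."""
    flags = decode_pf_error(err)
    return (
        flags["instruction_fetch"]
        and not flags["user"]
        and rip == 0
        and cr2 == 0
    )


# Values taken from the register dump in this comment.
print(decode_pf_error(0x10))
print(looks_like_null_call(err=0x10, rip=0, cr2=0))  # True
```

If that reading is right, the CPU faulted while fetching an instruction at address 0 in kernel mode, which is what a call through a NULL function pointer would look like. That would also fit the panic message reporting the module as "<unknown>".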

#3

Updated by Olaf Bohlen 3 months ago

Loading unix...
Loading /platform/i86pc/amd64/boot_archive...
Loading /platform/i86pc/amd64/boot_archive.hash...
Booting...
module /platform/i86pc/kernel/amd64/unix: text at [0xfffffffffb800000, 0xfffffffffb982bf9] data at 0xfffffffffbc00000
module /kernel/amd64/genunix: text at [0xfffffffffbcaa840, 0xfffffffffbf3bcf7] data at 0xfffffffffbf3bd00
Loading kmdb...
module /kernel/misc/amd64/kmdbmod: text at [0xfffffffffb982c00, 0xfffffffffba3e7c7] data at 0xfffffffffbfaf1c0
module /kernel/misc/amd64/ctf: text at [0xfffffffffba3e7e0, 0xfffffffffba4bc9f] data at 0xfffffffffbfe87a0

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> :c
SunOS Release 5.11 Version illumos-87d7b64204 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
x86_feature: lgpg
x86_feature: tsc
x86_feature: msr
x86_feature: mtrr
x86_feature: pge
x86_feature: de
x86_feature: cmov
x86_feature: mmx
x86_feature: mca
x86_feature: pae
x86_feature: cv8
x86_feature: pat
x86_feature: sep
x86_feature: sse
x86_feature: sse2
x86_feature: htt
x86_feature: asysc
x86_feature: nx
x86_feature: sse3
x86_feature: cx16
x86_feature: cmp
x86_feature: tscp
x86_feature: mwait
x86_feature: cpuid
x86_feature: ssse3
x86_feature: sse4_1
x86_feature: sse4_2
x86_feature: clfsh
x86_feature: 64
x86_feature: vmx
x86_feature: x2apic
x86_feature: ibrs
x86_feature: ibpb
x86_feature: stibp
x86_feature: ssbd
x86_feature: flush_cmd
x86_feature: core_thermal
mem = 201317220K (0x2fff6d9000)
Using default device instance data
SMBIOS v2.6 loaded (10504 bytes)initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 0 strand 0
root nexus = i86pc
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
Reading Intel IOMMU boot options
npe0 at root: space 0 offset 0
npe0 is /pci@0,0
PCI Express-device: isa@1f, isa0
pseudo-device: acpippm0
acpippm0 is /pseudo/acpippm@0
pseudo-device: ppm0
ppm0 is /pseudo/ppm@0
ramdisk0 at root
ramdisk0 is /ramdisk
root on /ramdisk:a fstype ufs
acpinex0 at root
acpinex0 is /fw
acpinex: cpu@1, cpudrv0
/fw/cpu@1 (cpudrv0) online
pseudo-device: dld0
dld0 is /pseudo/dld@0
ISA-device: asy0
asy0 is /pci@0,0/isa@1f/asy@1,3f8
PCI Express-device: pci8086,3410@9, pcieb4
pcieb4 is /pci@0,0/pci8086,3410@9
PCI Express-device: pci8086,3a40@1c, pcieb5
pcieb5 is /pci@0,0/pci8086,3a40@1c
PCI Express-device: pci8086,3a48@1c,4, pcieb6
pcieb6 is /pci@0,0/pci8086,3a48@1c,4
PCI Express-device: pci8086,244e@1e, pci_pci0
pci_pci0 is /pci@0,0/pci8086,244e@1e
npe1 at root: space 78 offset 0
npe1 is /pci@78,0
PCI Express-device: pci8086,3420@0, pcieb7
pcieb7 is /pci@78,0/pci8086,3420@0
PCI Express-device: pci8086,340c@5, pcieb10
pcieb10 is /pci@78,0/pci8086,340c@5
PCI Express-device: pci108e,4848@1a,7, ehci0
ehci0 is /pci@0,0/pci108e,4848@1a,7
PCI Express-device: pci108e,4848@1d,7, ehci1
ehci1 is /pci@0,0/pci108e,4848@1d,7
PCI Express-device: pci108e,4848@1a, uhci0
uhci0 is /pci@0,0/pci108e,4848@1a
PCI Express-device: pci108e,4848@1a,1, uhci1
uhci1 is /pci@0,0/pci108e,4848@1a,1
PCI Express-device: pci108e,4848@1a,2, uhci2
uhci2 is /pci@0,0/pci108e,4848@1a,2
PCI Express-device: pci108e,4848@1d, uhci3
uhci3 is /pci@0,0/pci108e,4848@1d
PCI Express-device: pci108e,4848@1d,1, uhci4
uhci4 is /pci@0,0/pci108e,4848@1d,1
PCI Express-device: pci108e,4848@1d,2, uhci5
uhci5 is /pci@0,0/pci108e,4848@1d,2
cpu0: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu0: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
KPTI enabled (PCID not supported, INVPCID not supported)
initialized cpu module 'cpu.generic' on chip 0 core 1 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 1 strand 0
acpinex: cpu@2, cpudrv1
/fw/cpu@2 (cpudrv1) online
cpu1: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu1: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu1 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 2 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 2 strand 0
acpinex: cpu@3, cpudrv2
/fw/cpu@3 (cpudrv2) online
pseudo-device: audio0
cpu2: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
audio0 is /pseudo/audio@0
cpu2: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 3 strand 0
cpu2 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 3 strand 0
acpinex: cpu@4, cpudrv3
/fw/cpu@4 (cpudrv3) online
cpu3: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu3: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu3 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 8 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 8 strand 0
acpinex: cpu@5, cpudrv4
/fw/cpu@5 (cpudrv4) online
cpu4: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
USB 2.0 device (usb46b,ff01) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd0 at bus address 2
cpu4: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 9 strand 0
cpu4 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 9 strand 0
American Megatrends Inc. Generic Hub
acpinex: cpu@6, cpudrv5
USB 2.0 device (usb4b4,6560) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd1 at bus address 2
/fw/cpu@6 (cpudrv5) online
hubd0 is /pci@0,0/pci108e,4848@1a,7/hub@3
cpu5: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
hubd1 is /pci@0,0/pci108e,4848@1d,7/hub@3
cpu5: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 10 strand 0
cpu5 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 10 strand 0
/pci@0,0/pci108e,4848@1a,7/hub@3 (hubd0) online
acpinex: cpu@7, cpudrv6
/pci@0,0/pci108e,4848@1d,7/hub@3 (hubd1) online
/fw/cpu@7 (cpudrv6) online
USB 1.10 device (usb46b,ff10) operating at low speed (USB 1.x) on USB 1.10 root hub: device@1, usb_mid0 at bus address 2
cpu6: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
American Megatrends Inc. Virtual Keyboard and Mouse
cpu6: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 11 strand 0
cpu6 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 11 strand 0
usb_mid0 is /pci@0,0/pci108e,4848@1a,2/device@1
acpinex: cpu@8, cpudrv7
/fw/cpu@8 (cpudrv7) online
/pci@0,0/pci108e,4848@1a,2/device@1 (usb_mid0) online
cpu7: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
USB 2.0 device (usb430,a4a2) operating at hi speed (USB 2.x) on USB 2.0 external hub: communications@3, usbecm0 at bus address 3
cpu7: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz

panic[cpu8]/thread=ffffff01708e6c20: BAD TRAP: type=e (#pf Page fault) rp=ffffff01708e6790 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0xffffff01708e6880, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6b8<xmme,fxsr,pge,pae,pse,de>
cr2: 0  cr3: e800000  cr8: 0

        rdi: ffffff4a3422d990 rsi:                0 rdx:                0
        rcx:                3  r8:          2060504  r9: ffffff01708e6820
        rax:                0 rbx: ffffff4a3422d990 rbp:                0
        r10: fffffffffb872f34 r11: ffffff4a6184adc0 r12:                0
        r13: ffffff4a57054e18 r14: fffffffff7cc7600 r15: ffffff01708e6900
        fsb:        200000000 gsb: ffffff4a61848000  ds:               38
         es:               38  fs:                0  gs:                0
        trp:                e err:               10 rip:                0
         cs:               30 rfl:            10286 rsp: ffffff01708e6880
         ss:               38

Warning - stack not written to the dump buffer
ffffff01708e6690 unix:die+c6 ()
ffffff01708e6780 unix:trap+11b1 ()
ffffff01708e6790 unix:cmntrap+e6 ()

panic: entering debugger (no dump device, continue to reboot)
Loaded modules: [ scsi_vhci mac uppc neti ufs s1394 apix uhci hook ip usba xhci specfs pcplusmp cpu.generic arp sockfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq   %rax,%rdi
[8]> ::cpuinfo
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc48000  1f    0    0  96   no    no t-22   fffffffffbc46040 
  1 ffffff4a612e4000  1f    0    0  -1   no    no t-17   ffffff017090ac20 (idle)
  2 ffffff4a612dc000  1f    0    0  -1   no    no t-10   ffffff0170a05c20 (idle)
  3 ffffff4a612d8000  1f    0    0  -1   no    no t-9    ffffff01709b1c20 (idle)
  4 ffffff4a614af000  1f    0    0  -1   no    no t-9    ffffff0170cebc20 (idle)
  5 ffffff4a6149f000  1f    0    0  -1   no    no t-9    ffffff01709abc20 (idle)
  6 ffffff4a61497000  1f    0    0  -1   no    no t-13   ffffff0170d86c20 (idle)
  7 ffffff4a616a6000  1f    0    0  -1   no    no t-3    ffffff0170debc20 (idle)
  8 fffffffffbc4dda0  1b    1    0  60  yes    no t-22   ffffff01708e6c20 
[8]> $C
fffffffffbc8e300 kmdb_enter+0xb()
fffffffffbc8e330 debug_enter+0x75(fffffffffb9633d0)
fffffffffbc8e410 panicsys+0x606(fffffffffb961a60, ffffff01708e65a8, fffffffffbc8e420, 1)
ffffff01708e6590 vpanic+0x15c()
ffffff01708e6600 0xfffffffffb8a0371()
ffffff01708e6690 die+0xc6(e, ffffff01708e6790, 0, 8)
ffffff01708e6780 trap+0x11b1(ffffff01708e6790, 0, 8)
ffffff01708e6790 cmntrap_pushed+0x3c()
[8]> 0xffffff01708e6790::print struct regs
{
    r_savfp = 0
    r_savpc = 0
    r_rdi = 0xffffff4a3422d990
    r_rsi = 0
    r_rdx = 0
    r_rcx = 0x3
    r_r8 = 0x2060504
    r_r9 = 0xffffff01708e6820
    r_rax = 0
    r_rbx = 0xffffff4a3422d990
    r_rbp = 0
    r_r10 = 0xfffffffffb872f34
    r_r11 = 0xffffff4a6184adc0
    r_r12 = 0
    r_r13 = 0xffffff4a57054e18
    r_r14 = 0xfffffffff7cc7600
    r_r15 = 0xffffff01708e6900
    __r_fsbase = 0x4000000048
    __r_gsbase = 0x100000000
    r_ds = 0x38
    r_es = 0x38
    r_fs = 0
    r_gs = 0                          
    r_trapno = 0xe
    r_err = 0x10
    r_rip = 0
    r_cs = 0x30
    r_rfl = 0x10286
    r_rsp = 0xffffff01708e6880
    r_ss = 0x38
}
[8]> 0xffffff4a3422d990::whatis
ffffff4a3422d990 is ffffff4a3422d000+990, allocated from kmem_alloc_12288
[8]> 0x3::whatis
3 is unknown
[8]> 0x2060504::whatis
2060504 is unknown
[8]> 0xffffff01708e6820::whatis
ffffff01708e6820 is in thread ffffff01708e6c20's stack
[8]> 0xffffff4a3422d990::whatis
ffffff4a3422d990 is ffffff4a3422d000+990, allocated from kmem_alloc_12288
[8]> 0xfffffffffb872f34::whatis
fffffffffb872f34 is bzero+0x174, in unix's text segment
[8]> 0xffffff4a6184adc0::whatis
ffffff4a6184adc0 is freed from kmem_alloc_1152
[8]> 0xffffff4a57054e18::whatis
ffffff4a57054e18 is allocated from kmem_alloc_112
[8]> 0xfffffffff7cc7600::whatis
fffffffff7cc7600 is intel_nhmex`check_memdevice, in intel_nhmex's text segment
[8]> 0xffffff01708e6900::whatis
ffffff01708e6900 is in thread ffffff01708e6c20's stack
[8]> 0x4000000048::whatis
4000000048 is unknown
[8]> 0xffffff01708e6880::whatis
ffffff01708e6880 is in thread ffffff01708e6c20's stack
[8]> 
[8]> 0xffffff01708e6c20::findstack -v
stack pointer for thread ffffff01708e6c20 (attach_drivers()): ffffff01708e68f0
  ffffff01708e6920 intel_nhmex`nhmex_smbios+0xeb()
  ffffff01708e69b0 intel_nhmex`inhmex_mc_attach+0x31b()
  ffffff01708e6a30 devi_attach+0xa1(ffffff4a570102b8, 0)
  ffffff01708e6a70 attach_node+0x8b(ffffff4a570102b8)
  ffffff01708e6ac0 i_ndi_config_node+0x95(ffffff4a570102b8, 6, 0)
  ffffff01708e6af0 i_ddi_attachchild+0x3a(ffffff4a570102b8)
  ffffff01708e6b40 i_ddi_attach_node_hierarchy+0x8a(ffffff4a570102b8)
  ffffff01708e6b90 attach_driver_nodes+0x90(67)
  ffffff01708e6be0 ddi_hold_installed_driver+0x138(67)
  ffffff01708e6c00 attach_drivers+0x3c()
  ffffff01708e6c10 thread_start+0xb()
[8]> 
[8]> ffffff01708e6920::dis
0xffffff01708e6920:             movb   $0x69,%al
0xffffff01708e6922:             movw   0x1(%rax),<reserved>
0xffffff01708e6925:             ***ERROR--unknown op code***
0xffffff01708e6927:             lcall  *0x68(%rbx)
0xffffff01708e692a:             int    $0x3
0xffffff01708e692b:             idivl  %edi
0xffffff01708e692d:             ***ERROR--unknown op code***
0xffffff01708e692f:             ljmp   *0x63(%rbp)
0xffffff01708e6932:             subl   $0x65746e69,%eax
0xffffff01708e6937:             insb   
0xffffff01708e6938:             subl   $0x353532,%eax
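
Note that ::dis on a stack address decodes data as if it were instructions, so the listing above is not real code. A minimal sketch, assuming the standard x86 encodings noted in the comments (ff 6d 63 for the ljmp, 2d plus a 32-bit immediate for subl on %eax, 6c for insb), rebuilds the bytes from 0xffffff01708e6930 onward and reads them as ASCII:

```python
import struct

# Bytes reconstructed from the ::dis listing. The opcode values are an
# assumption based on the usual x86 encodings, not read from memory directly.
raw = bytes([0x6d, 0x63])                       # modrm + disp8 of ljmp *0x63(%rbp)
raw += b"\x2d" + struct.pack("<I", 0x65746e69)  # subl $0x65746e69,%eax
raw += b"\x6c"                                  # insb
raw += b"\x2d" + struct.pack("<I", 0x353532)    # subl $0x353532,%eax

# Treat the bytes as a NUL-terminated C string.
text = raw.split(b"\x00", 1)[0].decode("ascii")
print(text)
```

If those encodings hold, the "instructions" are a short name string in a local buffer of the frame, not executable code.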
[8]> nhmex_smbios+0xeb::dis
intel_nhmex`nhmex_smbios+0xc5:  je     +0x3     <intel_nhmex`nhmex_smbios+0xca>
intel_nhmex`nhmex_smbios+0xc7:  incl   0xffffffffffffffe0(%rbp)
intel_nhmex`nhmex_smbios+0xca:  addq   $0x8,%r8
intel_nhmex`nhmex_smbios+0xce:  incl   %ecx
intel_nhmex`nhmex_smbios+0xd0:  cmpl   %edx,%ecx
intel_nhmex`nhmex_smbios+0xd2:  jle    -0x13    <intel_nhmex`nhmex_smbios+0xc1>
intel_nhmex`nhmex_smbios+0xd4:  movq   +0x3f84a1d(%rip),%rdi    <ksmbios>
intel_nhmex`nhmex_smbios+0xdb:  leaq   -0x13a(%rip),%rsi        <intel_nhmex`check_memdevice>
intel_nhmex`nhmex_smbios+0xe2:  leaq   -0x20(%rbp),%rdx
intel_nhmex`nhmex_smbios+0xe6:  call   +0x3ba5ead       <smbios_iter>
intel_nhmex`nhmex_smbios+0xeb:  movq   +0x3f84a06(%rip),%rdi    <ksmbios>
intel_nhmex`nhmex_smbios+0xf2:  leaq   -0x291(%rip),%rsi        <intel_nhmex`dimm_label>
intel_nhmex`nhmex_smbios+0xf9:  leaq   -0x20(%rbp),%rdx
intel_nhmex`nhmex_smbios+0xfd:  call   +0x3ba5e96       <smbios_iter>
intel_nhmex`nhmex_smbios+0x102: call   -0x5df   <intel_nhmex`check_serial_number>
intel_nhmex`nhmex_smbios+0x107: leave  
intel_nhmex`nhmex_smbios+0x108: ret    
0xfffffffff7cc7761:             nop    
0xfffffffff7cc7765:             nop    
0xfffffffff7cc7769:             nop    
0xfffffffff7cc776d:             nop    
[8]> 
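
The value in r14 (check_memdevice) can be cross-checked against the disassembly above. A small sketch of the RIP-relative arithmetic, assuming the first nop sits directly after the one-byte ret at nhmex_smbios+0x108:

```python
MASK64 = (1 << 64) - 1

# Address of the first nop in the listing; assumed to be nhmex_smbios+0x109.
NOP_AFTER_RET = 0xfffffffff7cc7761
base = NOP_AFTER_RET - 0x109          # start of nhmex_smbios

def rip_relative(next_insn_offset, disp):
    """Resolve a %rip-relative operand: address of the next instruction + disp."""
    return (base + next_insn_offset + disp) & MASK64

# leaq -0x13a(%rip),%rsi sits at +0xdb; the next instruction is at +0xe2.
callback = rip_relative(0xe2, -0x13a)
print(hex(callback))
```

The result matches the address that ::whatis reports for intel_nhmex`check_memdevice, which is consistent with the fault happening during the first smbios_iter call (return address nhmex_smbios+0xeb).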

#4

Updated by Olaf Bohlen 2 months ago

Workaround:

It works with -B disable-intel_nhmex=true set at boot.

#5

Updated by Robert Mustacchi 2 months ago

The intel_nhmex driver has a cousin, the intel_nhm driver. Both of these, notably, use smbios to try to get information about the system. The smbios structures evolve over time, which is why the smbios routines allow a user to specify the version of the library that they use. This allows us to deal with the fact that the smbios flags will change over time. New data is added to the smbios tables, and users of the new library will get up-to-date versions. However, great pains are taken in the library to make sure that we don't break older revisions.

The kernel provides access to smbios, but this isn't part of the DDI. As a result, there is a shared global kernel instance of it called ksmbios that uses SMB_VERSION to determine which version is in use. Based on the symbol table of intel_nhmex, I mostly surmise that it behaves in a similar manner to intel_nhm, which follows the behavior of other memory controller drivers that use smbios. As part of this, the driver calls smbios_info_memdevice. This would be all well and good, but as part of the changes in both 12133 and 10395, which were updates for SMBIOS 3.2 and 3.3, this changed. Normally this would be fine, but this is a closed module which doesn't support anything here.
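
To illustrate why a structure that grows between library versions can hurt a module built against older headers, here is a minimal sketch. The sizes, function names, and the 8-byte "adjacent data" are all hypothetical; this is not the real smbios_memdevice_t layout or the actual library code, only the general shape of the problem and of a version-gated copy.

```python
OLD_SIZE = 16   # hypothetical: what the old module reserves for the record
NEW_SIZE = 24   # hypothetical: what the current library fills in

def fill_unversioned(buf, offset):
    """Always write the full new-size record, whatever the caller reserved."""
    buf[offset:offset + NEW_SIZE] = b"\xAA" * NEW_SIZE

def fill_versioned(buf, offset, caller_size):
    """Write no more than the size implied by the caller's version."""
    n = min(caller_size, NEW_SIZE)
    buf[offset:offset + n] = b"\xAA" * n

def make_frame():
    # OLD_SIZE bytes for the record, then 8 bytes standing in for whatever
    # lives next to it on the caller's stack (for example a saved pointer).
    return bytearray(OLD_SIZE) + bytearray(b"RETADDR!")

clobbered = make_frame()
fill_unversioned(clobbered, 0)

intact = make_frame()
fill_versioned(intact, 0, OLD_SIZE)

print(bytes(clobbered[OLD_SIZE:]), bytes(intact[OLD_SIZE:]))
```

In the unversioned case the neighbouring bytes are overwritten; in the versioned case they survive, which is the kind of protection a caller only gets if the library knows which version it was built against.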

There is an out, in that we should be able to actually verify this by disabling the smbios support in the driver. We can test that in kmdb and, if correct, update the driver.conf file. Thankfully this driver is not load-bearing, though it is useful in allowing us to enumerate memory faults. That said, it's not clear to me how much is different from intel_nhm such that we couldn't add logic to that to cover this platform.

Olaf: Would you mind giving this a try in kmdb? I'm mostly doing this by hand, so I'm not sure if I'll have 100% of the syntax right. At the initial prompt, run ::bp intel_nhmex`_init; this will cause us to break into the debugger when we load the module. When we hit that, run nhmex_no_smbios/W 1. If that causes the system to boot just fine, then we can figure out how to deal with this. I suspect that by setting the property no-smbios we can deal with this just fine.

#6

Updated by Olaf Bohlen 2 months ago

Hey Robert,

So I tried, but your idea still ends in a panic:

Loading unix...
Loading /platform/i86pc/amd64/boot_archive...
Loading /platform/i86pc/amd64/boot_archive.hash...
Booting...
module /platform/i86pc/kernel/amd64/unix: text at [0xfffffffffb800000, 0xfffffffffb982bf9] data at 0xfffffffffbc00000
module /kernel/amd64/genunix: text at [0xfffffffffbcaa840, 0xfffffffffbf3bcf7] data at 0xfffffffffbf3bd00
Loading kmdb...
module /kernel/misc/amd64/kmdbmod: text at [0xfffffffffb982c00, 0xfffffffffba3e7c7] data at 0xfffffffffbfaf1c0
module /kernel/misc/amd64/ctf: text at [0xfffffffffba3e7e0, 0xfffffffffba4bc9f] data at 0xfffffffffbfe87a0

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> ::bp intel_nhmex`_init
[0]> :c
SunOS Release 5.11 Version illumos-87d7b64204 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
x86_feature: lgpg
x86_feature: tsc
x86_feature: msr
x86_feature: mtrr
x86_feature: pge
x86_feature: de
x86_feature: cmov
x86_feature: mmx
x86_feature: mca
x86_feature: pae
x86_feature: cv8
x86_feature: pat
x86_feature: sep
x86_feature: sse
x86_feature: sse2
x86_feature: htt
x86_feature: asysc
x86_feature: nx
x86_feature: sse3
x86_feature: cx16
x86_feature: cmp
x86_feature: tscp
x86_feature: mwait
x86_feature: cpuid
x86_feature: ssse3
x86_feature: sse4_1
x86_feature: sse4_2
x86_feature: clfsh
x86_feature: 64
x86_feature: vmx
x86_feature: x2apic
x86_feature: ibrs
x86_feature: ibpb
x86_feature: stibp
x86_feature: ssbd
x86_feature: flush_cmd
x86_feature: core_thermal
mem = 201317220K (0x2fff6d9000)
Using default device instance data
SMBIOS v2.6 loaded (10504 bytes)initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 0 strand 0
root nexus = i86pc
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
Reading Intel IOMMU boot options
npe0 at root: space 0 offset 0
npe0 is /pci@0,0
PCI Express-device: isa@1f, isa0
pseudo-device: acpippm0
acpippm0 is /pseudo/acpippm@0
pseudo-device: ppm0
ppm0 is /pseudo/ppm@0
ramdisk0 at root
ramdisk0 is /ramdisk
root on /ramdisk:a fstype ufs
acpinex0 at root
acpinex0 is /fw
acpinex: cpu@1, cpudrv0
/fw/cpu@1 (cpudrv0) online
pseudo-device: dld0
dld0 is /pseudo/dld@0
ISA-device: asy0
asy0 is /pci@0,0/isa@1f/asy@1,3f8
PCI Express-device: pci8086,3410@9, pcieb4
pcieb4 is /pci@0,0/pci8086,3410@9
PCI Express-device: pci8086,3a40@1c, pcieb5
pcieb5 is /pci@0,0/pci8086,3a40@1c
PCI Express-device: pci8086,3a48@1c,4, pcieb6
pcieb6 is /pci@0,0/pci8086,3a48@1c,4
PCI Express-device: pci8086,244e@1e, pci_pci0
pci_pci0 is /pci@0,0/pci8086,244e@1e
npe1 at root: space 78 offset 0
npe1 is /pci@78,0
PCI Express-device: pci8086,3420@0, pcieb7
pcieb7 is /pci@78,0/pci8086,3420@0
PCI Express-device: pci8086,340c@5, pcieb10
pcieb10 is /pci@78,0/pci8086,340c@5
PCI Express-device: pci108e,4848@1a,7, ehci0
ehci0 is /pci@0,0/pci108e,4848@1a,7
PCI Express-device: pci108e,4848@1d,7, ehci1
ehci1 is /pci@0,0/pci108e,4848@1d,7
PCI Express-device: pci108e,4848@1a, uhci0
uhci0 is /pci@0,0/pci108e,4848@1a
PCI Express-device: pci108e,4848@1a,1, uhci1
uhci1 is /pci@0,0/pci108e,4848@1a,1
PCI Express-device: pci108e,4848@1a,2, uhci2
uhci2 is /pci@0,0/pci108e,4848@1a,2
PCI Express-device: pci108e,4848@1d, uhci3
uhci3 is /pci@0,0/pci108e,4848@1d
PCI Express-device: pci108e,4848@1d,1, uhci4
uhci4 is /pci@0,0/pci108e,4848@1d,1
PCI Express-device: pci108e,4848@1d,2, uhci5
uhci5 is /pci@0,0/pci108e,4848@1d,2
cpu0: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu0: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
KPTI enabled (PCID not supported, INVPCID not supported)
initialized cpu module 'cpu.generic' on chip 0 core 1 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 1 strand 0
acpinex: cpu@2, cpudrv1
/fw/cpu@2 (cpudrv1) online
cpu1: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu1: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu1 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 2 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 2 strand 0
acpinex: cpu@3, cpudrv2
pseudo-device: audio0
/fw/cpu@3 (cpudrv2) online
audio0 is /pseudo/audio@0
cpu2: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu2: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu2 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 3 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 3 strand 0
acpinex: cpu@4, cpudrv3
/fw/cpu@4 (cpudrv3) online
cpu3: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu3: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu3 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 8 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 8 strand 0
acpinex: cpu@5, cpudrv4
/fw/cpu@5 (cpudrv4) online
cpu4: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu4: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
cpu4 initialization complete - online
initialized cpu module 'cpu.generic' on chip 0 core 9 strand 0
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 9 strand 0
USB 2.0 device (usb4b4,6560) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd0 at bus address 2
acpinex: cpu@6, cpudrv5
hubd0 is /pci@0,0/pci108e,4848@1d,7/hLoaded modules: [ scsi_vhci mac uppc neti ufs s1394 apix uhci hook ip usba xhci specfs pcplusmp cpu.generic arp sockfs ]
ukmdb: stop at intel_nhmex`_init
kmdb: target stopped at:
intel_nhmex`_init:      pushq  %rbp
[4]> nhmex_no_smbios/W 1
intel_nhmex`nhmex_no_smbios:    0               =       0x1
[4]> :c
b@3
/fw/cpu@6 (cpudrv5) online
USB 1.10 device (usb46b,ff10) operating at low speed (USB 1.x) on USB 1.10 root hub: device@1, usb_mid0 at bus address 2
cpu5: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
WARNING: Device /fw/cpu@5 failed to power up.
cpu5: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 10 strand 0
cpu5 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 10 strand 0
/pci@0,0/pci108e,4848@1d,7/hub@3 (hubd0) online
acpinex: cpu@7, cpudrv6
American Megatrends Inc. Virtual Keyboard and Mouse
/fw/cpu@7 (cpudrv6) online
USB 2.0 device (usb46b,ff01) operating at hi speed (USB 2.x) on USB 2.0 root hub: hub@3, hubd1 at bus address 2
cpu6: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
WARNING: cpudrv_monitor: instance 4: can't raise CPU power level
cpu6: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 0 core 11 strand 0
cpu6 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 0 core 11 strand 0
usb_mid0 is /pci@0,0/pci108e,4848@1a,2/device@1
acpinex: cpu@8, cpudrv7
American Megatrends Inc. Generic Hub
/fw/cpu@8 (cpudrv7) online
USB 2.0 device (usb90c,1000) operating at hi speed (USB 2.x) on USB 2.0 external hub: storage@2, scsa2usb0 at bus address 3
cpu7: x86 (chipid 0x0 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
/pci@0,0/pci108e,4848@1a,2/device@1 (usb_mid0) online
cpu7: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 1 core 0 strand 0
cpu7 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 1 core 0 strand 0
hubd1 is /pci@0,0/pci108e,4848@1a,7/hub@3
acpinex: cpu@9, cpudrv8
General USB Flash Disk 04YTTSCFTYG3SG3L
/fw/cpu@9 (cpudrv8) online
USB 1.10 interface (usbif46b,ff10.config1.0) operating at low speed (USB 1.x) on USB 1.10 root hub: keyboard@0, hid0 at bus address 2
cpu8: x86 (chipid 0x1 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
/pci@0,0/pci108e,4848@1a,7/hub@3 (hubd1) online
cpu8: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' on chip 1 core 1 strand 0
cpu8 initialization complete - online
initialized model-specific module 'cpu_ms.GenuineIntel.6.46' on chip 1 core 1 strand 0
scsa2usb0 is /pci@0,0/pci108e,4848@1d,7/hub@3/storage@2
acpinex: cpu@10, cpudrv9
American Megatrends Inc. Virtual Keyboard and Mouse
/fw/cpu@10 (cpudrv9) online
/pci@0,0/pci108e,4848@1d,7/hub@3/storage@2 (scsa2usb0) online
cpu9: x86 (chipid 0x1 GenuineIntel 206E6 family 6 model 46 step 6 clock 2000 MHz)
cpu9: Intel(r) Xeon(r) CPU           X7550  @ 2.00GHz
initialized cpu module 'cpu.generic' o
panic[cpu5]/thread=ffffff01708e6c20: BAD TRAP: type=e (#pf Page fault) rp=ffffff01708e6790 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0xffffff01708e6880, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0  cr3: e800000  cr8: 0

        rdi: ffffff4a3422d990 rsi:                0 rdx:                0
        rcx:                3  r8:          2060504  r9: ffffff01708e6820
        rax:                0 rbx: ffffff4a3422d990 rbp:                0
        r10: fffffffffb872f34 r11: ffffff4a614ba940 r12:                0
        r13: ffffff4a57053e18 r14: fffffffff7cb5600 r15: ffffff01708e6900
        fsb:        200000000 gsb: ffffff4a614c8000  ds:               38
         es:               38  fs:                0  gs:                0
        trp:                e err:               10 rip:                0
         cs:               30 rfl:            10286 rsp: ffffff01708e6880
         ss:               38

Warning - stack not written to the dump buffer
ffffff01708e6690 unix:die+c6 ()
ffffff01708e6780 unix:trap+11b1 ()
ffffff01708e6790 unix:cmntrap+e6 ()

panic: entering debugger (no dump device, continue to reboot)
kmdb: target stopped at:
kmdb_enter+0xb: movq   %rax,%rdi
[5]> ::cpuinfo
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc48000  1f    0    0  96   no    no t-10   fffffffffbc46040 
  1 ffffff4a612e4000  1f    0    0  99   no    no t-2    ffffff0170e8fc20 
  2 ffffff4a612d8000  1f    0    0  -1   no    no t-3    ffffff0170996c20 (idle)
  3 ffffff4a612c8000  1f    0    0  -1   no    no t-6    ffffff0170ac4c20 (idle)
  4 ffffff4a614cc000  1f    0    0  -1   no    no t-7    ffffff0170d62c20 (idle)
  5 fffffffffbc4dda0  1b    0    0  60   no    no t-10   ffffff01708e6c20 
  6 ffffff4a616a5000  1f    0    0  -1   no    no t-7    ffffff0171014c20 (idle)
  7 ffffff4a6169d000  1f    0    0  -1   no    no t-2    ffffff0170d5cc20 (idle)
  8 ffffff4a614bc000  1f    0    0   0   no    no t-0    ffffff01704d5c20 
  9 ffffff4a6188e000  1f    0    0  -1   no    no t-0    ffffff0170d36c20 (idle)
 10 ffffff4a61880000  1f    0    0  99   no    no t-4    ffffff0170bd0c20 
[5]> 
[5]> $C   
fffffffffbc8e300 kmdb_enter+0xb()
fffffffffbc8e330 debug_enter+0x75(fffffffffb9633d0)
fffffffffbc8e410 panicsys+0x606(fffffffffb961a60, ffffff01708e65a8, fffffffffbc8e420, 1)
ffffff01708e6590 vpanic+0x15c()
ffffff01708e6600 0xfffffffffb8a0371()
ffffff01708e6690 die+0xc6(e, ffffff01708e6790, 0, 5)
ffffff01708e6780 trap+0x11b1(ffffff01708e6790, 0, 5)
ffffff01708e6790 cmntrap_pushed+0x3c()


We die here again at the same address, but two CPU loads later...
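
For reference, the trap frame in both panics has err=10 together with rip=0 and cr2=0. A short sketch decoding the x86 page-fault error code bits (the table below follows the architectural bit definitions; the function name and label strings are just for illustration):

```python
# (mask, label if set, label if clear) for the low bits of the #PF error code.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

def decode_pf_err(err):
    """Return human-readable labels for a page-fault error code."""
    labels = []
    for mask, if_set, if_clear in PF_BITS:
        label = if_set if err & mask else if_clear
        if label is not None:
            labels.append(label)
    return labels

flags = decode_pf_err(0x10)
print(flags)
```

With err=0x10 this reports a kernel-mode instruction fetch from a non-present page, which fits the CPU trying to execute at address 0, for example after a call through a NULL function pointer or a return to a zeroed return address.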
