Bug #9758

Double fault when booting new Ryzen system

Added by Gary Mills about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: kernel
Start date: 2018-08-20
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Here we go again, but this is with the new Raven Ridge series of Ryzen CPUs. The CPU is a Ryzen 3 2200G with built-in graphics, although I'm not using the internal graphics at present. This system has an ASUS Prime B350M-A/CSM motherboard and 8 gigs of memory. It boots with the loader. This is part of what I get on a serial console when booting OI-hipster-gui-20180427:

startup.c:1793: startup_modules: calling prom_setup...
startup.c:1795: startup_modules: done
startup.c:1800: startup_modules: calling psm_modload...
startup.c:1803: startup_modules() done
startup.c:2263: startup_end() starting...
startup.c:2306: Calling configure()...

panic[cpu0]/thread=0: BAD TRAP: type=8 (#df Double fault) rp=fffffffffbc43bb0 addr=0

#df Double fault
pid=0, pc=0xfffffffffb8639a0, sp=0xf000eef3f000e2c3, eflags=0x10082
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 9f07000cr3: e000000cr8: 0

        rdi:                0 rsi:                0 rdx:                0
        rcx:                0  r8: fffffffffbc7abc0  r9:                0
        rax: f000eef3f000e2c3 rbx: fffffffffbc39780 rbp: fffffffffbc7a160
        r10:                0 r11: fffffffffb82c130 r12: fffffffffbc7a1d0
        r13:         fbc39780 r14:                0 r15:                0
        fsb:        200000000 gsb: fffffffffbc39780  ds:                0
         es:                0  fs:                0  gs:                0
        trp:                8 err:                0 rip: fffffffffb8639a0
         cs:               30 rfl:            10082 rsp: f000eef3f000e2c3
         ss:               38
tss.tss_rsp0:   0xfffffffffbc7ab78
tss.tss_rsp1:   0x0
tss.tss_rsp2:   0x0
tss.tss_ist1:   0xfffffffffbc43ca0
tss.tss_ist2:   0x0
tss.tss_ist3:   0x0
tss.tss_ist4:   0x0
tss.tss_ist5:   0x0
tss.tss_ist6:   0x0
tss.tss_ist7:   0x0

Warning - stack not written to the dump buffer
fffffffffbc43a90 unix:die+89 ()
fffffffffbc43ba0 unix:trap+c95 ()
fffffffffbc7a160 unix:_patch_xrstorq_rbx+18f ()
fffffffffbc7a1c0 unix:do_interrupt+a8 ()
fffffffffbc7a1d0 unix:_interrupt+ba ()
fffffffffbc7a2c0 unix:cmntrap+e1 ()
fffffffffbc7a3e0 unix:cf_check_compressed+64 ()
fffffffffbc7a410 unix:bufs_open+104 ()
fffffffffbc7a470 unix:BRD_OPEN+ae ()
fffffffffbc7a4a0 unix:kobj_boot_open+35 ()
fffffffffbc7a510 unix:kobj_open+102 ()
fffffffffbc7a570 unix:kobj_open_file+20 ()
fffffffffbc7a5f0 unix:kobj_open_path+112 ()
fffffffffbc7a650 unix:kobj_load_module+6c ()
fffffffffbc7a6d0 genunix:mod_load+23b ()
fffffffffbc7a730 genunix:mod_hold_installed_mod+6d ()
fffffffffbc7a7b0 genunix:modrload+dd ()
fffffffffbc7a7d0 genunix:modload+17 ()
fffffffffbc7a820 genunix:mod_hold_dev_by_major+bf ()
fffffffffbc7a850 genunix:ndi_hold_driver+2b ()
fffffffffbc7a880 genunix:probe_node+1c ()
fffffffffbc7a8d0 genunix:i_ndi_config_node+d0 ()
fffffffffbc7a900 genunix:i_ddi_attachchild+48 ()
fffffffffbc7a950 genunix:i_ddi_attach_node_hierarchy+6c ()
fffffffffbc7a990 genunix:attach_driver_nodes+b2 ()
fffffffffbc7a9d0 genunix:i_ddi_attach_pseudo_node+2f ()
fffffffffbc7a9f0 genunix:i_ddi_init_root+195 ()
fffffffffbc7aa00 unix:configure+24 ()
fffffffffbc7aa20 unix:startup_end+7e ()
fffffffffbc7aa30 unix:startup+5a ()
fffffffffbc7aa70 genunix:main+3b ()
fffffffffbc7aa80 unix:_locore_start+90 ()


panic: entering debugger (no dump device, continue to reboot)

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'

History

#1

Updated by Toomas Soome about 2 years ago

Gary Mills wrote:

Here we go again, but this is with the new Raven Ridge series of Ryzen CPUs. The CPU is a Ryzen 3 2200G with built-in graphics, although I'm not using the internal graphics at present. This system has an ASUS Prime B350M-A/CSM motherboard and 8 gigs of memory. It boots with the loader. This is part of what I get on a serial console when booting OI-hipster-gui-20180427:

[...]

Unless I got it wrong, it seems that cmntrap() is called at this line: http://src.illumos.org/source/xref/illumos-gate/usr/src/common/fs/decompress.c#130

It means that we should check whether filep->fi_memp is a valid pointer there; it may be that we receive it as 0 from openi().

Now, after reading the code more: this early in boot, fi_memp is supposed to point to the file data in the ramdisk (boot archive). If we get a fault there, it is possible that the boot archive is either still being placed in a memory hole or is stored in some otherwise unusable location. To verify, you can boot with -B prom_debug=true,kbm_debug=true, then record the address and size of the boot archive and check them against smap (at the loader ok prompt).
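
The comparison against smap described above can be sketched as a small range check. This is only an illustration: the SmapEntry type, the range_is_usable() helper and the example map values are hypothetical, and the real address and size must come from the debug output and the loader's smap listing. The one fixed assumption is that type 1 marks usable RAM, as in the BIOS E820/SMAP convention.

```python
from typing import List, NamedTuple

SMAP_TYPE_MEMORY = 1  # usable RAM in the BIOS E820/SMAP convention


class SmapEntry(NamedTuple):
    base: int    # physical start address of the region
    length: int  # size of the region in bytes
    type: int    # SMAP type code (1 = usable memory)


def range_is_usable(addr: int, size: int, smap: List[SmapEntry]) -> bool:
    """Return True if [addr, addr + size) lies wholly inside one usable entry."""
    end = addr + size
    for entry in smap:
        if entry.type != SMAP_TYPE_MEMORY:
            continue
        if entry.base <= addr and end <= entry.base + entry.length:
            return True
    return False


# Hypothetical map for illustration only, not taken from the affected system.
example_smap = [
    SmapEntry(0x0, 0x9F000, 1),
    SmapEntry(0x100000, 0x7FF00000, 1),
    SmapEntry(0xE0000000, 0x10000000, 2),
]

# Hypothetical boot archive location: 16 MB loaded at 2 MB.
print(range_is_usable(0x200000, 0x1000000, example_smap))
```

The helper deliberately reports a range that spans two adjacent usable entries as not usable; for a first sanity check that conservative answer is usually the one worth investigating by hand.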

#2

Updated by Gary Mills about 2 years ago

The command I used was:

boot -k -B prom_debug=true,kbm_debug=true,map_debug=true

I do have a transcript that contains lots of debugging information. Shall I send it to you?

My understanding of a double fault is that a second fault occurs while handling the first one. It could happen, for example, within an interrupt service routine if it contains an instruction that causes a fault on the new AMD Ryzen CPU.
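
As a side note on reading the register dump (not a diagnosis made in this thread): the stack pointer reported in the panic, sp=0xf000eef3f000e2c3, can be checked for x86-64 canonical form, since the CPU cannot push an exception frame through a non-canonical stack pointer. The sketch below assumes 48-bit virtual addressing; the is_canonical() helper is a hypothetical name.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of a 64-bit address are all equal."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Values copied from the panic output in the description.
panic_sp = 0xF000EEF3F000E2C3   # sp / rsp / rax
panic_pc = 0xFFFFFFFFFBC7AB78   # tss.tss_rsp0, a normal kernel address

print(hex(panic_sp), is_canonical(panic_sp))
print(hex(panic_pc), is_canonical(panic_pc))
```

Under that assumption the reported sp is not canonical while tss_rsp0 is, which is at least consistent with a second fault being raised while the first one was being delivered.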

#3

Updated by Toomas Soome about 2 years ago

Gary Mills wrote:

The command I used was:

[...]

I do have a transcript that contains lots of debugging information. Shall I send it to you?

My understanding of a double fault is that a second fault occurs while handling the first one. It could happen, for example, within an interrupt service routine if it contains an instruction that causes a fault on the new AMD Ryzen CPU.

Yeah, well, I do not have any AMD hardware at all, but I can try to dig through it and see what I can find :) Since in general we do have bootable systems, it has to be related to memory maps, but I guess we shall see.

#4

Updated by Toomas Soome about 2 years ago

So far it seems to be a duplicate of #8681, and the question is why the installation/live media is missing the fix.

#5

Updated by Toomas Soome about 2 years ago

  • Status changed from New to Closed

Duplicate of #8681; it was caused by using an older USB image by mistake.
