Bug #14994

open

Panic on microcode update with UFS boot archive

Added by Andy Fiddaman 6 days ago. Updated 4 days ago.

Status: In Progress
Priority: Normal
Category: kernel
% Done: 0%
Difficulty: Medium

Description

Early in boot, before kmem and similar facilities are set up, the kernel updates the boot CPU's microcode if an update is available.
To read the microcode files from the boot archive it calls kobj_open(), which at this stage of boot hands off to a bootfs module via BRD_OPEN().

The UFS boot module keeps a cache of allocated inodes in a linked list. The boot CPU microcode update adds entries to this list, and those entries are allocated from boot pages (via BOP_ALLOC). Once kmem has been initialised, those allocations are no longer valid, so the list is left containing invalid pointers.
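The hazard can be illustrated with a minimal C model. This is a sketch under stated assumptions, not the ufsops code: the struct cache layout is modeled on the fields kmdb prints later in this report (next, data, key, size); boot_alloc() and boot_release() are hypothetical stand-ins for BOP_ALLOC and for the boot pages being reclaimed; the kmem_ready flag is a hypothetical switch between the two allocators; and malloc() stands in for kmem. New entries are pushed onto the head, so the oldest (boot-page) entries end up at the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout modeled on the fields kmdb prints for struct cache; not the real ufsops header. */
struct cache {
	struct cache *next;
	void *data;
	int key;
	unsigned int size;
};

static struct cache *icache;	/* head of the modeled inode cache list */

/* Hypothetical stand-in for BOP_ALLOC: a bump allocator over a static "boot page" arena. */
static _Alignas(max_align_t) unsigned char boot_arena[4096];
static size_t boot_used;

static void *boot_alloc(size_t n)
{
	const size_t align = _Alignof(max_align_t);

	n = (n + align - 1) & ~(align - 1);	/* keep every allocation aligned */
	if (boot_used + n > sizeof (boot_arena))
		return NULL;
	void *p = &boot_arena[boot_used];
	boot_used += n;
	return p;
}

/*
 * Hypothetical stand-in for the boot pages being reclaimed after kmem is up.
 * The memory stays addressable in this model, but its contents become
 * garbage (all 0xff bytes, like the stale entries in the kmdb output).
 */
static void boot_release(void)
{
	memset(boot_arena, 0xff, sizeof (boot_arena));
	boot_used = 0;
}

/* Push a new entry on the head, using boot pages until kmem (malloc here) is ready. */
static int set_cache(int key, void *data, unsigned int size, int kmem_ready)
{
	struct cache *c = kmem_ready ? malloc(sizeof (*c)) : boot_alloc(sizeof (*c));

	if (c == NULL)
		return -1;
	c->data = data;
	c->key = key;
	c->size = size;
	c->next = icache;
	icache = c;
	return 0;
}

/* Linear lookup: follows next pointers until the key matches or the list ends. */
static void *get_cache(int key)
{
	for (struct cache *c = icache; c != NULL; c = c->next) {
		if (c->key == key)
			return c->data;
	}
	return NULL;
}
```

In this model, with one entry added before the allocator switch and one after, a lookup for the later key still succeeds after boot_release() because it matches at the head. A lookup that misses, however, reaches the poisoned entry, whose key and next pointer are all ones, and then dereferences that next pointer, much like the walk shown in the kmdb output below.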

This manifests as a crash in ufsops`get_cache when loading additional modules later:

panic[cpu0]/thread=fffffffffbe58020: BAD TRAP: type=e (#pf Page fault) rp=fffffffffbe94d10 addr=f occurred in module "unix" due to a NULL pointer dereference

#pf Page fault
Bad kernel fault at addr=0xf
pid=0, pc=0xfffffffffbce86a0, sp=0xfffffffffbe94e00, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 360668<smap,smep,osxsav,pcide,xmme,fxsr,mce,pae,de>
cr2: f  cr3: 7fff1000  cr8: 0

        rdi:                2 rsi:                2 rdx: fffffffffbe94e80
        rcx: 3436646d612f7672  r8:             756d  r9:                0
        rax: ffffffffffffffff rbx: fffffd02f0406000 rbp: fffffffffbe94e10
        r10: 6d6f695f646d612f r11:                0 r12:                2
        r13:          5200000 r14:                0 r15: fffffffffbe1b940
        fsb:        200000000 gsb: fffffffffbe5a000  ds:                0
         es:                0  fs:                0  gs:                0
        trp:                e err:                0 rip: fffffffffbce86a0
         cs:               30 rfl:            10286 rsp: fffffffffbe94e00
         ss:               38

CPU          ADDRESS    TIMESTAMP TYPE  VC HANDLER          PC
  0 fffffffffbe26a98   d207a39574 trap   e #pf              get_cache+20
  0 fffffffffbe26920   a4f8aca27c trap   d #gp              wrmsr+b
Warning - stack not written to the dump buffer
fffffffffbe94c10 unix:die+d0 ()
fffffffffbe94d00 unix:trap+12b7 ()
fffffffffbe94d10 unix:_cmntrap+1cd ()
fffffffffbe94e10 unix:get_cache+20 ()
fffffffffbe94e50 unix:openi+31 ()
fffffffffbe952c0 unix:find+1d4 ()
fffffffffbe952f0 unix:bufs_open+cb ()
fffffffffbe95350 unix:BRD_OPEN+b3 ()
fffffffffbe95380 unix:kobj_boot_open+35 ()
fffffffffbe953f0 unix:kobj_open+122 ()
fffffffffbe95450 unix:kobj_open_file+27 ()
fffffffffbe954c0 unix:kobj_open_path+e3 ()
fffffffffbe95520 unix:kobj_load_module+61 ()
fffffffffbe955b0 genunix:mod_load+2e3 ()
fffffffffbe95610 genunix:mod_hold_installed_mod+6b ()
fffffffffbe95690 genunix:modrload+1a6 ()
fffffffffbe956b0 genunix:modload+d ()
fffffffffbe956e0 unix:startup_end+1a2 ()
fffffffffbe956f0 unix:startup+75 ()
fffffffffbe95730 genunix:main+96 ()
fffffffffbe95740 unix:_locore_start+88 ()

Inspecting the cache list at this stage shows that we are indeed walking off the end:

[0]> icache::list struct cache next
fffffffffbe55ff0
fffffd02eeb27ea0
...
fffffd020cc86f68
15c1d000
15c17000
15c11000
ffffffffffffffff
kmdb: failed to read next pointer from object ffffffffffffffff: no mapping for address

[0]> 15c11000::print struct cache
{
    next = 0xffffffffffffffff
    data = 0xffffffffffffffff
    key = 0xffffffff
    size = 0xffffffff
}

The three addresses of the form 15c1n000 are allocations that were made from boot pages, and they are no longer valid.
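One possible mitigation is to unlink the boot-page entries from the list before those pages are reclaimed. The C sketch below assumes that the address range of the boot-page allocations is known at the point where kmem takes over. cache_purge_range() and addr_in_range() are hypothetical helpers, the struct layout again mirrors the kmdb ::print output, and this is not necessarily how the change in Gerrit CR 2386 addresses the problem.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as printed by kmdb above; illustrative only. */
struct cache {
	struct cache *next;
	void *data;
	int key;
	unsigned int size;
};

/* Nonzero if p lies in [lo, hi); compared as integers since the pointers may refer to unrelated objects. */
static int addr_in_range(const void *p, uintptr_t lo, uintptr_t hi)
{
	uintptr_t a = (uintptr_t)p;

	return (a >= lo && a < hi);
}

/*
 * Hypothetical helper: unlink every entry whose storage lies in the
 * boot-page range [lo, hi) and return how many were removed. Entries are
 * dropped rather than freed, since they were never kmem allocations.
 * It must run while the boot pages are still readable, because unlinking
 * an entry reads its next pointer.
 */
static size_t cache_purge_range(struct cache **headp, uintptr_t lo, uintptr_t hi)
{
	size_t removed = 0;
	struct cache **pp = headp;

	while (*pp != NULL) {
		if (addr_in_range(*pp, lo, hi)) {
			*pp = (*pp)->next;	/* splice the boot-page entry out */
			removed++;
		} else {
			pp = &(*pp)->next;
		}
	}
	return removed;
}
```

Flushing the entire cache at the same point would be a simpler alternative; the range-based version keeps any entries that were already allocated from kmem.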

#1

Updated by Electric Monk 4 days ago

  • Gerrit CR set to 2386