Bug #9797

dump_ttrace() chokes on softints

Added by Hans Rosenfeld over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Category: -
Start date: 2018-09-04
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

During recent bhyve testing I ran into the following panic:

panic[cpu28]/thread=fffffeb3e41fd7e0:
BAD TRAP: type=d (#gp General protection) rp=fffffe00fa9ab180 addr=fffffeb2ec745000

bhyve:
#gp General protection
addr=0xfffffeb2ec745000
pid=111134, pc=0xfffffffffb8849e8, sp=0xfffffe00fa9ab278, eflags=0x10246
cr0: 80050033<pg,wp,ne,et,mp,pe>  cr4: 3626f8<smap,smep,osxsav,pcide,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0
cr3: 105996b000
cr8: 0

        rdi: fffffec19f3c1e40 rsi:                0 rdx:                0
        rcx:               1c  r8: fffffec1c676b800  r9:         c0000081
        rax:              2ff rbx: fffffeb2eefced90 rbp: fffffe00fa9ab2a0
        r10:         c0000083 r11:         c0000082 r12: fffffeb3e41fd7e0
        r13: fffffffffb88a408 r14: fffffeb2ec7c3000 r15: fffffeb2ec745000
        fsb:                0 gsb: fffffeb2ec745000  ds:               38
         es:               38  fs:                0  gs:                0
        trp:                d err:                0 rip: fffffffffb8849e8
         cs:               30 rfl:            10246 rsp: fffffe00fa9ab278
         ss:               38

CPU          ADDRESS    TIMESTAMP TYPE  VC HANDLER          PC
  0 fffffffffbc394a0 22d87b148f23 intr  f1 xc_serv          mutex_delay_default+7
  0 fffffffffbc39328 22d87b1471fd intr  f1 xc_serv          tsc_read+3
  0 fffffffffbc391b0 22d87b146155 intr  f1 xc_serv          mutex_delay_default+7
  0 fffffffffbc39038 22d87b145625 intr  f1 xc_serv          tsc_read+3
  0 fffffffffbc38ec0 22d87b1448f5 intr  f1 xc_serv          mutex_delay_default+7
  0 fffffffffbc38d48 22d87b1435f5 intr  f1 xc_serv          mutex_delay_default+7
  0 fffffffffbc38bd0 22d87b1432f7 intr  f1 xc_serv          mutex_delay_default+7
  0 fffffffffbc38a58 22d87b141d1e intr  f1 xc_serv          mutex_delay_default+7
  0 fffffffffbc388e0 22d87b140ef0 intr  f1 xc_serv          default_lock_delay+54
  0 fffffffffbc38768 22d87b13fdb6 intr  f1 xc_serv          mutex_delay_default+7
  1 fffffeb2eb8c0658 22d87b148f8e intr  f1 xc_serv          mutex_enter+10
  1 fffffeb2eb8c04e0 22d87b147276 intr  f1 xc_serv          mutex_delay_default+7
  1 fffffeb2eb8c0368 22d87b1461b4 intr  f1 xc_serv          sysdc_setrun+36
  1 fffffeb2eb8c01f0 22d87b145675 intr  f1 xc_serv          hat_getpfnum+4d
  1 fffffeb2eb8c0078 22d87b14496a intr  f1 xc_serv          ddi_dma_allochdl+6b
  1 fffffeb2eb8bff00 22d87b13fe70 intr  f1 xc_serv          i86_mwait+d
  1 fffffeb2eb8bfd88 22d87b091fd9 intr  f6 cbe_fire         i86_mwait+d
  1 fffffeb2eb8bfc10 22d87b0439ea intr  f6 cbe_fire         i86_mwait+d

panic[cpu28]/thread=fffffeb3e41fd7e0:
BAD TRAP: type=e (#pf Page fault) rp=fffffffffbca0140 addr=20 occurred in module "apix" due to a NULL pointer dereference

This bug is about the second panic: the NULL pointer dereference in apix, which happened while the first panic was still being processed.

Here is what happens: dump_ttrace() finds a trap_trace_rec_t for an interrupt with vector 0x50fd (softint), but does not treat it as a softint. Instead it passes this vector to (*get_intr_handler)(), which here is apix_get_intr_handler(). This was at first really confusing, because apix_get_intr_handler() looks like this:

#define APIX_NVECTOR 256
[...]
static uintptr_t
apix_get_intr_handler(int cpu, short vec)
{
    apix_vector_t *apix_vector;

    ASSERT(cpu < apic_nproc && vec < APIX_NVECTOR);
    if (cpu >= apic_nproc)
        return (NULL);

    apix_vector = apixs[cpu]->x_vectbl[vec];

    return ((uintptr_t)(apix_vector->v_autovect));
}

The assertion was ignored because we were already processing a panic. With vec = 0x50fd, far above APIX_NVECTOR (256), apix_get_intr_handler() therefore read memory well beyond the end of x_vectbl, found a NULL pointer there, and dereferenced it. So apix_get_intr_handler() needs to be more robust, and dump_ttrace() needs to deal with softints like mdb does in ttrace_interrupt().
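
To make the two proposed changes concrete, here is a minimal user-space sketch. It is not the committed fix. The struct layouts are stand-ins that contain only the fields used in the snippet above, and the names sketch_get_intr_handler(), sketch_format_intr(), SKETCH_SOFTINT and the mock tables are invented for illustration. The sketch takes the vector as an int, whereas the original function takes a short. The lookup returns 0 rather than relying on ASSERT when the CPU or vector is out of range or the table slot is empty, and the formatter recognizes the softint vector before any table lookup is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define APIX_NVECTOR    256     /* value quoted in the report */
#define SKETCH_SOFTINT  0x50fd  /* softint vector quoted in the report */

/* Stand-in types: only the fields the report's snippet touches. */
typedef struct apix_vector {
	void *v_autovect;
} apix_vector_t;

typedef struct apix {
	apix_vector_t *x_vectbl[APIX_NVECTOR];
} apix_t;

/* Mock state for a single CPU with one populated vector (0xf1). */
static int sketch_handler_stub;
static apix_vector_t sketch_vec = { &sketch_handler_stub };
static apix_t sketch_apix = { .x_vectbl = { [0xf1] = &sketch_vec } };
static apix_t *apixs[] = { &sketch_apix };
static int apic_nproc = 1;

/*
 * Bounds-checked lookup: returns 0 instead of reading past x_vectbl
 * or dereferencing an empty slot.
 */
uintptr_t
sketch_get_intr_handler(int cpu, int vec)
{
	apix_vector_t *vecp;

	if (cpu < 0 || cpu >= apic_nproc || vec < 0 || vec >= APIX_NVECTOR)
		return (0);

	vecp = apixs[cpu]->x_vectbl[vec];
	if (vecp == NULL)
		return (0);

	return ((uintptr_t)vecp->v_autovect);
}

/*
 * Format the VC and HANDLER columns of one trace line into buf.
 * Softints are reported without consulting the vector table.
 * Returns 1 if the record was a softint, 0 otherwise.
 */
int
sketch_format_intr(int cpu, int vec, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (vec == SKETCH_SOFTINT) {
		(void) snprintf(buf, len, "%3s %-16s", "-", "(fakesoftint)");
		return (1);
	}

	/* No symbol lookup here; the sketch prints the raw handler address. */
	(void) snprintf(buf, len, "%3x 0x%llx", (unsigned int)vec,
	    (unsigned long long)sketch_get_intr_handler(cpu, vec));
	return (0);
}
```

With the mock tables above, a lookup of vector 0x50fd returns 0 instead of faulting, and the formatter produces the "-" and "(fakesoftint)" columns for that vector.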

#1

Updated by Hans Rosenfeld over 2 years ago

Testing: I caused a panic by setting a register about to be dereferenced to NULL and observed the resulting panic message. The trap trace is now printed correctly, and softints look like this:

 25 fffffeb2ec641758 107f2c0bf474 intr   - (fakesoftint)    fakesoftint+23

That is similar to what ::ttrace would print, except that it doesn't recognize the PC as being fakesoftint_return. I consider this a minor cosmetic issue that is just too much trouble to fix now.

#2

Updated by Electric Monk over 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 627227313bb67bd9c7becd366732842c63504f33

commit  627227313bb67bd9c7becd366732842c63504f33
Author: Hans Rosenfeld <hans.rosenfeld@joyent.com>
Date:   2018-09-26T19:30:14.000Z

    9797 dump_ttrace() chokes on softints
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: John Levon <levon@movementarian.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Yuri Pankov <yuripv@yuripv.net>
    Reviewed by: Andy Fiddaman <omnios@citrus-it.net>
    Approved by: Garrett D'Amore <garrett@damore.org>
