Bug #8570

ixgbe crash with rings

Added by Igor Kozhukhov 2 months ago. Updated 2 months ago.

Status: New
Start date: 2017-08-09
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

My setup:
ixgbe1 is connected to a 10G switch, with a VLAN created on it like this:
dladm create-vlan -v 400 -l ixgbe1 subnet40
The 'subnet40' NIC is configured for layer-2 Ethernet frames only, with no IP address on it.
I get a core dump after some traffic:

panic[cpu4]/thread=ffffff003df43c40: 
BAD TRAP: type=e (#pf Page fault) rp=ffffff003df43990 addr=18 occurred in module "ixgbe" due to a NULL pointer dereference

sched: 
#pf Page fault
Bad kernel fault at addr=0x18
pid=0, pc=0xfffffffff81c9bb6, sp=0xffffff003df43a80, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 3406f8<smap,smep,osxsav,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 18
cr3: 16a68f000
cr8: 0

        rdi: ffffff0d0d43f0a0 rsi:                0 rdx:             3a38
        rcx:               20  r8: ffffff0cef852180  r9:                1
        rax:                0 rbx: ffffff0d0d508040 rbp: ffffff003df43b10
        r10:               1a r11:                4 r12:           80000b
        r13:                4 r14: ffffff0cffb34280 r15: ffffff0d002a0000
        fsb: fffffd7fff162a40 gsb: ffffff0d00083040  ds:                0
         es:                0  fs:                0  gs:                0
        trp:                e err:                0 rip: fffffffff81c9bb6
         cs:               30 rfl:            10246 rsp: ffffff003df43a80
         ss:               38

ffffff003df43890 unix:real_mode_stop_cpu_stage2_end+ad84 ()
ffffff003df43980 unix:trap+12df ()
ffffff003df43990 unix:_cmntrap+e6 ()
ffffff003df43b10 ixgbe:ixgbe_ring_rx+1e6 ()
ffffff003df43b50 ixgbe:ixgbe_intr_rx_work+34 ()
ffffff003df43b90 ixgbe:ixgbe_intr_msix+55 ()
ffffff003df43be0 apix:apix_dispatch_by_vector+8a ()
ffffff003df43c20 apix:apix_dispatch_lowlevel+1c ()
ffffff003faa69c0 unix:switch_sp_and_call+13 ()
ffffff003faa6a20 apix:apix_do_interrupt+398 ()
ffffff003faa6a30 unix:cmnint+ba ()
ffffff003faa6b40 unix:kcopy+22 ()
ffffff003faa6bb0 genunix:uiomove+dd ()
ffffff003faa6ca0 genunix:vpm_data_copy+eb ()
ffffff003faa6d30 specfs:spec_read+114 ()
ffffff003faa6dd0 genunix:fop_read+fd ()
ffffff003faa6f00 genunix:pread+1c2 ()
ffffff003faa6f10 unix:brand_sys_syscall+21a ()

dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel + curproc
NOTICE: ahci0: ahci_tran_reset_dport port 4 reset port
root@dev6:/var/crash/myhost# mdb *.10
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci zfs sata sd ip hook neti sockfs arp usba xhci uhci stmf stmf_sbd qlc pmcs mr_sas mpt_sas mpt mm random crypto ptm ufs logindmux smbsrv nfs nsmb ]
> $C
ffffff003df43b10 ixgbe_ring_rx+0x1e6(ffffff0cef852180, ffffffff)
ffffff003df43b50 ixgbe_intr_rx_work+0x34(ffffff0cef852180)
ffffff003df43b90 ixgbe_intr_msix+0x55(ffffff0d002a0918, 0)
ffffff003df43be0 apix_dispatch_by_vector+0x8a(21)
ffffff003df43c20 apix_dispatch_lowlevel+0x1c(21, 0)
ffffff003faa69c0 switch_sp_and_call+0x13()
ffffff003faa6a20 apix_do_interrupt+0x398(ffffff003faa6a30, 5043d0)
ffffff003faa6a30 _interrupt+0xba()
ffffff003faa6b40 kcopy+0x22()
ffffff003faa6bb0 uiomove+0xdd(fffffe0171b90000, 1000, 0, ffffff003faa6e28)
ffffff003faa6ca0 vpm_data_copy+0xeb(ffffff0d02822a80, 1647c000, 2000, ffffff003faa6e28, 1, 0, ffffffff00000000, ffffff0e00000001)
ffffff003faa6d30 spec_read+0x114(ffffff0dab89eb80, ffffff003faa6e28, 0, ffffff0d2b1dca28, 0)
ffffff003faa6dd0 fop_read+0xfd(ffffff0dab89eb80, ffffff003faa6e28, 0, ffffff0d2b1dca28, 0)
ffffff003faa6f00 pread+0x1c2(5, 42d3d0, 100000, 163a6000)
ffffff003faa6f10 sys_syscall+0x19f()
> ixgbe_ring_rx+0x1e6::dis
ixgbe_ring_rx+0x1c0:            incl   0x28(%r8)
ixgbe_ring_rx+0x1c4:            leaq   0x0(,%r11,8),%rcx
ixgbe_ring_rx+0x1cc:            movl   0x1a0c(%r15),%esi
ixgbe_ring_rx+0x1d3:            movq   0x50(%r14),%rax
ixgbe_ring_rx+0x1d7:            testl  %esi,%esi
ixgbe_ring_rx+0x1d9:            movq   (%rax,%rcx),%rax
ixgbe_ring_rx+0x1dd:            je     +0x7     <ixgbe_ring_rx+0x1e6>
ixgbe_ring_rx+0x1df:            movl   0x60(%rax),%ecx
ixgbe_ring_rx+0x1e2:            testl  %ecx,%ecx
ixgbe_ring_rx+0x1e4:            jne    +0xf     <ixgbe_ring_rx+0x1f5>
ixgbe_ring_rx+0x1e6:            movq   0x18(%rax),%rax
ixgbe_ring_rx+0x1ea:            movq   $0x0,0x8(%rbx)
ixgbe_ring_rx+0x1f2:            movq   %rax,(%rbx)
ixgbe_ring_rx+0x1f5:            movl   0x6c(%r14),%ecx
ixgbe_ring_rx+0x1f9:            leal   0x1(%r13),%eax
ixgbe_ring_rx+0x1fd:            cmpl   %ecx,%eax
ixgbe_ring_rx+0x1ff:            jb     +0xcb    <ixgbe_ring_rx+0x2d0>
ixgbe_ring_rx+0x205:            movl   $0x1,%eax
ixgbe_ring_rx+0x20a:            incl   %r10d
ixgbe_ring_rx+0x20d:            subl   %ecx,%eax
ixgbe_ring_rx+0x20f:            addl   %eax,%r13d

History

#1 Updated by Igor Kozhukhov 2 months ago

  • Description updated

#2 Updated by Dan McDonald 2 months ago

I got a disassembly of where this is in Igor's code:

ixgbe_ring_rx+0x1c0:            incl   0x28(%r8)
ixgbe_ring_rx+0x1c4:            leaq   0x0(,%r11,8),%rcx
ixgbe_ring_rx+0x1cc:            movl   0x1a0c(%r15),%esi
ixgbe_ring_rx+0x1d3:            movq   0x50(%r14),%rax
ixgbe_ring_rx+0x1d7:            testl  %esi,%esi
ixgbe_ring_rx+0x1d9:            movq   (%rax,%rcx),%rax
ixgbe_ring_rx+0x1dd:            je     +0x7     <ixgbe_ring_rx+0x1e6>
ixgbe_ring_rx+0x1df:            movl   0x60(%rax),%ecx
ixgbe_ring_rx+0x1e2:            testl  %ecx,%ecx
ixgbe_ring_rx+0x1e4:            jne    +0xf     <ixgbe_ring_rx+0x1f5>
ixgbe_ring_rx+0x1e6:            movq   0x18(%rax),%rax  /* XXX KEBE SAYS PANIC IS HERE */
ixgbe_ring_rx+0x1ea:            movq   $0x0,0x8(%rbx)
ixgbe_ring_rx+0x1f2:            movq   %rax,(%rbx)
ixgbe_ring_rx+0x1f5:            movl   0x6c(%r14),%ecx
ixgbe_ring_rx+0x1f9:            leal   0x1(%r13),%eax
ixgbe_ring_rx+0x1fd:            cmpl   %ecx,%eax
ixgbe_ring_rx+0x1ff:            jb     +0xcb    <ixgbe_ring_rx+0x2d0>
ixgbe_ring_rx+0x205:            movl   $0x1,%eax
ixgbe_ring_rx+0x20a:            incl   %r10d
ixgbe_ring_rx+0x20d:            subl   %ecx,%eax
ixgbe_ring_rx+0x20f:            addl   %eax,%r13d
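The fault address in the panic message (addr=18, cr2=18) lines up with the marked instruction: the register dump shows rax = 0, and `movq 0x18(%rax),%rax` reads from rax + 0x18. A minimal sketch of that arithmetic follows. The struct `demo_rcb_layout` and the helper `demo_fault_addr` are hypothetical names, and the padding is chosen only so the field lands at offset 0x18; it is not the driver's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the rx control block. The padding exists
 * only to place dma_address at offset 0x18, the displacement used by
 * the faulting instruction. This is NOT the real driver layout.
 */
struct demo_rcb_layout {
	uint8_t  pad[0x18];
	uint64_t dma_address;
};

/* Compile-time check that the modelled field really sits at 0x18. */
static_assert(offsetof(struct demo_rcb_layout, dma_address) == 0x18,
    "dma_address must be at offset 0x18 in this model");

/* Address the CPU would touch for "movq 0x18(%rax),%rax". */
static uintptr_t
demo_fault_addr(uintptr_t rax)
{
	return (rax + offsetof(struct demo_rcb_layout, dma_address));
}
```

With the dumped value rax = 0, `demo_fault_addr(0)` evaluates to 0x18, the same address reported in cr2.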

So the closest source I could find to match that is here:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/ixgbe/ixgbe_rx.c#695

The panic is at one of the two places where

    current_rbd->read.pkt_addr =
            current_rcb->rx_buf.dma_address;
    current_rbd->read.hdr_addr = 0;

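occurs. current_rcb is NULL at that point, and it shouldn't be (or it should be checked before use). This is consistent with the dump: rax is 0 and the faulting load is 0x18(%rax), which gives the reported fault address of 0x18. Working out why and how current_rcb becomes NULL is what will determine the fix.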
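The "it should be checked" option could look roughly like the sketch below. This is an illustration only, not the driver's code or a proposed patch: `demo_rcb`, `demo_rbd`, and `demo_rearm` are hypothetical, heavily simplified stand-ins for the real structures. A check like this would only avoid the NULL dereference; it says nothing about why the slot was NULL in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the rx control block. */
struct demo_rcb {
	uint64_t dma_address;	/* models rx_buf.dma_address */
};

/* Hypothetical, simplified stand-in for the rx descriptor. */
struct demo_rbd {
	uint64_t pkt_addr;	/* models read.pkt_addr */
	uint64_t hdr_addr;	/* models read.hdr_addr */
};

/*
 * Re-arm descriptor rbd from work_list[idx]. Returns 0 on success, or
 * -1 (leaving the descriptor untouched) if the list or the slot is NULL.
 */
static int
demo_rearm(struct demo_rbd *rbd, struct demo_rcb **work_list, size_t idx)
{
	struct demo_rcb *rcb;

	assert(rbd != NULL);
	if (work_list == NULL || (rcb = work_list[idx]) == NULL)
		return (-1);

	rbd->pkt_addr = rcb->dma_address;
	rbd->hdr_addr = 0;
	return (0);
}
```

A caller would treat the -1 return as a signal to skip or recycle the slot rather than hand a stale descriptor back to the hardware; what the right recovery is depends on the root cause, which is still open here.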

#3 Updated by Igor Kozhukhov 2 months ago

  • Description updated
