Bug #5259

ixgbe causes NULL pointer dereference in ddi_dma_sync

Added by Dan Vatca almost 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date: 2014-10-24
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

This crash has happened twice (5 months apart) on one of our networks.
The panic info:

> ::panicinfo
             cpu                5
          thread ffffff001edffc40
         message BAD TRAP: type=e (#pf Page fault) rp=ffffff001edff8d0 addr=18 occurred in module "genunix" due to a NULL pointer dereference
             rdi                0
             rsi                0
             rdx                0
             rcx                2
              r8              37f
              r9                0
             rax                0
             rbx                0
             rbp ffffff001edffa10
             r10               3c
             r11               3c
             r12               3c
             r13                0
             r14                0
             r15                2
          fsbase                0
          gsbase ffffff04e95fbb00
              ds               4b
              es               4b
              fs                0
              gs              1c3
          trapno                e
             err                0
             rip fffffffffbae8e9a
              cs               30
          rflags            10286
             rsp ffffff001edff9c0
              ss               38
          gdt_hi                0
          gdt_lo         600001ef
          idt_hi                0
          idt_lo         50000fff
             ldt                0
            task               70
             cr0         8005003b
             cr2               18
             cr3          3c00000
             cr4            426f8

This is the stack trace during the panic:
> $C
ffffff001edffa10 ddi_dma_sync+0x3a(0, 0, 0, 2)
ffffff001edffa60 ixgbe_rx_copy+0x4e(ffffff04ea3c8dc8, 37f, 3c)
ffffff001edffb10 ixgbe_ring_rx+0x24d(ffffff04ead85cc0, ffffffff)
ffffff001edffb50 ixgbe_intr_rx_work+0x34(ffffff04ead85cc0)
ffffff001edffb90 ixgbe_intr_msix+0x57(ffffff04ea92a6b0, 0)
ffffff001edffbe0 apix_dispatch_by_vector+0x8c(21)
ffffff001edffc20 apix_dispatch_lowlevel+0x25(21, 0)
ffffff001edc99e0 switch_sp_and_call+0x13()
ffffff001edc9a40 apix_do_interrupt+0x387(ffffff001edc9a50, 2)
ffffff001edc9a50 _interrupt+0xba()
ffffff001edc9bc0 acpi_cpu_cstate+0x11b(ffffff04e88307e0)
ffffff001edc9bf0 cpu_acpi_idle+0x8d()
ffffff001edc9c00 cpu_idle_adaptive+0x13()
ffffff001edc9c20 idle+0xa7()
ffffff001edc9c30 thread_start+8()

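Looking at the rx_data, the control block at work_list[0x37f] has an all-zero rx_buf, including a NULL dma_handle. That NULL handle is what ixgbe_rx_copy passes to ddi_dma_sync (see the ddi_dma_sync(0, 0, 0, 2) frame above), which triggers the BAD TRAP: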
> ffffff04ea3c8dc8::print -t ixgbe_rx_data_t work_list[0x37f]
rx_control_block_t *work_list[0x37f] = 0xffffff04ee4044c0
> 0xffffff04ee4044c0::print -t rx_control_block_t
rx_control_block_t {
    mblk_t *mp = 0
    uint32_t ref_cnt = 0
    dma_buffer_t rx_buf = {
        caddr_t address = 0
        uint64_t dma_address = 0
        ddi_acc_handle_t acc_handle = 0
        ddi_dma_handle_t dma_handle = 0
        size_t size = 0
        size_t len = 0
    }
    frtn_t free_rtn = {
        void (*)() free_func = ixgbe_rx_recycle
        caddr_t free_arg = 0xffffff04ee4044c0
    }
    struct igb_rx_data *rx_data = 0xffffff04ea3c8dc8
}
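
To make the failure mode concrete, here is a minimal user-space sketch of a receive-copy path that refuses to touch a control block whose buffer was never set up. Everything in it is a hypothetical stand-in: the `fake_*` types and functions are not the illumos definitions, and this is not a proposed fix, since it does not explain why the control block is zeroed in the first place.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the driver structures; NOT the real illumos types. */
typedef struct fake_dma_handle {
	int opaque;
} fake_dma_handle_t;

typedef struct fake_dma_buffer {
	char *address;                  /* kernel virtual address of the buffer */
	fake_dma_handle_t *dma_handle;  /* NULL in the crash dump above */
	size_t size;
} fake_dma_buffer_t;

typedef struct fake_rx_control_block {
	fake_dma_buffer_t rx_buf;
} fake_rx_control_block_t;

/* A buffer is usable only if both its handle and its address are set. */
static int
fake_rx_buf_usable(const fake_rx_control_block_t *rcb)
{
	return (rcb != NULL &&
	    rcb->rx_buf.dma_handle != NULL &&
	    rcb->rx_buf.address != NULL);
}

/*
 * Copy len bytes out of the receive buffer into dst.
 * Returns 0 on success, -1 if the control block is unusable or len is
 * larger than the buffer. A real driver would sync the DMA buffer
 * before the copy; that step is where the NULL handle was dereferenced.
 */
static int
fake_rx_copy(const fake_rx_control_block_t *rcb, char *dst, size_t len)
{
	if (!fake_rx_buf_usable(rcb) || len > rcb->rx_buf.size)
		return (-1);
	memcpy(dst, rcb->rx_buf.address, len);
	return (0);
}
```

With a zero-filled control block like the one printed above, `fake_rx_copy` returns -1 instead of dereferencing the NULL handle. In the real driver a check like this would only hide the symptom; the open question is how work_list[0x37f] ended up pointing at a zeroed control block.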

This has been rare for us. Has anybody else seen this?