Bug #787

Kernel panic in ip_input.c

Added by Bryan Horstmann-Allen over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: kernel
Start date: 2011-03-09
Due date: 2011-03-11
% Done: 100%
Estimated time: 6.00 h
Difficulty:
Tags:
Gerrit CR:

Description

Using Crossbow to serve a virtual network to VirtualBox 4 VMs (Linux, Windows) and non-global zones.

An NGZ is providing routing, and is running ipf with NAT.

The core is available.

  TIME                 CLASS                                 ENA
  Mar 09 20:35:19.6281 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 6dccbfda-14c3-469a-86ea-caee926eb1f1
        code = SUNOS-8000-KL
        diag-time = 1299720924 894251
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/fakebox/.6dccbfda-14c3-469a-86ea-caee926eb1f1
                resource = sw:///:path=/var/crash/fakebox/.6dccbfda-14c3-469a-86ea-caee926eb1f1
                savecore-succcess = 0
                os-instance-uuid = 6dccbfda-14c3-469a-86ea-caee926eb1f1
                panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff0010be70d0 addr=2c occurred in module "ip" due to a NULL pointer dereference
                panicstack = unix:die+dd () |
                             unix:trap+1799 () |
                             unix:cmntrap+e6 () |
                             ip:ill_input_short_v4+682 () |
                             ip:ip_postfrag_loopback+196 () |
                             ip:ip_postfrag_loopcheck+75 () |
                             ip:ire_send_wire_v4+345 () |
                             ip:ire_send_broadcast_v4+2a4 () |
                             ip:ip_output_simple_v4+55c () |
                             ip:ip_output_simple+dd () |
                             ip:icmp_send_reply_v4+1a9 () |
                             ip:icmp_inbound_v4+816 () |
                             ip:ip_fanout_v4+e20 () |
                             ip:ire_recv_local_v4+366 () |
                             ip:ill_input_short_v4+69e () |
                             ip:ip_input+23b () |
                             dls:i_dls_link_rx+2e7 () |
                             mac:mac_rx_deliver+5d () |
                             mac:mac_rx_soft_ring_process+17a () |
                             mac:mac_rx_srs_proto_fanout+4e5 () |
                             mac:mac_rx_srs_drain+26e () |
                             mac:mac_rx_srs_process+180 () |
                             mac:mac_tx_send+334 () |
                             mac:mac_tx_single_ring_mode+91 () |
                             mac:mac_tx+35d () |
                             dld:str_mdata_raw_put+181 () |
                             dld:dld_wput+b5 () |
                             unix:putnext+21e () |
                             vboxflt:vboxNetFltPortOsXmit+2c5 () |
                             vboxflt:vboxNetFltPortXmit+15e () |
                             fffffffff859ca26 () |
                             fffffffff859d93a () |
                             fffffffff859e3cf () |
                             fffffffff856931e () |
                             fffffffff8569943 () |
                             vboxdrv:supdrvIOCtl+13a4 () |
                             vboxdrv:VBoxDrvSolarisIOCtl+360 () |
                             genunix:cdev_ioctl+45 () |
                             specfs:spec_ioctl+5a () |
                             genunix:fop_ioctl+7b () |
                             genunix:ioctl+18e () |
                             unix:brand_sys_syscall+1f5 () | 
                crashtime = 1299720864
                panic-time = March  9, 2011 08:34:24 PM EST EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x4d782adc 0x37a3c258

> ::status
debugging crash dump vmcore.0 (64-bit) from fakebox
operating system: 5.11 oi_148 (i86pc)
image uuid: 6dccbfda-14c3-469a-86ea-caee926eb1f1
panic message: 
BAD TRAP: type=e (#pf Page fault) rp=ffffff0010be70d0 addr=2c occurred in module 
"ip" due to a NULL pointer dereference
dump content: kernel pages only
> *panic_thread::findstack -v
stack pointer for thread ffffff02d867d8a0: ffffff0010be6e30
  ffffff0010be6f20 panic+0x94()
  ffffff0010be6fb0 die+0xdd(e, ffffff0010be70d0, 2c, 1)
  ffffff0010be70c0 trap+0x1799(ffffff0010be70d0, 2c, 1)
  ffffff0010be70d0 0xfffffffffb8001d6()
  ffffff0010be7290 ill_input_short_v4+0x682(ffffff02e5877020, ffffff02e51ae57e, ffffff02e51ae58e, ffffff0010be72a0, ffffff0010be7430)
  ffffff0010be74a0 ip_postfrag_loopback+0x196(ffffff02e5877020, ffffff02d1766378, 92006840, 54, 0)
  ffffff0010be7530 ip_postfrag_loopcheck+0x75(ffffff0354955080, ffffff02d1766378, 92006840, 54, 0, 0, 0, ffffff0010be7a98)
  ffffff0010be7760 ire_send_wire_v4+0x345(ffffff02d52366f8, ffffff0354955080, ffffff02e584b0be, ffffff0010be7930, ffffff02cffdbef0)
  ffffff0010be7810 ire_send_broadcast_v4+0x2a4(ffffff02d523b9b0, ffffff02e5882ce0, ffffff02e584a4fe, ffffff0010be7930, ffffff02cffdbef0)
  ffffff0010be78c0 ip_output_simple_v4+0x55c(ffffff02e5882ce0, ffffff0010be7930)
  ffffff0010be7910 ip_output_simple+0xdd(ffffff02e5882ce0, ffffff0010be7930)
  ffffff0010be7b00 icmp_send_reply_v4+0x1a9(ffffff02e5882ce0, ffffff02e584a4fe, ffffff02e584a512, ffffff0010be7dd0)
  ffffff0010be7b80 icmp_inbound_v4+0x816(ffffff02e5882ce0, ffffff0010be7dd0)
  ffffff0010be7c40 ip_fanout_v4+0xe20(ffffff02e5882ce0, ffffff02e584a4fe, ffffff0010be7dd0)
  ffffff0010be7cc0 ire_recv_local_v4+0x366(ffffff02d523b590, ffffff02e5882ce0, ffffff02e584a4fe, ffffff0010be7dd0)
  ffffff0010be7da0 ill_input_short_v4+0x69e(ffffff02e5882ce0, ffffff02e584a4fe, ffffff02e584a50e, ffffff0010be7dd0, ffffff0010be7f60)
  ffffff0010be7fd0 ip_input+0x23b(ffffff02d5273c68, 0, ffffff02e5882ce0, ffffff0010be8030)
  ffffff0010be80c0 i_dls_link_rx+0x2e7(ffffff02d25212c0, 0, ffffff02e5882ce0, 0)
  ffffff0010be8100 mac_rx_deliver+0x5d(ffffff02d1205010, 0, ffffff02e5882ce0, 0)
  ffffff0010be8190 mac_rx_soft_ring_process+0x17a(ffffff02d1205010, ffffff02d5157800, ffffff02e5882ce0, ffffff02e5882ce0, 1, 0)
  ffffff0010be82d0 mac_rx_srs_proto_fanout+0x4e5(ffffff02d256b980, ffffff02e5882ce0)
  ffffff0010be8350 mac_rx_srs_drain+0x26e(ffffff02d256b980, 800)
  ffffff0010be83e0 mac_rx_srs_process+0x180(ffffff02d231f010, ffffff02d256b980, ffffff02e5882ce0, 0)                
  ffffff0010be84d0 mac_tx_send+0x334(ffffff02d1208008, 0, ffffff02e5882ce0, ffffff0010be84e0)
  ffffff0010be8560 mac_tx_single_ring_mode+0x91(ffffff02d2497980, ffffff02e5882ce0, 0, 0, 0)
  ffffff0010be85f0 mac_tx+0x35d(ffffff02d1208008, ffffff02e5882ce0, 0, 0, 0)
  ffffff0010be8680 str_mdata_raw_put+0x181(ffffff02ddbfbac0, ffffff02e5882ce0)
  ffffff0010be86d0 dld_wput+0xb5(ffffff02ddc063b0, ffffff02e5882ce0)
  ffffff0010be8740 putnext+0x21e(ffffff02de101408, ffffff02e5882ce0)
  ffffff0010be87e0 vboxNetFltPortOsXmit+0x2c5()
  ffffff0010be8860 vboxNetFltPortXmit+0x15e()
  ffffff0010be8940 0xfffffffff859ca26()
  ffffff0010be8a00 0xfffffffff859d93a()
  ffffff0010be8ac0 0xfffffffff859e3cf()
  ffffff0010be8af0 0xfffffffff856931e()
  ffffff0010be8b30 0xfffffffff8569943()
  ffffff0010be8c30 supdrvIOCtl+0x13a4()
  ffffff0010be8d00 VBoxDrvSolarisIOCtl+0x360()
  ffffff0010be8d40 cdev_ioctl+0x45(10900000002, c0185687, fffffd7ffcf4eba0, 202003, ffffff02de571830, ffffff0010be8e24)
  ffffff0010be8d80 spec_ioctl+0x5a(ffffff02d7fe5600, c0185687, fffffd7ffcf4eba0, 202003, ffffff02de571830, ffffff0010be8e24, 0)
  ffffff0010be8e00 fop_ioctl+0x7b(ffffff02d7fe5600, c0185687, fffffd7ffcf4eba0, 202003, ffffff02de571830, ffffff0010be8e24, 0)
  ffffff0010be8f00 ioctl+0x18e(d, ffffffffc0185687, fffffd7ffcf4eba0)
  ffffff0010be8f10 sys_syscall+0x17a()

History

#1

Updated by Rich Lowe over 9 years ago

Digging through the dump, it looks like ire is NULL at source:usr/src/uts/common/inet/ip/ip_input.c:815

ill_input_short_v4+0x660:       movq   -0xb8(%rbp),%r8
ill_input_short_v4+0x667:       movq   (%r8),%rdi
ill_input_short_v4+0x66a:       testq  %rdi,%rdi
ill_input_short_v4+0x66d:       je     +0x5     <ill_input_short_v4+0x674>
ill_input_short_v4+0x66f:       call   -0x43004 <ire_refrele>
ill_input_short_v4+0x674:       movq   -0xb8(%rbp),%r8
ill_input_short_v4+0x67b:       movq   %r12,(%r8)
ill_input_short_v4+0x67e:       movl   %ebx,0x8(%r8)
ill_input_short_v4+0x682:       incl   0x2c(%r12)  <<<
ill_input_short_v4+0x687:       movq   -0xb0(%rbp),%rsi
ill_input_short_v4+0x68e:       movq   %r12,%rdi
ill_input_short_v4+0x691:       movq   %r13,%rdx
ill_input_short_v4+0x694:       movq   %r14,%rcx
ill_input_short_v4+0x697:       xorl   %eax,%eax
ill_input_short_v4+0x699:       call   *0x48(%r12)
> <r12=K
                0           
> ::offsetof ire_t ire_ib_pkt_count
offsetof (ire_t, ire_ib_pkt_count) = 0x2c, sizeof (...->ire_ib_pkt_count) = 4
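
The numbers above line up: %r12 holds the ire pointer, it is 0, and `incl 0x2c(%r12)` therefore touches address 0x2c, which is the addr=2c in the panic string. A minimal userland sketch of that arithmetic follows. `fake_ire_t` and `fault_addr` are hypothetical names, and the struct is padded only so that the counter lands at 0x2c; it is not the real ire_t layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ire_t. Only the offset of the counter is
 * modelled; the padding replaces whatever fields really precede it.
 */
typedef struct fake_ire {
    uint8_t  pad[0x2c];
    uint32_t ire_ib_pkt_count;
} fake_ire_t;

/* Compile-time check that the sketch matches the ::offsetof output. */
static_assert(offsetof(fake_ire_t, ire_ib_pkt_count) == 0x2c,
    "counter must sit at offset 0x2c");

/*
 * Address that "ire->ire_ib_pkt_count++" would touch for a given
 * pointer value. No memory is dereferenced here.
 */
static uintptr_t
fault_addr(uintptr_t ire_base)
{
    return (ire_base + offsetof(fake_ire_t, ire_ib_pkt_count));
}
```

Calling `fault_addr(0)` gives 0x2c, so a NULL ire plus the field offset is enough to explain the trap address without any further corruption.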
#2

Updated by Rich Lowe over 9 years ago

The IP header as we currently see it has a destination address of 0:

> ffffff02e51ae57e::print ipha_t ipha_dst
ipha_dst = 0

and the rtc is similarly empty:

> ffffff0010be7430::print rtc_t
{
    rtc_ire = 0
    rtc_ipaddr = 0
    rtc_ip6addr = {
        _S6_un = {
            _S6_u32 = [ 0, 0, 0, 0x54955080 ]
            _S6_u8 = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x80, 0x50, 0x95, 0x54 ]
            __S6_align = 0
        }
    }
}

So I think the condition at source:usr/src/uts/common/inet/ip/ip_input.c:795 is unfortunately true, we miss the ASSERT because we're on a non-debug (-nd) build, and we plough on with a NULL ire. There are all kinds of possible mutations of nexthop on the way through, however; nexthop_arg is 0, and I'm reasonably convinced from following the disassembly that nexthop remains so, but I could be wrong.
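
To make that failure mode concrete, here is a rough sketch of the pattern as I read it from the description above, not a copy of ip_input.c. All `sk_` names, `SKETCH_DEBUG`, and `pick_ire_unfixed` are hypothetical, and `looked_up` stands in for whatever the real route lookup would return. The point is only that an address match is treated as proof that the cache holds an ire, and that the ASSERT guarding this vanishes in a non-debug build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Debug-only assertion: active with SKETCH_DEBUG, a no-op otherwise. */
#ifdef SKETCH_DEBUG
#define ASSERT(x)   assert(x)
#else
#define ASSERT(x)   ((void)0)
#endif

/* Hypothetical, cut-down stand-ins for ire_t and rtc_t. */
typedef struct sk_ire {
    uint32_t ire_ib_pkt_count;
} sk_ire_t;

typedef struct sk_rtc {
    sk_ire_t *rtc_ire;      /* cached route entry, NULL when empty */
    uint32_t  rtc_ipaddr;   /* address the cached entry was looked up for */
} sk_rtc_t;

/*
 * Unfixed selection logic: if nexthop equals the cached address, the
 * cached ire is used as-is. With a zeroed cache and nexthop == 0 the
 * addresses match, the ASSERT is compiled out, and NULL is returned.
 */
static sk_ire_t *
pick_ire_unfixed(sk_rtc_t *rtc, uint32_t nexthop, sk_ire_t *looked_up)
{
    if (nexthop == rtc->rtc_ipaddr) {
        ASSERT(rtc->rtc_ire != NULL);
        return (rtc->rtc_ire);
    }

    /* Cache miss: remember the freshly looked-up entry. */
    rtc->rtc_ire = looked_up;
    rtc->rtc_ipaddr = nexthop;
    return (looked_up);
}
```

With a zero-initialised `sk_rtc_t`, a call with nexthop 0 returns NULL even though a valid `looked_up` entry was available, which is the state the dump shows just before the counter increment.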

#3

Updated by Dan McDonald over 9 years ago

  • Due date set to 2011-03-11
  • Status changed from New to In Progress
  • Assignee set to Dan McDonald
  • Priority changed from Normal to Immediate
  • % Done changed from 0 to 80
  • Estimated time set to 6.00 h

I believe I have a fix:

http://www.kebe.com/~danmcd/webrevs/787/

The problem, as Rich pointed out, is that there's no IRE when there should be. This is because we have address equality in this one case (0.0.0.0 == INADDR_ANY), but we still need an IRE. So we must take the assertion of rtc_ire != NULL and instead add it as a check. Machines don't crash anymore with this fix.
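
For readers who don't want to open the webrev, the shape of the change can be sketched roughly as below. This is an illustration of the idea only, under the assumption that the cache is a pointer plus the address it was looked up for. The `sk_` types, `pick_ire_checked`, and `sk_lookup_default` are hypothetical and do not appear in ip_input.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for ire_t and rtc_t. */
typedef struct sk_ire {
    uint32_t ire_ib_pkt_count;
} sk_ire_t;

typedef struct sk_rtc {
    sk_ire_t *rtc_ire;
    uint32_t  rtc_ipaddr;
} sk_rtc_t;

/* Hypothetical callback standing in for the real route lookup. */
typedef sk_ire_t *(*sk_lookup_fn)(uint32_t nexthop);

/* Toy lookup used for illustration: always yields one static entry. */
static sk_ire_t sk_default_ire;

static sk_ire_t *
sk_lookup_default(uint32_t nexthop)
{
    (void)nexthop;
    return (&sk_default_ire);
}

/*
 * The former assertion becomes part of the condition: the cache counts
 * as a hit only when the address matches AND an ire is actually cached.
 * Otherwise a lookup is done and the cache is refreshed.
 */
static sk_ire_t *
pick_ire_checked(sk_rtc_t *rtc, uint32_t nexthop, sk_lookup_fn lookup)
{
    assert(rtc != NULL && lookup != NULL);

    if (nexthop == rtc->rtc_ipaddr && rtc->rtc_ire != NULL)
        return (rtc->rtc_ire);

    rtc->rtc_ire = lookup(nexthop);
    rtc->rtc_ipaddr = nexthop;
    return (rtc->rtc_ire);
}
```

With an empty cache and nexthop 0, `pick_ire_checked(&rtc, 0, sk_lookup_default)` now goes through the lookup instead of handing back NULL, so the later counter increment has a valid entry to work on.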

#4

Updated by Dan McDonald over 9 years ago

Added an IPv6-equivalent fix, as the IPv6 path has a similar assumption.

#5

Updated by Rich Lowe over 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

Resolved in r13300 commit:3ce8381e1103
