Bug #1737

kernel panic in tcp_input_data

Added by Jeffrey Baitis over 2 years ago. Updated over 2 years ago.

Status: Closed
Start date: 2011-11-06
Priority: Normal
Due date:
Assignee: Dan McDonald
% Done: 100%
Category: kernel
Spent time: -
Target version: -
Difficulty: Hard
Tags: duplicate, 1631

Description

SunOS openindyraid 5.11 oi_151a i86pc i386 i86pc Solaris
AMD Athlon(tm) II X2 240e Processor CPU 1 (Dual-core)
Memory size: 4095 Megabytes

This occurred while my Transmission bittorrent client was busily clicking away. It looks suspiciously like bug 1631 (https://www.illumos.org/issues/1631).

Output of savecore is 257MB and can be downloaded at http://baitisj.baitis.net/~baitisj/crash_November/vmdump.1

 Nov  6 00:19:05 openindyraid panic[cpu1]/thread=ffffff0007e06c40:
 Nov  6 00:19:05 openindyraid genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff0007e05b40 addr=20 occurred in module "ip" due to a NULL pointer dereference

 sched:
 #pf Page fault
 Bad kernel fault at addr=0x20
 pid=0, pc=0xfffffffff7c2e2f5, sp=0xffffff0007e05c30, eflags=0x10246
 cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
 cr2: 20
 cr3: 108fbf000
 cr8: c

 rdi:                0 rsi: ffffff01ce05de80 rdx:                1
 rcx:                0  r8: ffffff0007e05c90  r9: ffffff0007e05c8c
 rax:                0 rbx: ffffff01db9042da rbp: ffffff0007e05da0
 r10:                0 r11:               40 r12: ffffff01e88acb40
 r13: ffffff01ce05de80 r14:                1 r15: ffffff025905eb5c
 fsb:                0 gsb: ffffff01d0deaac0  ds:               4b
 es:               4b  fs:                0  gs:              1c3
 trp:                e err:                0 rip: fffffffff7c2e2f5
 cs:               30 rfl:            10246 rsp: ffffff0007e05c30
 ss:               38

 ffffff0007e05a20 unix:die+dd ()
 ffffff0007e05b30 unix:trap+1799 ()
 ffffff0007e05b40 unix:cmntrap+e6 ()
 ffffff0007e05da0 ip:tcp_input_data+38ad ()
 ffffff0007e05e30 ip:squeue_enter+440 ()
 ffffff0007e05f00 ip:ip_fanout_v4+48d ()
 ffffff0007e05f80 ip:ire_recv_local_v4+366 ()
 ffffff0007e06060 ip:ill_input_short_v4+6ce ()
 ffffff0007e06290 ip:ip_input+23b ()
 ffffff0007e06300 ip:ip_rput+85 ()
 ffffff0007e06370 unix:putnext+21e ()
 ffffff0007e063a0 sppp:sppp_lrput+53 ()
 ffffff0007e06410 unix:putnext+21e ()
 ffffff0007e06460 spppcomp:spppcomp_rput+f4 ()
 ffffff0007e064d0 unix:putnext+21e ()
 ffffff0007e06500 sppptun:sppptun_urput+fd ()
 ffffff0007e06570 unix:putnext+21e ()
 ffffff0007e065e0 dld:dld_str_rx_unitdata+dd ()
 ffffff0007e066d0 dls:i_dls_link_rx+2e7 ()
 ffffff0007e06710 mac:mac_rx_deliver+5d ()
 ffffff0007e067a0 mac:mac_rx_soft_ring_process+17a ()
 ffffff0007e068e0 mac:mac_rx_srs_proto_fanout+4e5 ()
 ffffff0007e06960 mac:mac_rx_srs_drain+26e ()
 ffffff0007e069f0 mac:mac_rx_srs_process+180 ()
 ffffff0007e06a40 mac:mac_rx_classify+159 ()
 ffffff0007e06aa0 mac:mac_rx_flow+54 ()
 ffffff0007e06af0 mac:mac_rx_common+1f6 ()
 ffffff0007e06b40 mac:mac_rx+ac ()
 ffffff0007e06b90 iprb:iprb_intr+123 ()
 ffffff0007e06be0 unix:av_dispatch_autovect+7c ()
 ffffff0007e06c20 unix:dispatch_hardint+33 ()
 ffffff00091829d0 unix:switch_sp_and_call+13 ()
 ffffff0009182a20 unix:do_interrupt+b8 ()
 ffffff0009182a30 unix:cmnint+ba ()
 ffffff0009182b50 unix:do_splx+8d ()
 ffffff0009182b70 genunix:disp_lock_exit+55 ()
 ffffff0009182be0 genunix:turnstile_wakeup+149 ()
 ffffff0009182c00 unix:mutex_vector_exit+6a ()
 ffffff0009182c70 genunix:pollwakeup+11f ()
 ffffff0009182c90 genunix:strpollwakeup+1d ()
 ffffff0009182cc0 fifofs:fifo_wakereader+4e ()
 ffffff0009182d70 fifofs:fifo_write+31e ()
 ffffff0009182de0 genunix:fop_write+6b ()
 ffffff0009182e90 genunix:write+2e2 ()
 ffffff0009182ec0 genunix:write32+22 ()
 ffffff0009182f10 unix:brand_sys_syscall32+17a ()
 Nov  6 00:19:05 openindyraid unix: [ID 100000 kern.notice]
 Nov  6 00:19:05 openindyraid genunix: [ID 672855 kern.notice] syncing file systems...
 Nov  6 00:19:05 openindyraid genunix: [ID 904073 kern.notice]  done

History

Updated by Rich Lowe over 2 years ago

Hey Dan. I asked that this be filed as a probable dup so that the extra data about the cause is on file, since the conversation on the prior bug seemed to suggest that you thought the situation should be rare (and perhaps active, rather than happenstance).

It seemed better to have a probable dup than to miss a new path to the problem.

Updated by Dan McDonald over 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

This is definitely a duplicate of #1631. It's a tcp_t/conn_t with the upper-layer sonode detached from it.

From the attached coredump:

$c
tcp_input_data+0x38ad(ffffff01e88ac840, ffffff01ce05de80, ffffff01d0b99e80,
ffffff0007e06090)

<SNIP!>

ffffff01e88ac840::print conn_t conn_upcalls
conn_upcalls = 0
ffffff01e88ac840+0x300::print tcp_t tcp_state
tcp_state = 0

conn_upcalls has been NULLed out, while tcp_state is 0 (TCPS_ESTABLISHED), so the connection still accepts inbound data even though its upper layer is gone.

Updated by Dan McDonald over 2 years ago

  • Tags changed from needs-triage to duplicate, 1631
