Bug #1872

closed

NULL pointer dereference in iscsit

Added by The Written Word Inc over 11 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: comstar - iSCSI/FC/SAS target
Start date: 2011-12-09
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

Running oi_151a on a Sun X4100M2 with COMSTAR and ZFS active. Kernel faulted with:
BAD TRAP: type=e (#pf Page fault) rp=ffffff00129c8a10 addr=8d8 occurred in module "unix" due to a NULL pointer dereference

Can't really pinpoint the exact cause of the crash or what was running while the crash was occurring. This system is heavily used for NFS, COMSTAR, and ZFS.

Crash file available at http://support.thewrittenword.com/vmdump.0.

Kernel messages and stack information available at http://support.thewrittenword.com/crash.0. This consists of the output from running:
echo '::panicinfo\n::cpuinfo -v\n::threadlist -v 10\n::msgbuf\n*panic_thread::findstack -v\n::stacks' | mdb 0 > /tmp/crash.0

# psrinfo -v
Status of virtual processor 0 as of: 12/09/2011 19:10:00
  on-line since 12/09/2011 03:17:00.
  The i386 processor operates at 1800 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 12/09/2011 19:10:00
  on-line since 12/09/2011 03:17:02.
  The i386 processor operates at 1800 MHz,
        and has an i387 compatible floating point processor.

# prtconf | grep Memory
Memory size: 8192 Megabytes

# mdb 0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti sockfs arp usba qlc fctl stmf stmf_sbd md lofs fcp random idm crypto cpc fcip ufs logindmux ptm sppp smbsrv nfs nsmb ]
> *panic_thread::findstack -v
stack pointer for thread ffffff00129c8c40: ffffff00129c6e70
  ffffff00129c8770 0xffffff03076ec250()
  ffffff00129c8860 panic+0x94()
  ffffff00129c88f0 die+0xdd(e, ffffff00129c8a10, 8d8, 0)
  ffffff00129c8a00 trap+0x1799(ffffff00129c8a10, 8d8, 0)
  ffffff00129c8a10 0xfffffffffb8001d6()
  ffffff00129c8b40 mutex_enter+0xb()
  ffffff00129c8b70 iscsit_deferred+0x36(ffffff0313beb090)
  ffffff00129c8c20 taskq_thread+0x285(ffffff030756bd50)
  ffffff00129c8c30 thread_start+8()
> ::panicinfo
             cpu                0
          thread ffffff00129c8c40
         message 
BAD TRAP: type=e (#pf Page fault) rp=ffffff00129c8a10 addr=8d8 occurred in module "unix" due to a NULL pointer dereference
             rdi              8d8
             rsi ffffff0313beb1f8
             rdx ffffff00129c8c40
             rcx         9b80fd00
              r8 ffffff036f674000
              r9 ffffff032e8c51e0
             rax                0
             rbx ffffff0306d29000
             rbp ffffff00129c8b40
             r10         56ceaa8e
             r11                8
             r12 ffffff0313beb230
             r13 ffffff0313beb090
             r14              8d8
             r15                0
          fsbase                0
          gsbase fffffffffbc304a0
              ds               4b     
              es               4b
              fs                0
              gs              1c3
          trapno                e
             err                2
             rip fffffffffb86ccfb
              cs               30
          rflags            10246
             rsp ffffff00129c8b08
              ss                0
          gdt_hi                0
          gdt_lo         e00001ef
          idt_hi                0
          idt_lo         d0000fff
             ldt                0
            task               70
             cr0         8005003b
             cr2              8d8
             cr3          8c00000
             cr4              6f8
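
The fault address in the output above (addr, cr2, and rdi are all 8d8) is what one would expect if mutex_enter was handed the address of a lock embedded in a structure whose base pointer was NULL. A minimal sketch of that arithmetic follows. The structure, its field names, and the assumption that the lock sits at offset 0x8d8 are hypothetical and chosen only to match the trap address; the real structure layout is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the structure that holds the lock.
 * The 0x8d8 bytes of padding are an assumption made to match the
 * fault address; they do not come from the real headers.
 */
struct fake_sess {
	char	fs_other_state[0x8d8];	/* assumed fields before the lock */
	long	fs_lock;		/* stand-in for a kmutex_t */
};

/* Compile-time check that the stand-in lock sits at offset 0x8d8. */
static_assert(offsetof(struct fake_sess, fs_lock) == 0x8d8,
    "stand-in lock must sit at offset 0x8d8");

/*
 * Address that &sess->fs_lock evaluates to for a given base address.
 * Computed with integers so that no NULL pointer is ever dereferenced.
 */
static uintptr_t
lock_addr(uintptr_t sess_base)
{
	return (sess_base + offsetof(struct fake_sess, fs_lock));
}
```

Under these assumptions, lock_addr(0) evaluates to 0x8d8: a NULL base plus the member offset gives a small non-zero address, which the kernel still reports as a NULL pointer dereference.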

Actions #1

Updated by Rich Lowe over 11 years ago

Actions #2

Updated by Rich Lowe over 11 years ago

All of this is conjecture; I understand none of iSCSI, the target, or COMSTAR.

I think we're processing deferred commands for a session. The threadlist makes it look as if we've also lost contact with a lot of clients (or sessions, or something) and are tearing down their state. The locking for this looks mostly right (it's happening under the lock we take).

That, however, makes me wonder whether we've entirely ripped a session out from under the deferred-processing taskq, and whether that's why our session is NULL. (kmem logs sure would help, but oh well.)

It seems possible that this can happen, because idm_conn_unref is called via taskq from CS_S11_COMPLETE of idm_update_state, immediately after notifying of the CONNECT_LOST.

We do try to abort any tasks related to this client: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/idm/idm_conn_sm.c#1291

Perhaps we don't succeed? Perhaps we lock insufficiently?

Actions #3

Updated by Dan McDonald over 10 years ago

We're seeing this now from a Nexenta customer. Same situation (lots of closing iSCSI threads). From my coredump (can't share due to customer data), the iSCSI target connection login state looks like one that was freshly allocated:

ict_login_sm = {
icl_login_state = 1 (ILS_LOGIN_INIT)
icl_login_last_state = 1 (ILS_LOGIN_INIT)

but hasn't had a session:

ict_sess = 0

attached yet. No idea whether this can be easily reproduced (and nobody has produced one with kmem_flags = 0xf either, I believe).

Actions #4

Updated by Dan McDonald over 9 years ago

  • Category set to comstar - iSCSI/FC/SAS target
  • Assignee set to Dan McDonald
  • % Done changed from 0 to 60

We have reproduced a variant of this in-house at Nexenta. (BTW, kmem_flags = 0xf didn't help.)

Basically, an iSCSI PDU that arrives before login can access state that isn't initialized until login occurs.
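
To illustrate the general shape of the problem (this is not the actual fix in the commit referenced below, whose contents are not shown in this report), here is a minimal user-space sketch of a handler that refuses to touch session state until a session is attached. All type, field, and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-session state, created only once login completes. */
struct demo_sess {
	int	ds_deferred_count;
};

/* Hypothetical connection; dc_sess stays NULL until login attaches one. */
struct demo_conn {
	struct demo_sess	*dc_sess;
};

enum demo_status {
	DEMO_OK = 0,
	DEMO_NO_SESSION = -1
};

/*
 * Process deferred work for a connection. A PDU that arrives before
 * login finds dc_sess == NULL, so reject it instead of dereferencing
 * the missing session.
 */
static enum demo_status
demo_deferred(struct demo_conn *conn)
{
	assert(conn != NULL);

	if (conn->dc_sess == NULL)
		return (DEMO_NO_SESSION);

	conn->dc_sess->ds_deferred_count++;
	return (DEMO_OK);
}
```

In this sketch a connection with no session yields DEMO_NO_SESSION, and the same connection succeeds once a session is attached. A real kernel fix would also have to deal with locking and with teardown racing against the taskq, which the sketch ignores.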

Actions #5

Updated by Dan McDonald over 9 years ago

  • Subject changed from NULL pointer dereference in kernel to NULL pointer dereference in iscsit

Actions #6

Updated by Robert Mustacchi over 9 years ago

  • Status changed from New to Resolved
  • % Done changed from 60 to 100
  • Tags deleted (needs-triage)

Resolved in 241996a3debb0b7690aef018488c76770d83311e.
