Bug #6740

closed

ISCSI_OP_SCSI_RSP needs support

Added by Dan McDonald about 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
comstar - iSCSI/FC/SAS target
Start date:
2016-03-10
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

An OmniOS customer sent me a kernel panic in IDM. What follows is my stream-of-consciousness debugging session.

Okay, got the coredump, and let's hit it with MDB!

> ::status
debugging crash dump vmcore.0 (64-bit) from strmec1
operating system: 5.11 omnios-33c53a8 (i86pc)
image uuid: 5952fee3-3b87-4437-f262-cde20bad8d07
panic message: 
BAD TRAP: type=e (#pf Page fault) rp=fffff001ea8dca60 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference
dump content: kernel pages only
>

Oooh, module "unknown" is not a good place to be.  I wonder why it's there?

> $c
idm_pdu_rx+0xdb(fffff048faff0000, fffff048fa252110)
idm_sorx_thread+0xfc(fffff048faff0000)
thread_start+8()
>  idm_pdu_rx+0xdb::dis
idm_pdu_rx+0xb3:                movl   $0x25,%esi
idm_pdu_rx+0xb8:                movq   %rbx,%rdi
idm_pdu_rx+0xbb:                call   +0x2f60  <idm_conn_rx_pdu_event>
idm_pdu_rx+0xc0:                jmp    -0x2a    <idm_pdu_rx+0x98>
idm_pdu_rx+0xc2:                nopw   0x0(%rax,%rax)
idm_pdu_rx+0xc8:                movq   %r12,%rdi
idm_pdu_rx+0xcb:                call   +0x361f9e0       <mutex_exit>
idm_pdu_rx+0xd0:                movq   %r13,%rsi
idm_pdu_rx+0xd3:                movq   %rbx,%rdi
idm_pdu_rx+0xd6:                call   +0x155   <idm_pdu_rx_forward_ffp>
idm_pdu_rx+0xdb:                decl   %eax
idm_pdu_rx+0xdd:                je     -0x3f    <idm_pdu_rx+0xa0>
idm_pdu_rx+0xdf:                movq   %r12,%rdi
idm_pdu_rx+0xe2:                call   +0x361f8a9       <mutex_enter>
idm_pdu_rx+0xe7:                jmp    -0x9b    <idm_pdu_rx+0x51>
idm_pdu_rx+0xec:                nopl   0x0(%rax)
idm_pdu_rx+0xf0:                movq   %rbx,%rdi
idm_pdu_rx+0xf3:                nop
idm_pdu_rx+0xf4:                lock nop
idm_pdu_rx+0xf6:                nop
idm_pdu_rx+0xf7:                nop
>

Huh... we call idm_pdu_rx_forward_ffp().  That isn't in the $c above?  Let's see if ::findstack helps us:

> ::msgbuf !grep thread
panic[cpu9]/thread=fffff001ea8dcc40:
fffff001ea8dcc20 idm:idm_sorx_thread+fc ()
fffff001ea8dcc30 unix:thread_start+8 ()
> fffff001ea8dcc40::findstack -v
stack pointer for thread fffff001ea8dcc40: fffff001ea8dc430
  fffff001ea8dc840 0xfffff001ea8dc510()
  fffff001ea8dc8b0 param_preset()
  fffff001ea8dc940 die+0xdf(e, fffff001ea8dca60, 0, 9)
  fffff001ea8dca50 trap+0x1458(fffff001ea8dca60, 0, 9)
  fffff001ea8dca60 0xfffffffffb8001d6()
  fffff001ea8dcbc0 idm_pdu_rx+0xdb(fffff048faff0000, fffff048fa252110)
  fffff001ea8dcc20 idm_sorx_thread+0xfc(fffff048faff0000)
  fffff001ea8dcc30 thread_start+8()
>

HUH?!?  0xfffffffffb8001d6 is in hyperspace:

> 0xfffffffffb8001d6::dis
0xfffffffffb8001d6:             jmp    +0x363   <_sys_rtt>
0xfffffffffb8001db:             cmpw   $0x30,0xd0(%rbp)

Eeeesh.

Let's look at the code...

idm_pdu_rx(idm_conn_t *ic, idm_pdu_t *pdu)
{
    iscsi_async_evt_hdr_t *async_evt;

    /*
     * If we are in full-featured mode then route SCSI-related
     * commands to the appropriate function vector
     */
    ic->ic_timestamp = ddi_get_lbolt();
    mutex_enter(&ic->ic_state_mutex);
    if (ic->ic_ffp && ic->ic_pdu_events == 0) {
        mutex_exit(&ic->ic_state_mutex);

        if (idm_pdu_rx_forward_ffp(ic, pdu) == B_TRUE) {
            /* Forwarded SCSI-related commands */
            return;
        }
        mutex_enter(&ic->ic_state_mutex);
    }
....

Okay, this kinda makes sense.  We clearly were calling idm_pdu_rx_forward_ffp().

idm_pdu_rx_forward_ffp(idm_conn_t *ic, idm_pdu_t *pdu)
{
    /*
     * If this is an FFP request, call the appropriate handler
     * and return B_TRUE, otherwise return B_FALSE.
     */
    switch (IDM_PDU_OPCODE(pdu)) {
    case ISCSI_OP_SCSI_CMD:
        (*ic->ic_conn_ops.icb_rx_scsi_cmd)(ic, pdu);
        return (B_TRUE);

Well, let's take a look at that opcode (pdu->isp_hdr->opcode & ISCSI_OPCODE_MASK).  "pdu" is the 2nd arg to idm_pdu_rx() BTW.

> fffff048fa252110::print -at idm_pdu_t isp_hdr
fffff048fa252140 iscsi_hdr_t *isp_hdr = 0xfffff048fa2522b0
> 0xfffff048fa2522b0::print -at iscsi_hdr_t opcode
fffff048fa2522b0 uint8_t opcode = 0xe1
> 0xe1 & 0x3f=X
                21            

0x21 is:
 #define    ISCSI_OP_SCSI_RSP        0x21
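
The same arithmetic as the mdb expression above, as a minimal C sketch. The 0x3f mask and the 0x21 value are taken from the session; the names OPCODE_MASK and pdu_opcode() are made up for illustration and are not the IDM macros.

```c
#include <stdint.h>

/* Mask used at the mdb prompt above (0xe1 & 0x3f). */
#define	OPCODE_MASK		0x3f
/* Value quoted from the header above. */
#define	ISCSI_OP_SCSI_RSP	0x21

/*
 * Hypothetical helper: strip the upper two bits of the first
 * header byte and keep the 6-bit opcode.
 */
static uint8_t
pdu_opcode(uint8_t raw)
{
	return (raw & OPCODE_MASK);
}
```

With the byte from the dump, pdu_opcode(0xe1) yields 0x21, which is why the switch lands in the ISCSI_OP_SCSI_RSP case.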

Okay, so let's look at idm_pdu_rx_forward_ffp()'s handler:

    case ISCSI_OP_SCSI_RSP:
        (*ic->ic_conn_ops.icb_rx_scsi_rsp)(ic, pdu);
        return (B_TRUE);

Time to look at the idm_conn_t's (the 1st argument to idm_pdu_rx()) icb_rx_scsi_rsp:

    fffff048faff1350 idm_rx_pdu_cb_t *ic_conn_ops.icb_rx_scsi_rsp = 0

WHOA!  That's a NULL function pointer!

Did I mention that IDM and COMSTAR code is... well... brittle?

So I'm guessing your iSCSI initiator sent an ISCSI_OP_SCSI_RSP, and it FAILED ON OUR END!!!

Oh but GOOD NEWS --> illumos-nexenta has a fix for this:

    https://github.com/Nexenta/illumos-nexenta/commit/f0140e87540f4f91774b5a8f2a2eb5be835b78d2

So it's time to upstream this.
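
To illustrate the failure mode found above (an opcode dispatched through a callback slot that was never filled in), here is a stand-alone sketch of a dispatcher that checks the slot before calling it. This is NOT the illumos-nexenta change and is not IDM code: conn_t, pdu_t, forward_ffp() and count_cb() are simplified, hypothetical stand-ins, and returning false for an unset handler is just one possible policy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for idm_conn_t / idm_pdu_t. */
typedef struct conn conn_t;
typedef struct pdu {
	uint8_t opcode;		/* first header byte, flags included */
} pdu_t;
typedef void (*rx_pdu_cb_t)(conn_t *, pdu_t *);

struct conn {
	rx_pdu_cb_t rx_scsi_cmd;
	rx_pdu_cb_t rx_scsi_rsp;	/* this slot was 0 in the dump */
	int handled;			/* test counter, illustration only */
};

#define	OP_MASK		0x3f
#define	OP_SCSI_CMD	0x01
#define	OP_SCSI_RSP	0x21

/*
 * Pick the handler for the opcode, but only call it if the slot
 * is populated.  Returns true if the PDU was forwarded.
 */
static bool
forward_ffp(conn_t *c, pdu_t *p)
{
	rx_pdu_cb_t cb = NULL;

	switch (p->opcode & OP_MASK) {
	case OP_SCSI_CMD:
		cb = c->rx_scsi_cmd;
		break;
	case OP_SCSI_RSP:
		cb = c->rx_scsi_rsp;
		break;
	default:
		break;
	}
	if (cb == NULL)
		return (false);	/* no handler: do not jump to address 0 */
	cb(c, p);
	return (true);
}

/* Trivial handler used to show that a populated slot is called. */
static void
count_cb(conn_t *c, pdu_t *p)
{
	(void) p;
	c->handled++;
}
```

With rx_scsi_rsp left NULL, a PDU whose first byte is 0xe1 is rejected by forward_ffp() instead of calling through address 0, while a PDU with opcode 0x01 still reaches count_cb().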

Actions #1

Updated by Dan McDonald about 5 years ago

illumos-nexenta already fixes this. Here's the URL for the fix: https://github.com/Nexenta/illumos-nexenta/commit/f0140e87540f4f91774b5a8f2a2eb5be835b78d2

Actions #2

Updated by Electric Monk about 5 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit a481fa48e4b49ab092647a92bb0ea0edf9afc5ce

commit  a481fa48e4b49ab092647a92bb0ea0edf9afc5ce
Author: Rick McNeal <rick.mcneal@nexenta.com>
Date:   2016-04-03T15:25:44.000Z

    6740 ISCSI_OP_SCSI_RSP needs support
    Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
