Project

General

Profile

Bug #1873

comstar iscsi target; kernel panic idm; mutex_enter: bad mutex

Added by Martin Ilavsky almost 9 years ago. Updated over 8 years ago.

Status:
New
Priority:
High
Assignee:
-
Category:
-
Start date:
2011-12-09
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

OS: oi_151a

iSCSI server with two HP-UX iSCSI clients.

System panics with a message: bad mutex in idm_conn_event. More information attached in crash.0 file.

message mutex_enter: bad mutex, lp=d7d75cbc owner=d6c72da0 thread=d12afda0

> $c
vpanic(fe8f5074, fe8f505c, d7d75cbc, d6c72da0, d12afda0, 0)
mutex_panic+0x5e(fe8f505c, d7d75cbc, 0, fec23f08)
mutex_vector_enter+0x43d(d7d75cbc, fec23f08, d12afca8, fe859537)
idm_conn_event+0x19(d7d75000, 12, 0, fe86b2e1)
idm_login_timeout+0x12(d7d75000)
callout_list_expire+0x68(d0951e80, d5757e00, cdf992d3, 13)
callout_expire+0x1b(d0951e80)
callout_execute+0x19(d0951e80, d2479bf0)
taskq_thread+0x220(d24840e8, 0)
thread_start+8()
>

While browsing for the solution I found what seems to be the same issue: http://wesunsolve.net/bugid/id/6998152


Files

crash.0 (346 KB) crash.0 Martin Ilavsky, 2011-12-09 07:51 PM
crash.2 (381 KB) crash.2 Martin Ilavsky, 2011-12-09 09:13 PM
#1

Updated by The Written Word Inc almost 9 years ago

What version of HP-UX? What version of the HP-UX initiator?

#2

Updated by Martin Ilavsky almost 9 years ago

The Written Word Inc wrote:

What version of HP-UX? What version of the HP-UX initiator?

HP-UX 11.31:

#swlist | grep -E 'OE|Fea|iS'
  FEATURE11i                            B.11.31.1109.367a Feature Enablement Patches for HP-UX 11i v3, September 2011
  HPUX11i-DC-OE                         B.11.31.1109   HP-UX Data Center Operating Environment
  iSCSI-00                              B.11.31.03b    HP-UX iSCSI Software Initiator
#

So far I was able to reproduce it by issuing ioscan from both nodes (clients) at the same time.

#3

Updated by The Written Word Inc almost 9 years ago

We're running B.11.31.03c on one client and don't see this problem. But, you have two clients. We do see a different issue with the HP-UX initiator but it's an HP bug we're working with them on.

#4

Updated by Martin Ilavsky almost 9 years ago

For the sake of test I've upgraded iSCSI initiator on both nodes to B.11.31.03c. I got panic again though with a different message this time (but still idm):

message BAD TRAP: type=e (#pf Page fault) rp=d0c9cb94 addr=34 occurred in module "genunix" due to a NULL pointer dereference

> $c
taskq_dispatch+0x11(0, f8274d8c, dcbe9d48, 0)
idm_conn_event_locked+0x7f(d1480000)
idm_conn_event+0x2c(d1480000, 12, 0, fe86b2e1)
idm_login_timeout+0x12(d1480000)
callout_list_expire+0x68(d0951d80, d0b0be00, e84b86ce, 45)
callout_expire+0x1b(d0951d80)
callout_execute+0x19(d0951d80, d09e87c8)
taskq_thread+0x220(d09ec688, 0)
thread_start+8()
>

Again, for the reference I'm attaching more information in file.

After this crash, I tried to panic it again with ioscan's in loop from both clients - nothing. It seems though ioscan helps (iscsi login), it doesn't show up right after two parallel ioscans are executed.

#5

Updated by Martin Ilavsky almost 9 years ago

I had one of those facepalm moments yesterday ..

Both HPUX nodes were reignited (restored) from the golden image. Generally HPUX strips down settings of the system golden image is made from. iSCSI initiator name is not one of them though; both HPUX nodes were using the same initiator name.

After initiator upgrade (B.11.31.03c) panic string looked similar to one described here: https://www.illumos.org/issues/1872 where only one single HPUX was using OI target.

I've changed the initiator name on both nodes and re-mapped OI LUNs back. So far, so good. I have an uptime for 18hrs - more than I was able to get before.

I'm doing more tests and waiting whether system crashes again.

#6

Updated by Rich Lowe almost 9 years ago

  • Priority changed from Normal to High

Huh, that's bad. Hey Dan, obvious implications.

#7

Updated by Dan McDonald almost 9 years ago

Is is possible to upload the actual system dumps (vmdump.X files)? I'll be back with more in a bit.

#8

Updated by Dan McDonald almost 9 years ago

This is weird. The offsets you have (e.g. idm_conn_even_locked+0x7f) don't correspond to instructions in my oi_151a idm at all. Did you build your own kernel? The OpenIndiana 151a I have doesn't have instructions in the same place.

I will continue to dig, because IDM is scary enough where I'm not surprised a panic can happen, but I will either need an actual vmdump file, or a solid reproduction method I can attempt here.

#9

Updated by Dan McDonald almost 9 years ago

Oh my... I just realized something. Are you running 32-bit by any chance?

Yes... yes you are. My bad. I'll continue.

Okay! Check this out:

taskq_dispatch+0x11(0, f8274d8c, dcbe9d48, 0)

note the NULL taskq pointer. It's being obtained from this in IDM-land:

272 
273 (void) taskq_dispatch(ic->ic_state_taskq, &idm_conn_event_handler,
274 event_ctx, TQ_SLEEP);

So somehow, ic_state_taskq got set to NULL. And in the first crash, ic_state_mutex was corrupt.

This is a use-after-free bug. If you can reproduce this easily, please temporarily add this to /etc/system:

set kmem_flags=0xf

reboot the system, and then reproduce the bug. If I'm right and it's use-after-free, the enhance kmem debugging will record who freed it prematurely. After that panic, PLEASE get the vmdump.N file uploaded. This is a 32-bit kernel, so the footprint shouldn't be that big.

Pardon the frequent updates.

#10

Updated by Martin Ilavsky almost 9 years ago

Dan McDonald wrote:

Oh my... I just realized something. Are you running 32-bit by any chance?

Yeah, it's 32-bit .. it's an older HW used as a storage in LAB.

I put those dumps here:

http://ilavsky.name/oi/vmdump.0 ; w/ B.11.31.03b initiator
http://ilavsky.name/oi/vmdump.2 ; w/ B.11.31.03c initiator

#11

Updated by Martin Ilavsky almost 9 years ago

Dan McDonald wrote:

This is weird. The offsets you have (e.g. idm_conn_even_locked+0x7f) don't correspond to instructions in my oi_151a idm at all. Did you build your own kernel? The OpenIndiana 151a I have doesn't have instructions in the same place.

Nothing custom, all defaults.

Currently I don't have any x86-64 hardware in the LAB. But if that is a problem, I can find a way to do those tests on 64b.

#12

Updated by Dan McDonald almost 9 years ago

32-bit is okay now that I think I have a handle on WTF is going on. Can you please do the kmem_flags trick for one more reproduction (and coredump)? I suspect a kmem-debugging dump will help me a lot.

#13

Updated by Martin Ilavsky almost 9 years ago

Dan McDonald wrote:

32-bit is okay now that I think I have a handle on WTF is going on. Can you please do the kmem_flags trick for one more reproduction (and coredump)? I suspect a kmem-debugging dump will help me a lot.

Yeah, I didn't read those thread updates in the right order.

I'll do it, but I must wait till tomorrow - those mapped LUNs are used in training session right now.

#14

Updated by Martin Ilavsky almost 9 years ago

I've set the kmem_flags (reboot) and set the initiator names on both clients to the same name. System crashed twice (I guess right after system came available initiators tried to login again). In both cases, though, panic was due to bad_mutex.

Both dumps are available here:

http://ilavsky.name/oi/vmdump.4
http://ilavsky.name/oi/vmdump.5

# mdb 4
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs ip 
hook neti sockfs arp usba uhci stmf stmf_sbd fctl md lofs random idm sppp crypto cpc fcip ufs ptm nfs ]
> $C
d290abb8 vpanic(fe8f5074, fe8f505c, e5b32cbc, deadbee8, d290ada0, 0)
d290abe8 mutex_panic+0x5e(fe8f505c, e5b32cbc, 0, fec23f08)
d290ac48 mutex_vector_enter+0x43d(e5b32cbc, fec23f08, d290aca8, fe859537)
d290ac68 idm_conn_event+0x19(e5b32000, 12, 0, fe86b2e1)
d290ac88 idm_login_timeout+0x12(e5b32000)
d290acc8 callout_list_expire+0x68(d18d26c0, d279bdc0, 7ae7d8b0, cf79)
d290ace8 callout_expire+0x1b(d18d26c0)
d290ad08 callout_execute+0x19(d18d26c0, d270bcf0)
d290ad88 taskq_thread+0x220(d25fd628, 0)
d290ad98 thread_start+8()
> ::quit

# mdb 5
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs ip hook
 neti sockfs arp usba uhci stmf stmf_sbd fctl md lofs random idm cpc fcip ufs sppp nfs ]
> $C
d45e3bb8 vpanic(fe8f5074, fe8f505c, e115ecbc, baddcaf8, d45e3da0, 0)
d45e3be8 mutex_panic+0x5e(fe8f505c, e115ecbc, 0, d394d080)
d45e3c48 mutex_vector_enter+0x43d(e115ecbc, d394d080, d45e3ca8, fe859537)
d45e3c68 idm_conn_event+0x19(e115e000, 12, 0, fe86b2e1)
d45e3c88 idm_login_timeout+0x12(e115e000)
d45e3cc8 callout_list_expire+0x68(d18d27c0, d8728140, 767aed7f, cf90)
d45e3ce8 callout_expire+0x1b(d18d27c0)
d45e3d08 callout_execute+0x19(d18d27c0, d4193ab8)
d45e3d88 taskq_thread+0x220(d431ee20, 0)
d45e3d98 thread_start+8()
>
#15

Updated by Dan McDonald almost 9 years ago

Someone's releasing twice or forgetting to hold a second time, for sure. Question is, how to prevent it?

#16

Updated by Rich Lowe almost 9 years ago

My assumption was that two clients with the same name was the trigger. Do we have anyway to notice that there's two clients behind that name, and tell one to take a hike? (refuse to setup a session for the 2nd?)

#17

Updated by Dan McDonald almost 9 years ago

Rich --> that appears to be the trigger. At the very least, I'd like to reduce this from a crasher to a memory leak.

One possibility that occurs to me is that a timeout attached to the idm_conn_t (the iSCSI Data Mover's connection abstraction) should be cancelled prior to freeing the idm_conn_t. So I compiled an idm with this tiny diff in it:

diff -r 1664cc953c92 usr/src/uts/common/io/idm/idm_conn_sm.c
--- a/usr/src/uts/common/io/idm/idm_conn_sm.c    Wed Dec 14 15:05:02 2011 -0500
+++ b/usr/src/uts/common/io/idm/idm_conn_sm.c    Thu Dec 15 22:46:24 2011 -0500
@@ -162,6 +162,9 @@

     taskq_destroy(ic->ic_state_taskq);

+    /* We may also have an outstanding timeout.  Kill it. */
+    (void) untimeout(ic->ic_state_timeout);
+
     cv_destroy(&ic->ic_state_cv);
     /*
      * The thread that generated the event that got us here may still

and if the filer would like to give it a spin three ways: your crashing condition, a while with the kmem flags still attached, and then again in production, I'd appreciate it.

BTW, I found additional memory leaks in iSCSI, but I'll deal with those later (and may need your help testing again... ;).

#18

Updated by Rich Lowe almost 9 years ago

I'm assuming the "You" in the above is not me.

I don't think that reduces a crasher to a leak, I think it changes it from a DoS on the server to a DoS on the client that legitimately holds the name in use. I'm still able to set up a session as 'foo', and clobber the legitimate 'foo', no? That's still preferably, sure, but it's not good.

#19

Updated by Dan McDonald almost 9 years ago

I may have a lead (and Rich's denial-of-service possibility may be a deeper architectural flaw).

Look through here

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/comstar/port/iscsit/iscsit_login.c#1263

at login_session_sm_bind().

One of the things it does is look up an existing session:

   1375     /*
   1376      * Check if there is an existing session matching this ISID.  If
   1377      * tgt == NULL then we'll look for the session on the global list
   1378      * of discovery session.  If we find a session then the ISID
   1379      * exists.
   1380      */
   1381     existing_sess = iscsit_tgt_lookup_sess(tgt, lsm->icl_initiator_name,
   1382         lsm->icl_isid, lsm->icl_tsih, lsm->icl_tpgt_tag);

Now I'm not sure if his duplicated system also shares other lsm (login session) fields, but I'm betting that existing_sess belongs to the first client with the same-name to login. It already has a timeout, and who-knows-what-else. If there is an existing session, then the existing idm_conn_t (via iscsit_conn_t) is found and the two are linked as if the second one is "reinstating" its login on the first.

I'm still not sure which of the two idm_conn_ts is lacking a reference hold, the original, or the new one. I do see idm_conn_reinstate_event() passing things that the idm event structure seems to never use (idm_conn_t instead of a idm_pdu_t per idm_conn_event_handler()).

Oooh, idm_conn_event_handler() calls idm_conn_rele() at the end... its possible this is the one-too-many reference releases we're looking for?

Some boot-time DTrace would be helpful here. (Especially since the idm_conn_t reference model is baroque.)

#20

Updated by Martin Ilavsky almost 9 years ago

I want to help, but I would need some guidance, if possible. I'm new to OI/illumos, I'm just getting familiar with it and don't feel 'home' yet.

Following the wiki I was able to fetch the sources and do the prep work. I saved your diff pasted above, but the hunk was rejected. /oi/illumos-gate/ is the $home, I saved the patch file there:

adam:(/oi/illumos-gate)# cat patch-idm.diff
--- a/usr/src/uts/common/io/idm/idm_conn_sm.c    Wed Dec 14 15:05:02 2011 -0500
+++ b/usr/src/uts/common/io/idm/idm_conn_sm.c    Thu Dec 15 22:46:24 2011 -0500
@@ -162,6 +162,9 @@

     taskq_destroy(ic->ic_state_taskq);

+    /* We may also have an outstanding timeout.  Kill it. */
+    (void) untimeout(ic->ic_state_timeout);
+
     cv_destroy(&ic->ic_state_cv);
     /*
      * The thread that generated the event that got us here may still
adam:(/oi/illumos-gate)#

adam:(/oi/illumos-gate)# patch -p1 < patch-idm.diff
patching file usr/src/uts/common/io/idm/idm_conn_sm.c
Hunk #1 FAILED at 162.
1 out of 1 hunk FAILED -- saving rejects to file usr/src/uts/common/io/idm/idm_conn_sm.c.rej
adam:(/oi/illumos-gate)#

Lame as it may, I have edited the file manually and did proceed. I used dmake in /$SRC/uts/common/io/idm, all went OK.

Now I don't know how to load this new module and use it upon reboot. Thanks for help (and patience).

#21

Updated by Dan McDonald almost 9 years ago

To use it on your running system you'll need to compile it with non-debug. A "make all" in $SRC/uts/intel/idm will create obj*/idm files. Since you're running 32-bit, you need only worry about obj32/idm. The 32-bit one will go in /kernel/misc/idm.

BEFORE you replace any kernel modules, create a backup boot environment (which will create a new grub entry). Something like:

     beadm create recovery

Then replace /kernel/misc/idm with the obj32/idm version, then reboot. The new idm module will load up.

If things go really badly, you can use grub and select the recovery boot environment. You can then

   beadm mount <name-of-original-BE> /mnt
   cp /kernel/misc/idm /mnt/kernel/misc/idm

which will restore your original BE's idm module.

Hope this helps.

(Edited for proper path of installed idm module.)

#22

Updated by Martin Ilavsky almost 9 years ago

Thanks, that helped. To do non-debug build, NIGHTLY_OPTIONS have to be set without "D", correct? I was not using script from $SRC/tools but rather set it all in profile.

I found the idm module under /kernel/drv/misc/idm. I've replaced it with the one I've built (32b), rebooted the server and checked the module - all OK.

Both HPUX clients with the same initiator name are using the LUNs right now. I did the first round of tests - so far, so good - OI survived and is up. Will provide more info after some additional tests.

#23

Updated by Dan McDonald almost 9 years ago

For a full build, -D adds a DEBUG build to the mix. Non-DEBUG is always there with nightly. I'm assuming you copied idm out of obj32/.

If you happen to have kmem_debug still on, and when its convenient, can you "reboot -d" and take a new kernel core dump? I'm curious to see if the non-related leaks I saw are still there.

Also, I'm pleasantly surprised the two same-named LUNs are actually working. I hope this experimental fix isn't just papering over the problem. I'm still concerned there's something darker hiding, but I may just be paranoid.

#24

Updated by Martin Ilavsky almost 9 years ago

I used "-FnClmprt" flags. Yes, obj32/ was used. I turned kmem flags on before doing actual tests.

Well your gut feeling is right - there is a problem. After ~15mins after last ioscan system hung. Kernel responds to ping but I can't login. This is the old HW with no serial console. I can't go to work tonight, 'll have to do it tomorrow.

#25

Updated by Dan McDonald almost 9 years ago

Martin Ilavsky wrote:

I used "-FnClmprt" flags. Yes, obj32/ was used. I turned kmem flags on before doing actual tests.

Well your gut feeling is right - there is a problem. After ~15mins after last ioscan system hung. Kernel responds to ping but I can't login. This is the old HW with no serial console. I can't go to work tonight, 'll have to do it tomorrow.

If you happen to be able to login on the console, doing "reboot -d" would be really helpful again. Damn, I knew it was too good to be true.

#26

Updated by Martin Ilavsky almost 9 years ago

Dan McDonald wrote:

If you happen to be able to login on the console, doing "reboot -d" would be really helpful again. Damn, I knew it was too good to be true.

I was not able to login on the console (hung output), had to reboot it manually - unfortunately no core dump. Currently system is up, no LUNs are presented to HPUX yet.

Is there a way to dump running illumos kernel with a keyboard magic key or something? I'm not sure how to obtain dump when system is in this state (no TOC from console possible, etc.).

One of those weird solutions would be to schedule reboot -d in cron hour after system is up.

#27

Updated by Martin Ilavsky almost 9 years ago

..right, if I can't even spawn a login shell, hardly I would be able to spawn reboot (something I realized later, but I can't update my replies).
Actually, with this this idm module system got frozen after 3 days even when no initiators tried to login.

Is there a way to setup a watchdog that would reboot this system when it gets hung?

#28

Updated by Dan McDonald almost 9 years ago

Martin Ilavsky wrote:

Is there a way to setup a watchdog that would reboot this system when it gets hung?

The only way I know of assumes you have direct console access and will slow down performance.

If you want to try it, you can boot with kmdb. When you hit grub, add the "-k" flag to the "kernel$ /platform/i86pc..../unix ..." command line using edit. If you have kmdb loaded, you can press F1 and 'a' simultaneously, break into the kernel debugger, and force a system dump by typing '$<systemdump' at the kmdb prompt.

#29

Updated by Dan McDonald almost 9 years ago

I have a DTrace script to try after you reset the system to its default idm module. You'll need to run it BEFORE the two clients login, which may mean using boot-time DTrace. If you can delay the two clients, though, here's the script:

#!/usr/sbin/dtrace -FCs

login_sm_session_bind:entry
{
    self->trace = 1;
    self->ict = arg0;
}

iscsit_sess_lookup_conn:entry
/self->trace == 1/
{
    printf("Hmmm, found existing session, lookup up conn on ict_cid\n");
    stack();
}

iscsit_sess_lookup_conn:return
/self->trace == 1/
{
    printf("existing_ict is 0x%p, new one is 0x%p\n", arg1, self->ict);
}

login_sm_session_bind:return
/self->trace == 1/
{
    self->trace = 0;
}

After poring over the code, I think the case of seeing an existing session with login_sm_session_bind may be setting an event in motion that requires an extra reference being held. If one the two pointers I print matches the freed-before-used idm_conn_t, I think I can attempt a better bugfix.

Thanks for your patience, and sorry I cannot reproduce this myself.

#30

Updated by Martin Ilavsky almost 9 years ago

Thanks, that indeed helped.

Dump is available here: http://ilavsky.name/oi/vmdump.6

Kmem flags are set as required, custom idm module was loaded, two hpux initiators with the same name were logging in. System hung, I did crash it with F1+a/systemdump approach. Panic string:

         message BAD TRAP: type=e (#pf Page fault) rp=d74dd8fc addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

I noticed that my custom idm module is ~ twice the size the original one:

# ll ~/idm/
-rwxr-xr-x   1 root     root      207856 Dec 16 22:27 new-idm-driver
-rwxr-xr-x   1 root     sys       100796 Sep 12 13:47 original-idm-driver

..which makes me think if I did compile it correctly (with non-debug).

#31

Updated by Dan McDonald almost 9 years ago

The double-sized probably isn't code. Try this if you don't believe me:

dis orig-idm | wc -l
dis new-idm | wc -l

you shouldn't see much of a delta between the two (should be less than 100).

I'm d-loading the bits now. If it's a hard hang, we probably have a deadlock of some sort, and the core will reveal that.

BTW, if your machine panics before the DTrace script prints, that's okay. The coredump will have all of the DTrace goodies in it.

#32

Updated by Martin Ilavsky almost 9 years ago

I see .. delta is ~150 lines out of ~17k.

So the scenario now:
Fresh OI boot, original idm module, flags set, dtrace script started. HPUXes were down at the time OI was rebooted and were powered on after OI was up.
System crashed and went to kmdb prompt. I was not sure if the dump was saved at that time, so I did enter $<systemdump from its cmd.

Core file: http://ilavsky.name/oi/vmdump.7
Dtrace output: http://ilavsky.name/oi/dtrace-log

Note: there is one solaris 10 using this OI too (its name is set properly though, different LUs/access policy). Not sure if this is relevant, but just in case.

#33

Updated by Dan McDonald almost 9 years ago

Thank you. I wasn't expecting that much output. Also, I printed the wrong field vs. what actually crashed.

Can you modify this part of the D script?

iscsit_sess_lookup_conn:return
/self->trace == 1/
{
    printf("existing_ict is 0x%p, new one is 0x%p\n", arg1, self->ict);
    printf("existing ic is 0x%p, new ic is 0x%p\n",
      ((iscsi_conn_t *)arg1)->ict_ic, ((iscsi_conn_t *)self->ict)->ict_ic);
}

I need to see the idm_conn_t (ic) more than the iscsi_conn_t (ict).

Again, thanks for your patience.

#34

Updated by Dan McDonald almost 9 years ago

I've a new theory, and a new D script to match:


#!/usr/sbin/dtrace -FCs

/* Let's see if we can find where the refhold is being dropped? */

/* New theory:  Switch into state CS_S3_XPT_UP, but with a failure. */

#define CS_S3_XPT_UP 3

idm_update_state:entry
/arg1 == CS_S3_XPT_UP/
{
    self->trace = 1;
    self->ic = arg0;
}

idm_update_state:return
{
    self->trace = 0;
}

#define IDM_STATUS_REJECT 3
idm_tgt_conn_finish:return
/self->trace == 1 && arg1 == IDM_STATUS_REJECT/
{
    printf("Hmmm, transition to S3_XPT_UP, but with a failure, ic 0x%p\n",
       self->ic);
}

I'm thinking your two-identical-clients is exercising untested code here.

#35

Updated by Rich Lowe almost 9 years ago

The ways to obtain a dump from a hung system are numerous, but fragile. The most reliable is

set pcplusmp:apic_panic_on_nmi=1

in /etc/system, which does exactly what it says, and will panic if we receive an NMI. A lot of service processors have an option to generate an NMI. (note that without this option, we just reset on NMI).

You can also boot with -k, and press F1-A on the console keyboard to get into the debugger, then type $<systemdump

You can also add

set snooping=1

to /etc/system, which will enable a deadman timer which will panic if the clock thread doesn't run at least once per minute on each cpu.

#36

Updated by Martin Ilavsky almost 9 years ago

Modified dtrace script is not workig, I got an error:

dtrace: failed to compile script ./dtrace-idm: line 21: ict_ic is not a member of struct iscsi_conn

Where line 21 is:
    21        ((iscsi_conn_t *)arg1)->ict_ic, ((iscsi_conn_t *)self->ict)->ict_ic);

The second script (the new theory one) cought nothing, the only entry in log is:

dtrace: script './dtrace-idm' matched 3 probes

Crash dump: http://ilavsky.name/oi/vmdump.9

Dan McDonald wrote:

Thanks for your patience, and sorry I cannot reproduce this myself.

Np. I'm glad I can help.

Richard Reingruber Lowe: Thanks a lot for those hints.

To make things little bit easier for me (and to possibly work on this issue even when I'm not at work), I've enabled serial console and I was able to get to escape to kmdb through there. I'm logging via HPUX, one which is used in this scenario. I have a problem booting HPUX (when rebooted), because OI starts to flood it with "something" (sorry, didn't get the output from the console) and it interrupts the HPUX booting process (either in EFI or HPUX boot loader) . It seems like this port is set to work in two-way direction. I yet have to figure out how to disable this.

#37

Updated by Dan McDonald almost 9 years ago

My bad on the modification. It's supposed to be iscsi*t*_conn_t:

iscsit_sess_lookup_conn:return
/self->trace == 1/
{
    printf("existing_ict is 0x%p, new one is 0x%p\n", arg1, self->ict);
    printf("existing ic is 0x%p, new ic is 0x%p\n",
      ((iscsit_conn_t *)arg1)->ict_ic, ((iscsit_conn_t *)self->ict)->ict_ic);
}

I'm a little disappointed about the second one not catching anything. I may rewhack it to depend less on return probes (which can sometimes not return if the function has tail calls).

#38

Updated by Dan McDonald almost 9 years ago

Dan McDonald wrote:

I'm a little disappointed about the second one not catching anything. I may rewhack it to depend less on return probes (which can sometimes not return if the function has tail calls).

And here it is. It'll be more verbose, but there are little to no dependencies on return probes.

#!/usr/sbin/dtrace -FCs

/* Let's see if we can find where the refhold is being dropped? */

/* New theory:  Switch into state CS_S3_XPT_UP, but with a failure. */

#define CS_S3_XPT_UP 3

idm_update_state:entry
/arg1 == CS_S3_XPT_UP/
{
    self->trace = 1;
    self->ic = arg0;
}

idm_update_state:return
/self->trace == 1/
{
    printf("Shouldn't reach here.\n");
    self->trace = 0;
}

idm_conn_event:entry
/self->trace == 1/
{
    printf("Calling idm_conn_event(%d)\n", arg1);
}

timeout:entry
/self->trace == 1/
{
    printf("Scheduling a login timeout for 0x%x, existing timeout 0x%x,\n",
        arg1, ((idm_conn_t *)arg1)->ic_state_timeout);
    printf("  refcnt == %d\n", ((idm_conn_t *)arg1)->ic_refcnt.ir_refcnt);
    self->trace = 0;
}

#39

Updated by Martin Ilavsky almost 9 years ago

Now the "iscsit_sess" script worked. I had to crash it from kmdb, system was hung.

Crash: http://ilavsky.name/oi/vmdump.10
Dtrace log: : http://ilavsky.name/oi/dtrace-log-lookup_conn

I was not able to execute the last script, got an error:

dtrace: failed to compile script ./dtrace-idm: line 32: printf( ) argument #3 is incompatible with conversion #2 prototype:
        conversion: %x
         prototype: char, short, int, long, or long long
          argument: timeout_id_t
#40

Updated by Dan McDonald almost 9 years ago

Change both %x's in that line to %p.

#41

Updated by Martin Ilavsky almost 9 years ago

Dan McDonald wrote:

Change both %x's in that line to %p.

Done; Here's:

Crash dump: http://ilavsky.name/oi/vmdump.11
Dtrace log: http://ilavsky.name/oi/dtrace-log-verbose

#42

Updated by Dan McDonald almost 9 years ago

Sorry for the delay.

The script was surprisingly unhelpful. The crash is clearly caused by the login timeout handler receiving a FREED idm_conn_t, but the script I proposed should've caught the only case where this can happen. Clearly I'm missing something else.

IDM's too-clever architecture doesn't help debugging this problem any easier, alas.

If you're still game, here's a modified, less verbose, version of the above:

#!/usr/sbin/dtrace -Cs

/* Let's see if we can find where the refhold is being dropped? */

/* New theory:  Switch into state CS_S3_XPT_UP, but with a failure. */

#define CS_S3_XPT_UP 3

idm_update_state:entry
/arg1 == CS_S3_XPT_UP/
{
    self->trace = 1;
    self->ic = arg0;
}

idm_conn_event:entry
/self->trace == 1/
{
    printf("Calling idm_conn_event(%d)\n", arg1);
}

timeout:entry
/self->trace == 1/
{
    self->trace = 0;
}

idm_update_state:return
/self->trace == 1/
{
    printf("Shouldn't reach here.\n");
    self->trace = 0;
}

#43

Updated by Martin Ilavsky almost 9 years ago

np; sure I'm still in game.

All servers were rebooted prior to the test. I used your last dtrace script above. Same scenario: two HPUX nodes with the same initiator name. Crash occurred nearly immediately, unfortunately script didn't catch anything (only output was it matched 4 probes).

Crash: http://ilavsky.name/oi/vmdump.12

> $C
d5426bb8 vpanic(fe8f5074, fe8f505c, e1327cbc, deadbee8, d5426da0, 0)
d5426be8 mutex_panic+0x5e(fe8f505c, e1327cbc, d5426c6c, f838b1ac)
d5426c48 mutex_vector_enter+0x43d(e1327cbc, d5426da0, f838b1ac, f838ab95)
d5426c68 idm_conn_event+0x19(e1327000, 12, 0, fe86b2e1)
d5426c88 idm_login_timeout+0x12(e1327000)
d5426cc8 callout_list_expire+0x68(d1ed8240, d9adcec0, fcc3ecf5, 82922)
d5426ce8 callout_expire+0x1b(d1ed8240)
d5426d08 callout_execute+0x19(d1ed8240, d4ea9b18)
d5426d88 taskq_thread+0x220(d5048e20, 0)
d5426d98 thread_start+8()
#44

Updated by Dan McDonald almost 9 years ago

Martin Ilavsky wrote:

Crash occurred nearly immediately, unfortunately script didn't catch anything (only output was it matched 4 probes).

Shoot. This means my theory of failing during this state transition is wrong. (I was hoping I hadn't seen it due to the verbosity.) This means it's back to the drawing board.

One thing: What is the duplicate name, precisely? If there's another existing IDM instance in the coredump, maybe there's something there that can help me figure out what broke.

#45

Updated by Martin Ilavsky almost 9 years ago

Dan McDonald wrote:

Martin Ilavsky wrote:

Crash occurred nearly immediately, unfortunately script didn't catch anything (only output was it matched 4 probes).

Shoot. This means my theory of failing during this state transition is wrong. (I was hoping I hadn't seen it due to the verbosity.) This means it's back to the drawing board.

One thing: What is the duplicate name, precisely? If there's another existing IDM instance in the coredump, maybe there's something there that can help me figure out what broke.

It's the iSCSI initiator name. Output from HPUX node:

#iscsiutil -l
Initiator Name            : iqn.2011-12.com.hp.svk:kapor   
Initiator Alias           :

Authentication Method     :
CHAP Method               : CHAP_UNI
Initiator CHAP Name       :
CHAP Secret               :
NAS Hostname              :
NAS Secret                :
Radius Server Hostname    :
Header Digest             : None,CRC32C (default)
Data Digest               : None,CRC32C (default)
SLP Scope list for iSLPD  :
#

This name is not changed during golden image recovery (similar to flash archives). Both nodes are using this name.

It was set to "iqn.2011-12.com.hp.svk:kapor" on both nodes during last crash (vmdump.12). Before, both nodes were using "iqn.2011-12.com.hp.svk:lamer" name. These are the nodes of the HP cluster ( hostnames: kapor and lamer).

#46

Updated by Martin Ilavsky over 8 years ago

So far I was able to keep initiator names different among iSCSI clients, but OI does crash from time to time with the following message:

[1]> $C
d54f5ed0 kmdb_enter+0xa()
d54f5ee0 debug_enter+0x29(0, c, d311ea94, 80)
d54f5f00 abort_sequence_enter+0x4d(0, 3f8, 84ca1, fe927b4e)
d54f5f40 asy`async_rxint+0x20e(d311ea60, f9, 0, 0)
d54f5f80 asy`asyintr+0xcf(d311ea60, 0, d311ea60, 0)
d54f5fd0 av_dispatch_autovect+0x62(4, e9fc1330, feedface, feedface)
d54f5ff0 dispatch_hilevel+0x12(4, 0)
d54bdc9c switch_sp_and_call+0xf(d54f6000, fe827288, 4, 0)
d54bdcdc do_interrupt+0xce(d54bdcec, d2da8100)
d54bdcec _interrupt+0x5a()
d54bdd38 i86_mwait+0xc(0, 0, fe836324, d2da8100)
d54bdd78 cpu_idle_mwait+0xd9()
d54bdd88 idle+0xed(0, 0)
d54bdd98 thread_start+8()
[1]>

This seems to be unrelated to the issue in the thread, but I'm just starting to learn how to debug and still getting familiar with Illumos, therefore I did paste it here. just in case ..

I did upload the dump: http://ilavsky.name/oi/vmdump.13

Also available in: Atom PDF