Project

General

Profile

Bug #11306

syseventd cores after hot-add of disk

Added by Rob Johnston over 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Hard
Tags:
Gerrit CR:

Description

The folks at Nexenta reported that syseventd was consistently dumping core using disk hot-plug operations. I've captured two emails to illumos-developer from Yuri Pankov below:

Begin forwarded message:

From: Yuri Pankov <yuri.pankov@nexenta.com>
Subject: [developer] syseventd dying with weird backtrace
Date: June 18, 2019 at 10:43:32 PM PDT
To: illumos-developer <developer@lists.illumos.org>
Reply-To: illumos-developer <developer@lists.illumos.org>

Hi,

I noticed recently that syseventd is dying with weird backtrace whenever I remove/reinsert a disk from/to enclosure.  Tried installing omniosce to make sure it's not some internal issue, and seeing the same, happened after pulling the disk:

root@omniosce:~# file /core
/core:          ELF 32-bit LSB core file 80386 Version 1, from 'syseventd'
root@omniosce:~# mdb /core
Loading modules: [ libc.so.1 libumem.so.1 libtopo.so.1 libavl.so.1 libcmdutils.so.1 libsysevent.so.1 libuutil.so.1 libnvpair.so.1 ld.so.1 ]
> ::status
debugging core file of syseventd (32-bit) from omniosce
file: /usr/lib/sysevent/syseventd
initial argv: /usr/lib/sysevent/syseventd
threading model: native threads
status: process terminated by SIGSEGV (Segmentation Fault), addr=fc9eaf85
> $C
fc80efc8 0xfc9eaf85(8122d10, 0, 0, 0)
fc80efe8 libc.so.1`_thrp_setup+0x88(fec99a40)
fc80eff8 libc.so.1`_lwp_start(fec99a40, 0, 0, 0, 0, 0)
> fc9eaf85::whatis
fc9eaf85 is unknown
> fc9eaf85::dis
mdb: failed to read instruction at 0xfc9eaf85: no mapping for address

I guess it's thr_create() calling the thread start function, but otherwise it seems to be lacking any details.

The pool config is pretty simple (for testing purposes):

 pool: test
state: DEGRADED
status: One or more devices has been removed by the administrator.
       Sufficient replicas exist for the pool to continue functioning in a
       degraded state.
action: Online the device using 'zpool online' or replace the device with
       'zpool replace'.
 scan: resilvered 79K in 0h0m with 0 errors on Wed Jun 19 08:28:30 2019
config:

       NAME                         STATE     READ WRITE CKSUM
       test                         DEGRADED     0     0     0
         raidz1-0                   DEGRADED     0     0     0
           c2t5000CCA021275E52d0    ONLINE       0     0     0
           c2t5000CCA0212B86DEd0    ONLINE       0     0     0
           spare-2                  DEGRADED     0     0     0
             c1t5000CCA021269079d0  REMOVED      0     0     0
             c1t5000CCA0212A79EDd0  ONLINE       0     0     0
           c2t5000CCA0212BA18Ed0    ONLINE       0     0     0
       spares
         c1t5000CCA0212A79EDd0      INUSE     currently in use

Any hints on debugging this further?
Begin forwarded message:

From: Yuri Pankov <yuri.pankov@nexenta.com>
Subject: Re: [developer] syseventd dying with weird backtrace
Date: June 19, 2019 at 3:16:14 PM PDT
To: illumos-developer <developer@lists.illumos.org>
Cc: Rob Johnston <rob.johnston@joyent.com>
Reply-To: illumos-developer <developer@lists.illumos.org>

Reverting the following commits (found by bisect) seems to fix it for me:

9455 Expose drive speed and temperature on disk topo node
9835 ses topo module should use topo_mod_clean_str()

Now to find out what's really going on here.

I don't see any obvious intersection between libtopo and syseventd. However, as the author of the fixes for 9455 and 9835, I'm happy to investigate.

I am able to reproduce this on SmartOS on one our storage servers. For me the core occurs after reinserting a disk I had previously pulled out.

As a first step, I tried running syseventd in the foreground and outside of SMF with the debugging turned way up and repeated the experiment. This produced a ton of log messages - though sadly nothing that appears helpful for this case - the tail end is simply:

syseventd[22069]: Event delivery complete for id 0X1a8
syseventd[22069]: No more events
/lib/svc/method/svc-syseventd: line 32: 22069: Memory fault(coredump)
WARNING: /usr/lib/sysevent/syseventd failed: exit status 267

As was mentioned in the original email thread, the useless stack trace we're getting in the syseventd process core is almost certainly the result of some errant thread smashing the stack. So my next step will be to build a debug platform image and then run syseventd with UMEM_DEFAULTS and UMEM_LOGGING cranked up in hopes that it causes syseventd to SIGABRT at the point of the actual corruption.

History

#1

Updated by Rob Johnston over 1 year ago

  • Subject changed from sysevend cores after hot-add of disk to syseventd cores after hot-add of disk
  • Status changed from New to In Progress
#2

Updated by Rob Johnston over 1 year ago

Reproducing the issue on debug bits with syseventd configured with all the UMEM stuff turned on showed no memory violations.

After discussing the issue with Robert Mustacchi we arrived at a working theory that what we're actually seeing is not stack corruption, but rather the aftermath of a thread attempting to execute text that has been unloaded out from underneath it. With that theory in mind I wrote up some dtrace to trace the following:

  • thread creates and the address of their start functions
  • dlopens and dlcloses
  • calls to libzfs_fru_refresh (called by the zfs_mod syseventd plugin)

I then went through about a dozen reproductions of the issue and noticed a pattern.

  1. syseventd always dumps core on the second hot-remove/hot-add cycle
  2. the address at the top of the stack frame was always around the same address range (0xfc98XXXX)

If we look at what is in syseventd's process map in the address range that is always at the top of the stack frame after the first hot-remove/hot-add cycle it is consistently the text and data for libdiskmgt.so:

> $m ! grep libdiskmgt
fc980000 fc994000    14000 /usr/lib/libdiskmgt.so.1
fc9a4000 fc9a5000     1000 /usr/lib/libdiskmgt.so.1

This is interesting because the changes in illumos#9455 introduced a dependency to the disk topo plugin on libdiskmgt. And tracing thread creates showed that libdiskmgt registers sysevent callback:

CPU FUNCTION                                 
 47  -> _thrp_create                          start_routine = fc98acfc

 47  <- _thrp_create                          returned 0
              libc.so.1`_thrp_create+0x4f5
              libdiskmgt.so.1`walk_devtree+0x51
              libdiskmgt.so.1`event_handler+0xc4
              libsysevent.so.1`subscriber_event_handler+0x79
              libc.so.1`_thrp_setup+0x88
              libc.so.1`_lwp_start

Looking at the relevant code, the callback is libdiskmgt.so`walk_devtree(), which, in turn, spawns an aux thread, libdiskmgt.so`walker().

static void
walk_devtree()
{
. . .
            if (thr_create(NULL, 0, (void *(*)(void *))walker, NULL,
                THR_DAEMON, &walk_thread) == 0) {
                walker_state = WALK_WAITING;
            }

So the obvious question becomes - is libdiskmgt.so`walker the routine that's at the top of the stack when syseventd core dumps? To check, I looked at the symbol table of syseventd after the first hot-remove/hot-add cycle:

> ::nm ! grep \|walker$
0xfc98acfc|0x000000a0|FUNC |LOCL |0x0  |15      |walker

I then performed the second hot-remove/hot-add cycle and syseventd crashed:

CPU FUNCTION                                 
 47  -> _thrp_create                          start_routine = fc98acfc

 47  <- _thrp_create                          returned 0
              libc.so.1`_thrp_create+0x4f5
              libdiskmgt.so.1`walk_devtree+0x51
              libdiskmgt.so.1`event_handler+0xc4
              libsysevent.so.1`subscriber_event_handler+0x79
              libc.so.1`_thrp_setup+0x88
              libc.so.1`_lwp_start

 50  -> dlclose                               path: , handle: fca30ef0
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fca304e8
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fca60500
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fcb90f08
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fcbc0f38
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fc9e0548
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fcb40850
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fcab0d18
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fcae0530
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fcbc0500
 50  <- dlclose                               returned 0
 50  -> dlclose                               path: , handle: fd250d30
 50  <- dlclose                               returned 0
 14  -> dlopen                                path: /usr/lib/libsun_sas.so.1
 52  <- dlopen                                returned fc9501a8
 52  -> _thrp_create                          start_routine = fe7349a2

 52  <- _thrp_create                          returned 0
              libc.so.1`_thrp_create+0x4f5
              libsysevent.so.1`sysevent_bind_subscriber_cmn+0x378
              libsysevent.so.1`sysevent_bind_xsubscriber+0x14
              libsysevent.so.1`sysevent_bind_handle_cmn+0x7b
              libsysevent.so.1`sysevent_bind_handle+0x10
              0xfcb67349
              0xfcb63920
              0xfcb67d57
              libSMHBAAPI.so.1`HBA_LoadLibrary+0x5dd
              libtopo.so.1`sas_enum+0x4e

 52  -> dlopen                                path: /usr/lib/64/libsun_sas.so.1
 52  <- dlopen                                returned 0
 52  -> dlclose                               path: /usr/lib/libsun_sas.so.1, handle: fc9501a8
 52  <- dlclose                               returned 0
 52  -> dlopen                                path: //usr/lib/fm/topo/plugins/smbios.so
 52  <- dlopen                                returned fc9500d8
 52  -> dlopen                                path: //usr/lib/fm/topo/plugins/fac_prov_ipmi.so
 52  <- dlopen                                returned fc950958
 52  -> dlopen                                path: //usr/platform/i86pc/lib/fm/topo/plugins/chip.so
 52  <- dlopen                                returned fc950f98
 52  -> dlopen                                path: //usr/platform/i86pc/lib/fm/topo/plugins/hostbridge.so
 52  <- dlopen                                returned fc970f48
 52  -> dlopen                                path: //usr/platform/i86pc/lib/fm/topo/plugins/pcibus.so
 52  <- dlopen                                returned fc9e0880
 52  -> dlopen                                path: //usr/lib/fm/topo/plugins/nic.so
 52  <- dlopen                                returned fca30e38
 14  -> dlopen                                path: //usr/lib/fm/topo/plugins/ipmi.so
 14  <- dlopen                                returned fca60da0
 14  -> dlopen                                path: //usr/lib/fm/topo/plugins/usb.so
 14  <- dlopen                                returned fcab0b98
 14  -> dlopen                                path: //usr/lib/fm/topo/plugins/fac_prov_mptsas.so
 14  <- dlopen                                returned fcab0fb8
 14  -> dlopen                                path: //usr/lib/fm/topo/plugins/fac_prov_ahci.so
 14  <- dlopen                                returned fcae08d0
 14  -> dlopen                                path: //usr/lib/fm/topo/plugins/disk.so
 14  <- dlopen                                returned fcb40988
 14  -> _thrp_create                          start_routine = fe7349a2

 14  <- _thrp_create                          returned 0
              libc.so.1`_thrp_create+0x4f5
              libsysevent.so.1`sysevent_bind_subscriber_cmn+0x378
              libsysevent.so.1`sysevent_bind_xsubscriber+0x14
              libsysevent.so.1`sysevent_bind_handle_cmn+0x7b
              libsysevent.so.1`sysevent_bind_handle+0x10
              libdiskmgt.so.1`events_start_event_watcher+0x20
              libdiskmgt.so.1`initialize+0x82
              libdiskmgt.so.1`make_descriptors+0x19
              libdiskmgt.so.1`cache_get_descriptors+0x21
              libdiskmgt.so.1`drive_get_descriptor_by_name+0x54

 21  <- libzfs_fru_refresh                    returned void
              libzfs.so.1`libzfs_fru_refresh+0xe3
              zfs_mod.so`zfs_deliver_event+0x249
              syseventd`client_deliver_event_thr+0x109
              libc.so.1`_thrp_setup+0x88
              libc.so.1`_lwp_start

dtrace: pid 109808 has exited

I then looked at the stack trace in the resulting core:

# mdb /cores/core.syseventd.109808
Loading modules: [ libumem.so.1 libc.so.1 libsysevent.so.1 libnvpair.so.1 libtopo.so.1 libavl.so.1 libcmdutils.so.1 libuutil.so.1 ld.so.1 ]
> $c
0xfc98ad1d(0, 0, 0, 0)
libc.so.1`_thrp_setup+0x88(fec6a240)
libc.so.1`_lwp_start(fec6a240, 0, 0, 0, 0, 0)

From looking at the symbol table just prior to the process dying, we know that libdiskmgt.so`walker was at 0xfc98acfc and has a length of 0xa0.

> 0xfc98acfc + 0x000000a0 = K
                fc98ad9c        

0xfc98ad1d falls inside this range - so yes, the thread was attempting to execute code inside libdiskmgt`walker

Looking some more at the code in libdiskmgt and in libzfs, the sequence of events becomes clear:

The first hot-remove/hot-add cycle caused zfs_mod to manually dlopen libtopo (see comments in usr/src/lib/libzfs/common/libzfs_fru.c for why it does this). This in turn causes the disk plugin to get loaded and then libdiskmgt. libdiskmgt registers a syseventd callback that is subscribed to EC_DEV_{ADD,REMOVE} events. When zfs_mod is done it dlcloses libtopo and friends, but unfortunately, there's no code in libdiskmgmt that ever unregisters the callback. After the second hot-remove/hot-add cycle, syseventd tries to invoke the callback that libdiskmgt registered, but it's been unloaded - so KABOOM!

This looks like a day 0 bug in libdiskmgmt, but we've never hit it because the previous consumers have been CLIs (like diskinfo) that run and exit. It doesn't appear to have been designed with the idea that it might be dynamically loaded and unloaded in a long running process. But the changes in illumos#9455, by way of zfs_mod and libzfs, introduced a dependency on libdiskmgt into syseventd.

The fix is probably to add a fini routine to libdiskmgt so that it can properly cleanup it's event subscriptions and unbind from the sysevent channel when it gets unloaded.

#3

Updated by Rob Johnston about 1 year ago

Testing

The fix was tested on a Joyent lab system where the bug was readily reproducible. With the fix in place, hot-add/hot-remove of disks no longer resulted in syseventd dropping code. Additionally, I used dtrace to observe the intenal state of syseventd to verify that the libdiskmgt fini routine was being called and was waiting for the sysevent callback thread (walker()) to complete before unloading the library. The dtrace output is included below.

Additionally, the fix was tested at Nexenta (who originally reported the bug) and that testing was also successful.

CPU FUNCTION                                 
 20  -> _thrp_create                          start_routine = fcb2afc3

              libc.so.1`_thrp_create
              libc.so.1`thr_create+0x50
              libdiskmgt.so.1`walk_devtree+0x51
              libdiskmgt.so.1`event_handler+0xc4
              libsysevent.so.1`subscriber_event_handler+0x79
              libc.so.1`_thrp_setup+0x88
              libc.so.1`_lwp_start

 20  <- _thrp_create                          returned 0
 22  -> walker                                walker() called

 12    -> libzfs_fru_refresh                  libzfs_fru_refresh() called

              libzfs.so.1`libzfs_fru_refresh
              zfs_mod.so`zfs_deliver_update+0x108
              zfs_mod.so`zfs_deliver_event+0x249
              syseventd`client_deliver_event_thr+0x109
              libc.so.1`_thrp_setup+0x88
              libc.so.1`_lwp_start

 12      -> dlclose                           path: , handle: fcbc0ef0
 12      <- dlclose                           returned 0
 12      -> dlclose                           path: , handle: fcbc04e8
 12      <- dlclose                           returned 0
 12      -> dlclose                           path: , handle: fcbf0500
 12      <- dlclose                           returned 0
 12      -> dlclose                           path: , handle: fccc0530
 12      <- dlclose                           returned 0
 12      -> dlclose                           path: , handle: fcd10550
 12      <- dlclose                           returned 0
 12      -> dlclose                           path: , handle: fcb70548
 12        -> libdiskmgt_fini                 libdiskmgt_fini() called (walker_running = 1)

              libdiskmgt.so.1`libdiskmgt_fini
              libdiskmgt.so.1`_fini+0x20
              ld.so.1`call_fini+0xb3
              ld.so.1`remove_hdl+0x744
              ld.so.1`dlclose_core+0xb8
              ld.so.1`dlclose_intn+0x25
              ld.so.1`dlclose_check+0x92
              ld.so.1`dlclose+0x3d
              libtopo.so.1`rtld_fini+0x80
              libtopo.so.1`topo_mod_stop+0x24

 48        <- walker                          returned void
 12      <- libdiskmgt_fini                   returned void
 12    <- dlclose                             returned 0
 12    -> dlclose                             path: , handle: fcd108e0
 12    <- dlclose                             returned 0
 12    -> dlclose                             path: , handle: fcc40d18
 12    <- dlclose                             returned 0
 12    -> dlclose                             path: , handle: fcc90630
 12    <- dlclose                             returned 0
 12    -> dlclose                             path: , handle: fcd70d28
 12    <- dlclose                             returned 0
 12    -> dlclose                             path: , handle: fd220cc0
 12    <- dlclose                             returned 0
 14    -> dlopen                              path: //usr/lib/fm/topo/plugins/smbios.so
  0    <- dlopen                              returned fcac0638
  0    -> dlopen                              path: //usr/lib/fm/topo/plugins/fac_prov_ipmi.so
  0    <- dlopen                              returned fcb10018
  0    -> dlopen                              path: //usr/platform/i86pc/lib/fm/topo/plugins/chip.so
  0    <- dlopen                              returned fcb10500
  0    -> dlopen                              path: //usr/platform/i86pc/lib/fm/topo/plugins/hostbridge.so
  0    <- dlopen                              returned fcb70860
  0    -> dlopen                              path: //usr/platform/i86pc/lib/fm/topo/plugins/pcibus.so
  0    <- dlopen                              returned fcbc0298
  0    -> dlopen                              path: //usr/lib/fm/topo/plugins/nic.so
  0    <- dlopen                              returned fcbc0828
 10    -> dlopen                              path: //usr/lib/fm/topo/plugins/ipmi.so
 10    <- dlopen                              returned fcbf0d08
 10    -> dlopen                              path: //usr/lib/fm/topo/plugins/usb.so
 10    <- dlopen                              returned fcc40f30
 10    -> dlopen                              path: //usr/lib/fm/topo/plugins/fac_prov_mptsas.so
 10    <- dlopen                              returned fcc90430
 10    -> dlopen                              path: //usr/lib/fm/topo/plugins/fac_prov_ahci.so
 10    <- dlopen                              returned fcc90978
 10    -> dlopen                              path: //usr/lib/fm/topo/plugins/disk.so
 10    <- dlopen                              returned fccc0768
 10    -> _thrp_create                        start_routine = fe7c49a2

              libc.so.1`_thrp_create
              libc.so.1`thr_create+0x50
              libsysevent.so.1`sysevent_bind_subscriber_cmn+0x378
              libsysevent.so.1`sysevent_bind_xsubscriber+0x14
              libsysevent.so.1`sysevent_bind_handle_cmn+0x7b
              libsysevent.so.1`sysevent_bind_handle+0x10
              libdiskmgt.so.1`events_start_event_watcher+0x3f
              libdiskmgt.so.1`initialize+0x82
              libdiskmgt.so.1`make_descriptors+0x19
              libdiskmgt.so.1`cache_get_descriptors+0x21

 10    <- _thrp_create                        returned 0
 10  <- libzfs_fru_refresh                    returned void
#4

Updated by Electric Monk about 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 5cffb2606ed604b9f1f45d061bc8aa92e0536f70

commit  5cffb2606ed604b9f1f45d061bc8aa92e0536f70
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   2019-07-26T16:34:02.000Z

    11306 syseventd cores after hot-add of disk
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Also available in: Atom PDF