Bug #11306
closedsyseventd cores after hot-add of disk
100%
Description
The folks at Nexenta reported that syseventd was consistently dumping core using disk hot-plug operations. I've captured two emails to illumos-developer from Yuri Pankov below:
Begin forwarded message: From: Yuri Pankov <yuri.pankov@nexenta.com> Subject: [developer] syseventd dying with weird backtrace Date: June 18, 2019 at 10:43:32 PM PDT To: illumos-developer <developer@lists.illumos.org> Reply-To: illumos-developer <developer@lists.illumos.org> Hi, I noticed recently that syseventd is dying with weird backtrace whenever I remove/reinsert a disk from/to enclosure. Tried installing omniosce to make sure it's not some internal issue, and seeing the same, happened after pulling the disk: root@omniosce:~# file /core /core: ELF 32-bit LSB core file 80386 Version 1, from 'syseventd' root@omniosce:~# mdb /core Loading modules: [ libc.so.1 libumem.so.1 libtopo.so.1 libavl.so.1 libcmdutils.so.1 libsysevent.so.1 libuutil.so.1 libnvpair.so.1 ld.so.1 ] > ::status debugging core file of syseventd (32-bit) from omniosce file: /usr/lib/sysevent/syseventd initial argv: /usr/lib/sysevent/syseventd threading model: native threads status: process terminated by SIGSEGV (Segmentation Fault), addr=fc9eaf85 > $C fc80efc8 0xfc9eaf85(8122d10, 0, 0, 0) fc80efe8 libc.so.1`_thrp_setup+0x88(fec99a40) fc80eff8 libc.so.1`_lwp_start(fec99a40, 0, 0, 0, 0, 0) > fc9eaf85::whatis fc9eaf85 is unknown > fc9eaf85::dis mdb: failed to read instruction at 0xfc9eaf85: no mapping for address I guess it's thr_create() calling the thread start function, but otherwise it seems to be lacking any details. The pool config is pretty simple (for testing purposes): pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: resilvered 79K in 0h0m with 0 errors on Wed Jun 19 08:28:30 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 raidz1-0 DEGRADED 0 0 0 c2t5000CCA021275E52d0 ONLINE 0 0 0 c2t5000CCA0212B86DEd0 ONLINE 0 0 0 spare-2 DEGRADED 0 0 0 c1t5000CCA021269079d0 REMOVED 0 0 0 c1t5000CCA0212A79EDd0 ONLINE 0 0 0 c2t5000CCA0212BA18Ed0 ONLINE 0 0 0 spares c1t5000CCA0212A79EDd0 INUSE currently in use Any hints on debugging this further?
Begin forwarded message: From: Yuri Pankov <yuri.pankov@nexenta.com> Subject: Re: [developer] syseventd dying with weird backtrace Date: June 19, 2019 at 3:16:14 PM PDT To: illumos-developer <developer@lists.illumos.org> Cc: Rob Johnston <rob.johnston@joyent.com> Reply-To: illumos-developer <developer@lists.illumos.org> Reverting the following commits (found by bisect) seems to fix it for me: 9455 Expose drive speed and temperature on disk topo node 9835 ses topo module should use topo_mod_clean_str() Now to find out what's really going on here.
I don't see any obvious intersection between libtopo and syseventd. However, as the author of the fixes for 9455 and 9835, I'm happy to investigate.
I am able to reproduce this on SmartOS on one our storage servers. For me the core occurs after reinserting a disk I had previously pulled out.
As a first step, I tried running syseventd in the foreground and outside of SMF with the debugging turned way up and repeated the experiment. This produced a ton of log messages - though sadly nothing that appears helpful for this case - the tail end is simply:
syseventd[22069]: Event delivery complete for id 0X1a8 syseventd[22069]: No more events /lib/svc/method/svc-syseventd: line 32: 22069: Memory fault(coredump) WARNING: /usr/lib/sysevent/syseventd failed: exit status 267
As was mentioned in the original email thread, the useless stack trace we're getting in the syseventd process core is almost certainly the result of some errant thread smashing the stack. So my next step will be to build a debug platform image and then run syseventd with UMEM_DEFAULTS and UMEM_LOGGING cranked up in hopes that it causes syseventd to SIGABRT at the point of the actual corruption.
Updated by Rob Johnston over 4 years ago
- Subject changed from sysevend cores after hot-add of disk to syseventd cores after hot-add of disk
- Status changed from New to In Progress
Updated by Rob Johnston over 4 years ago
Reproducing the issue on debug bits with syseventd configured with all the UMEM stuff turned on showed no memory violations.
After discussing the issue with Robert Mustacchi we arrived at a working theory that what we're actually seeing is not stack corruption, but rather the aftermath of a thread attempting to execute text that has been unloaded out from underneath it. With that theory in mind I wrote up some dtrace to trace the following:
- thread creates and the address of their start functions
- dlopens and dlcloses
- calls to libzfs_fru_refresh (called by the zfs_mod syseventd plugin)
I then went through about a dozen reproductions of the issue and noticed a pattern.
- syseventd always dumps core on the second hot-remove/hot-add cycle
- the address at the top of the stack frame was always around the same address range (0xfc98XXXX)
If we look at what is in syseventd's process map in the address range that is always at the top of the stack frame after the first hot-remove/hot-add cycle it is consistently the text and data for libdiskmgt.so:
> $m ! grep libdiskmgt fc980000 fc994000 14000 /usr/lib/libdiskmgt.so.1 fc9a4000 fc9a5000 1000 /usr/lib/libdiskmgt.so.1
This is interesting because the changes in illumos#9455 introduced a dependency to the disk topo plugin on libdiskmgt. And tracing thread creates showed that libdiskmgt registers sysevent callback:
CPU FUNCTION 47 -> _thrp_create start_routine = fc98acfc 47 <- _thrp_create returned 0 libc.so.1`_thrp_create+0x4f5 libdiskmgt.so.1`walk_devtree+0x51 libdiskmgt.so.1`event_handler+0xc4 libsysevent.so.1`subscriber_event_handler+0x79 libc.so.1`_thrp_setup+0x88 libc.so.1`_lwp_start
Looking at the relevant code, the callback is libdiskmgt.so`walk_devtree(), which, in turn, spawns an aux thread, libdiskmgt.so`walker().
static void walk_devtree() { . . . if (thr_create(NULL, 0, (void *(*)(void *))walker, NULL, THR_DAEMON, &walk_thread) == 0) { walker_state = WALK_WAITING; }
So the obvious question becomes - is libdiskmgt.so`walker the routine that's at the top of the stack when syseventd core dumps? To check, I looked at the symbol table of syseventd after the first hot-remove/hot-add cycle:
> ::nm ! grep \|walker$ 0xfc98acfc|0x000000a0|FUNC |LOCL |0x0 |15 |walker
I then performed the second hot-remove/hot-add cycle and syseventd crashed:
CPU FUNCTION 47 -> _thrp_create start_routine = fc98acfc 47 <- _thrp_create returned 0 libc.so.1`_thrp_create+0x4f5 libdiskmgt.so.1`walk_devtree+0x51 libdiskmgt.so.1`event_handler+0xc4 libsysevent.so.1`subscriber_event_handler+0x79 libc.so.1`_thrp_setup+0x88 libc.so.1`_lwp_start 50 -> dlclose path: , handle: fca30ef0 50 <- dlclose returned 0 50 -> dlclose path: , handle: fca304e8 50 <- dlclose returned 0 50 -> dlclose path: , handle: fca60500 50 <- dlclose returned 0 50 -> dlclose path: , handle: fcb90f08 50 <- dlclose returned 0 50 -> dlclose path: , handle: fcbc0f38 50 <- dlclose returned 0 50 -> dlclose path: , handle: fc9e0548 50 <- dlclose returned 0 50 -> dlclose path: , handle: fcb40850 50 <- dlclose returned 0 50 -> dlclose path: , handle: fcab0d18 50 <- dlclose returned 0 50 -> dlclose path: , handle: fcae0530 50 <- dlclose returned 0 50 -> dlclose path: , handle: fcbc0500 50 <- dlclose returned 0 50 -> dlclose path: , handle: fd250d30 50 <- dlclose returned 0 14 -> dlopen path: /usr/lib/libsun_sas.so.1 52 <- dlopen returned fc9501a8 52 -> _thrp_create start_routine = fe7349a2 52 <- _thrp_create returned 0 libc.so.1`_thrp_create+0x4f5 libsysevent.so.1`sysevent_bind_subscriber_cmn+0x378 libsysevent.so.1`sysevent_bind_xsubscriber+0x14 libsysevent.so.1`sysevent_bind_handle_cmn+0x7b libsysevent.so.1`sysevent_bind_handle+0x10 0xfcb67349 0xfcb63920 0xfcb67d57 libSMHBAAPI.so.1`HBA_LoadLibrary+0x5dd libtopo.so.1`sas_enum+0x4e 52 -> dlopen path: /usr/lib/64/libsun_sas.so.1 52 <- dlopen returned 0 52 -> dlclose path: /usr/lib/libsun_sas.so.1, handle: fc9501a8 52 <- dlclose returned 0 52 -> dlopen path: //usr/lib/fm/topo/plugins/smbios.so 52 <- dlopen returned fc9500d8 52 -> dlopen path: //usr/lib/fm/topo/plugins/fac_prov_ipmi.so 52 <- dlopen returned fc950958 52 -> dlopen path: //usr/platform/i86pc/lib/fm/topo/plugins/chip.so 52 <- dlopen returned fc950f98 52 -> dlopen path: //usr/platform/i86pc/lib/fm/topo/plugins/hostbridge.so 52 <- dlopen returned fc970f48 52 -> dlopen path: //usr/platform/i86pc/lib/fm/topo/plugins/pcibus.so 52 <- dlopen returned fc9e0880 52 -> dlopen path: //usr/lib/fm/topo/plugins/nic.so 52 <- dlopen returned fca30e38 14 -> dlopen path: //usr/lib/fm/topo/plugins/ipmi.so 14 <- dlopen returned fca60da0 14 -> dlopen path: //usr/lib/fm/topo/plugins/usb.so 14 <- dlopen returned fcab0b98 14 -> dlopen path: //usr/lib/fm/topo/plugins/fac_prov_mptsas.so 14 <- dlopen returned fcab0fb8 14 -> dlopen path: //usr/lib/fm/topo/plugins/fac_prov_ahci.so 14 <- dlopen returned fcae08d0 14 -> dlopen path: //usr/lib/fm/topo/plugins/disk.so 14 <- dlopen returned fcb40988 14 -> _thrp_create start_routine = fe7349a2 14 <- _thrp_create returned 0 libc.so.1`_thrp_create+0x4f5 libsysevent.so.1`sysevent_bind_subscriber_cmn+0x378 libsysevent.so.1`sysevent_bind_xsubscriber+0x14 libsysevent.so.1`sysevent_bind_handle_cmn+0x7b libsysevent.so.1`sysevent_bind_handle+0x10 libdiskmgt.so.1`events_start_event_watcher+0x20 libdiskmgt.so.1`initialize+0x82 libdiskmgt.so.1`make_descriptors+0x19 libdiskmgt.so.1`cache_get_descriptors+0x21 libdiskmgt.so.1`drive_get_descriptor_by_name+0x54 21 <- libzfs_fru_refresh returned void libzfs.so.1`libzfs_fru_refresh+0xe3 zfs_mod.so`zfs_deliver_event+0x249 syseventd`client_deliver_event_thr+0x109 libc.so.1`_thrp_setup+0x88 libc.so.1`_lwp_start dtrace: pid 109808 has exited
I then looked at the stack trace in the resulting core:
# mdb /cores/core.syseventd.109808 Loading modules: [ libumem.so.1 libc.so.1 libsysevent.so.1 libnvpair.so.1 libtopo.so.1 libavl.so.1 libcmdutils.so.1 libuutil.so.1 ld.so.1 ] > $c 0xfc98ad1d(0, 0, 0, 0) libc.so.1`_thrp_setup+0x88(fec6a240) libc.so.1`_lwp_start(fec6a240, 0, 0, 0, 0, 0)
From looking at the symbol table just prior to the process dying, we know that libdiskmgt.so`walker was at 0xfc98acfc and has a length of 0xa0.
> 0xfc98acfc + 0x000000a0 = K fc98ad9c
0xfc98ad1d falls inside this range - so yes, the thread was attempting to execute code inside libdiskmgt`walker
Looking some more at the code in libdiskmgt and in libzfs, the sequence of events becomes clear:
The first hot-remove/hot-add cycle caused zfs_mod to manually dlopen libtopo (see comments in usr/src/lib/libzfs/common/libzfs_fru.c for why it does this). This in turn causes the disk plugin to get loaded and then libdiskmgt. libdiskmgt registers a syseventd callback that is subscribed to EC_DEV_{ADD,REMOVE} events. When zfs_mod is done it dlcloses libtopo and friends, but unfortunately, there's no code in libdiskmgmt that ever unregisters the callback. After the second hot-remove/hot-add cycle, syseventd tries to invoke the callback that libdiskmgt registered, but it's been unloaded - so KABOOM!
This looks like a day 0 bug in libdiskmgmt, but we've never hit it because the previous consumers have been CLIs (like diskinfo) that run and exit. It doesn't appear to have been designed with the idea that it might be dynamically loaded and unloaded in a long running process. But the changes in illumos#9455, by way of zfs_mod and libzfs, introduced a dependency on libdiskmgt into syseventd.
The fix is probably to add a fini routine to libdiskmgt so that it can properly cleanup it's event subscriptions and unbind from the sysevent channel when it gets unloaded.
Updated by Rob Johnston over 4 years ago
Testing¶
The fix was tested on a Joyent lab system where the bug was readily reproducible. With the fix in place, hot-add/hot-remove of disks no longer resulted in syseventd dropping code. Additionally, I used dtrace to observe the intenal state of syseventd to verify that the libdiskmgt fini routine was being called and was waiting for the sysevent callback thread (walker()) to complete before unloading the library. The dtrace output is included below.
Additionally, the fix was tested at Nexenta (who originally reported the bug) and that testing was also successful.
CPU FUNCTION 20 -> _thrp_create start_routine = fcb2afc3 libc.so.1`_thrp_create libc.so.1`thr_create+0x50 libdiskmgt.so.1`walk_devtree+0x51 libdiskmgt.so.1`event_handler+0xc4 libsysevent.so.1`subscriber_event_handler+0x79 libc.so.1`_thrp_setup+0x88 libc.so.1`_lwp_start 20 <- _thrp_create returned 0 22 -> walker walker() called 12 -> libzfs_fru_refresh libzfs_fru_refresh() called libzfs.so.1`libzfs_fru_refresh zfs_mod.so`zfs_deliver_update+0x108 zfs_mod.so`zfs_deliver_event+0x249 syseventd`client_deliver_event_thr+0x109 libc.so.1`_thrp_setup+0x88 libc.so.1`_lwp_start 12 -> dlclose path: , handle: fcbc0ef0 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcbc04e8 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcbf0500 12 <- dlclose returned 0 12 -> dlclose path: , handle: fccc0530 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcd10550 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcb70548 12 -> libdiskmgt_fini libdiskmgt_fini() called (walker_running = 1) libdiskmgt.so.1`libdiskmgt_fini libdiskmgt.so.1`_fini+0x20 ld.so.1`call_fini+0xb3 ld.so.1`remove_hdl+0x744 ld.so.1`dlclose_core+0xb8 ld.so.1`dlclose_intn+0x25 ld.so.1`dlclose_check+0x92 ld.so.1`dlclose+0x3d libtopo.so.1`rtld_fini+0x80 libtopo.so.1`topo_mod_stop+0x24 48 <- walker returned void 12 <- libdiskmgt_fini returned void 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcd108e0 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcc40d18 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcc90630 12 <- dlclose returned 0 12 -> dlclose path: , handle: fcd70d28 12 <- dlclose returned 0 12 -> dlclose path: , handle: fd220cc0 12 <- dlclose returned 0 14 -> dlopen path: //usr/lib/fm/topo/plugins/smbios.so 0 <- dlopen returned fcac0638 0 -> dlopen path: //usr/lib/fm/topo/plugins/fac_prov_ipmi.so 0 <- dlopen returned fcb10018 0 -> dlopen path: //usr/platform/i86pc/lib/fm/topo/plugins/chip.so 0 <- dlopen returned fcb10500 0 -> dlopen path: //usr/platform/i86pc/lib/fm/topo/plugins/hostbridge.so 0 <- dlopen returned fcb70860 0 -> dlopen path: //usr/platform/i86pc/lib/fm/topo/plugins/pcibus.so 0 <- dlopen returned fcbc0298 0 -> dlopen path: //usr/lib/fm/topo/plugins/nic.so 0 <- dlopen returned fcbc0828 10 -> dlopen path: //usr/lib/fm/topo/plugins/ipmi.so 10 <- dlopen returned fcbf0d08 10 -> dlopen path: //usr/lib/fm/topo/plugins/usb.so 10 <- dlopen returned fcc40f30 10 -> dlopen path: //usr/lib/fm/topo/plugins/fac_prov_mptsas.so 10 <- dlopen returned fcc90430 10 -> dlopen path: //usr/lib/fm/topo/plugins/fac_prov_ahci.so 10 <- dlopen returned fcc90978 10 -> dlopen path: //usr/lib/fm/topo/plugins/disk.so 10 <- dlopen returned fccc0768 10 -> _thrp_create start_routine = fe7c49a2 libc.so.1`_thrp_create libc.so.1`thr_create+0x50 libsysevent.so.1`sysevent_bind_subscriber_cmn+0x378 libsysevent.so.1`sysevent_bind_xsubscriber+0x14 libsysevent.so.1`sysevent_bind_handle_cmn+0x7b libsysevent.so.1`sysevent_bind_handle+0x10 libdiskmgt.so.1`events_start_event_watcher+0x3f libdiskmgt.so.1`initialize+0x82 libdiskmgt.so.1`make_descriptors+0x19 libdiskmgt.so.1`cache_get_descriptors+0x21 10 <- _thrp_create returned 0 10 <- libzfs_fru_refresh returned void
Updated by Electric Monk over 4 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 5cffb2606ed604b9f1f45d061bc8aa92e0536f70
commit 5cffb2606ed604b9f1f45d061bc8aa92e0536f70 Author: Rob Johnston <rob.johnston@joyent.com> Date: 2019-07-26T16:34:02.000Z 11306 syseventd cores after hot-add of disk Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Approved by: Richard Lowe <richlowe@richlowe.net>