svc.startd core dump
I tracked this down to svc.startd dumping core and restarting.
f5af9da8 libc_hwcap1.so.1`_lwp_kill+7(148, 6, 0, fee7dad2, feba0000, feb9f654)
f5af9dc8 libc_hwcap1.so.1`raise+0x22(6, feb9f654, f5af9de8, feb93e85)
f5af9de8 libumem.so.1`umem_do_abort+0x2b(feba0000, 0, f5af9e58, feb97550,
f5af9df8 libumem.so.1`umem_err_recoverable+0x5a(feb9f654, 8274c38, 0, 809c7e8,
f5af9e58 libumem.so.1`umem_error+0x484(8274c38, 80920a4, 0, feb99e96)
f5af9e98 libumem.so.1`umem_slab_free+0x115(80ae960, 809c900, 809c8d0, 809c8e8, 0
f5af9ec8 libumem.so.1`umem_cache_free+0x1c2(809c790, 8274c38, f5af9f08, feb9ade5
f5af9f18 libumem.so.1`umem_free+0x10b(8274c38, 1a, 0, 82a6b80)
f5af9f38 startd_free+0x11(8274c38, 1a, 8274c80, 0)
f5af9f88 restarter_delete_inst+0x2fe(fef4eac0, fedf2000, 208, 8160a88, fef4eac0
f5af9fc8 restarter_process_events+0x110(80aca30, 0, f5af9fe8, feed66f9)
f5af9ff8 libc_hwcap1.so.1`_lwp_start(f96f7a40, 0, 0, 0, 0, 0)
There is nothing related to this core dump in either the /etc/svc/volatile or /var/svc/log copies of the svc.startd log files. Looking at the first argument to umem_err_recoverable, I see:
0xfeb9f654: umem: heap corruption detected
This always happens when we are installing the new agents (which in turn tries to remove and clean up the old agents). The argument to restarter_process_events is an FMRI string, and we're going down through restarter_delete_inst via RESTARTER_EVENT_TYPE_REMOVE_INSTANCE. We're passing the following string into startd_free:
0x8274c38: VM Metadata Daemon (node)
I can reproduce this by running the following two commands (the core dumps start after the second command):
- svcadm disable -s metadata
- svccfg delete metadata
We always dump core while freeing the same string.
I have root caused this to the changes in OS-1186. We are now calling safe_strdup to allocate shorter strings, but that function just calls strdup. When we go to free the strings we are still calling startd_free which in turn calls umem_free. umem_free will fail as above when it is given memory that was not allocated by umem_alloc.
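The rule being violated is that a buffer must be released by the allocator that produced it. The sketch below illustrates one way to keep the pair matched for strings. It is not the actual change made for this bug: sized_alloc, sized_free, sized_strdup and sized_strfree are hypothetical names, and the sized allocator is a portable stand-in built on malloc so the example compiles without libumem. The only detail taken from the trace above is the size argument 0x1a, which is strlen("VM Metadata Daemon (node)") + 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a sized allocator in the style of
 * umem_alloc/umem_free: the caller must hand back the same size it
 * requested. live_bytes tracks outstanding bytes so mismatches show up.
 */
static size_t live_bytes = 0;

static void *sized_alloc(size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		live_bytes += size;
	return (p);
}

static void sized_free(void *p, size_t size)
{
	if (p == NULL)
		return;
	/* A wrong size here means the caller's bookkeeping is broken. */
	assert(live_bytes >= size);
	live_bytes -= size;
	free(p);
}

/*
 * Duplicate a string through the sized allocator, so that sized_free
 * (and not plain free) is the matching release routine.
 */
static char *sized_strdup(const char *s)
{
	size_t len = strlen(s) + 1;	/* include the terminating NUL */
	char *copy = sized_alloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return (copy);
}

/* Release a string obtained from sized_strdup. */
static void sized_strfree(char *s)
{
	if (s != NULL)
		sized_free(s, strlen(s) + 1);
}
```

With a pairing like this, a string duplicated by sized_strdup is always released by sized_strfree, and a string that comes from plain strdup is released with free. Either direction resolves the mismatch described above; which one fits svc.startd is a judgment call for the actual fix.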