Bug #2910

svc.startd core dump

Added by John Sonnenschein over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Category: cmd - userland programs
Start date: 2012-06-21
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Tracked down that this was due to svc.startd dumping core and restarting.

$C

f5af9da8 libc_hwcap1.so.1`_lwp_kill+7(148, 6, 0, fee7dad2, feba0000, feb9f654)
f5af9dc8 libc_hwcap1.so.1`raise+0x22(6, feb9f654, f5af9de8, feb93e85)
f5af9de8 libumem.so.1`umem_do_abort+0x2b(feba0000, 0, f5af9e58, feb97550, feb9f654, 8274c38)
f5af9df8 libumem.so.1`umem_err_recoverable+0x5a(feb9f654, 8274c38, 0, 809c7e8, 809c848, 0)
f5af9e58 libumem.so.1`umem_error+0x484(8274c38, 80920a4, 0, feb99e96)
f5af9e98 libumem.so.1`umem_slab_free+0x115(80ae960, 809c900, 809c8d0, 809c8e8, 0, 80aca30)
f5af9ec8 libumem.so.1`umem_cache_free+0x1c2(809c790, 8274c38, f5af9f08, feb9ade5)
f5af9f18 libumem.so.1`umem_free+0x10b(8274c38, 1a, 0, 82a6b80)
f5af9f38 startd_free+0x11(8274c38, 1a, 8274c80, 0)
f5af9f88 restarter_delete_inst+0x2fe(fef4eac0, fedf2000, 208, 8160a88, fef4eac0, 8)
f5af9fc8 restarter_process_events+0x110(80aca30, 0, f5af9fe8, feed66f9)
f5af9fe8 libc_hwcap1.so.1`_thrp_setup+0x8c(f96f7a40)
f5af9ff8 libc_hwcap1.so.1`_lwp_start(f96f7a40, 0, 0, 0, 0, 0)
Neither the /etc/svc/volatile nor the /var/svc/log copy of the svc.startd log contains anything related to this core dump. Looking at the first arg to umem_err_recoverable, I see:

feb9f654/S

0xfeb9f654: umem: heap corruption detected
This always happens when we are installing the new agents (which in turn tries to remove and clean up the old agents). Looking at the first arg to restarter_process_events, that's an FMRI string:

80aca30/S

0x80aca30: svc:/system/smartdc/metadata:default
We're going down through restarter_delete_inst via RESTARTER_EVENT_TYPE_REMOVE_INSTANCE, and we're passing the following string into startd_free:

8274c38/S

0x8274c38: VM Metadata Daemon (node)
I can reproduce this by running the following two commands (the core dumps start after the second command):

  1. svcadm disable -s metadata
  2. svccfg delete metadata

We always dump core while freeing the same string.

I have root-caused this to the changes in OS-1186. We now call safe_strdup to allocate shorter strings, but that function just calls strdup, so the memory comes from malloc rather than umem_alloc. When we free the strings we still call startd_free, which in turn calls umem_free. umem_free fails as above when it is given memory that was not allocated by umem_alloc.
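
To make the mismatch concrete, here is a minimal sketch of the pairing rule: a string that will be released through a sized free has to be duplicated through the matching sized allocator. This is not the change committed for this bug. The names sized_alloc, sized_free, sketch_strdup and sketch_strfree are hypothetical stand-ins (for umem_alloc, umem_free and the startd helpers), and the allocator is backed by malloc only so the sketch builds without libumem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes currently outstanding from sized_alloc(); used to check pairing. */
size_t live_bytes;

/*
 * Hypothetical stand-in for umem_alloc(): the caller must later hand the
 * same size back to sized_free(). Backed by malloc for portability only.
 */
void *
sized_alloc(size_t sz)
{
	void *p = malloc(sz);

	if (p != NULL)
		live_bytes += sz;
	return (p);
}

/* Hypothetical stand-in for umem_free(): needs the original size. */
void
sized_free(void *p, size_t sz)
{
	if (p == NULL)
		return;
	assert(live_bytes >= sz);
	live_bytes -= sz;
	free(p);
}

/*
 * Duplicate a string with the sized allocator, so that the later sized
 * free is handed memory it actually owns (unlike plain strdup()).
 */
char *
sketch_strdup(const char *s)
{
	size_t sz = strlen(s) + 1;
	char *d = sized_alloc(sz);

	if (d != NULL)
		(void) memcpy(d, s, sz);
	return (d);
}

/* Release a string obtained from sketch_strdup(). */
void
sketch_strfree(char *s)
{
	if (s != NULL)
		sized_free(s, strlen(s) + 1);
}
```

For the string in the core above, strlen("VM Metadata Daemon (node)") + 1 is 26, which matches the 0x1a size argument passed to startd_free in the stack trace. The other direction works equally well in principle: keep strdup and release those strings with free. Either way, each pointer must go back to the allocator it came from.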

History

#1

Updated by Milan Jurik over 8 years ago

  • Tags deleted (needs-triage)

I am seeing this on nearly every system shutdown.

#2

Updated by Rich Lowe over 8 years ago

  • Category set to cmd - userland programs
  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Resolved in r13739 commit:f6a4c9dc9b80
