Bug #3986

closed

svc.startd dies in getutxent_frec()

Added by Robert Mustacchi almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Category: cmd - userland programs
Start date: 2013-08-04
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

We've seen a bunch of core dumps from getutxent_frec():

libc_hwcap1.so.1`fread+0x3c(fef594a0, 174, 1, 0, 805dba2, fef51000)
libc_hwcap1.so.1`getutxent_frec+0x11f(0, fc9a4a40, fc1a6f98, 805dba2, 807e7ac, fedf2000)
libc_hwcap1.so.1`getutxent+0x1d(807e7ac, fedf2000, 208, 807e7d0, fc9a4a40, 807e7d0)
fork_sulogin+0x1bb(0, 807e7d0, fc1a6fb8, 806398a, 80943ac, fef51000)
run_sulogin+0xbe(fc9a4a40, 0, fc1a6fe8, feede9ed, 0, 0)
sulogin_thread+0x71(0, 0, 0, 0)
libc_hwcap1.so.1`_thrp_setup+0x88(fc9a4a40)
libc_hwcap1.so.1`_lwp_start(fc9a4a40, 0, 0, 0, 0, 0)

The process – a child of svc.startd created in fork_sulogin() – died because fd (a static int in libc's getutx.c) was set, but fp (a static FILE * in the same file) was NULL. From the code, this is an inconsistent state, and could only be possible between a lockutx() and unlockutx() – routines that are only called from makeutx() (and always symmetrically). Now, makeutx() is called from svc.startd's utmpx_mark_init(), raising the possibility that another thread in the parent process was in the middle of this routine when the process was forked. Because fork_sulogin() (naturally) creates a child process via fork1(), no other LWP exists in the child process – which makes it difficult to determine where parent LWPs were at the time of the fork.

Fortunately, utmpx_mark_init() acquires a lock (utmpx_lock) around its call to the MT-unsafe makeutx(); if another thread were in this critical section at the time of the fork, we would expect this lock to be held. And indeed, in every dump, the lock does appear to be owned, and not by the only remaining LWP:

> utmpx_lock::print
{
    __pthread_mutex_flags = {
        __pthread_mutex_flag1 = 0x4
        __pthread_mutex_flag2 = 0
        __pthread_mutex_ceiling = 0
        __pthread_mutex_type = 0x2
        __pthread_mutex_magic = 0x4d58
    }
    __pthread_mutex_lock = {
        __pthread_mutex_lock64 = {
            __pthread_mutex_pad = [ 0, 0, 0, 0, 0, 0, 0, 0x1 ]
        }
        __pthread_mutex_lock32 = {
            __pthread_ownerpid = 0
            __pthread_lockword = 0x1000000
        }
        __pthread_mutex_owner64 = 0x100000000000000
    }
    __pthread_mutex_data = 0xfec46a40
}
> ::walk ulwp
fec4b240

This is the smoking gun: another thread was in the utmpx critical section when we forked, leaving the child with the inconsistent state that induced the crash.

The fix is straightforward: a utmpx_prefork()/utmpx_postfork() pair should be added to svc.startd to acquire the lock before the fork and release it afterward, ensuring that no thread is in one of the MT-unsafe utmpx routines at the time of the fork.
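
A minimal sketch of the prefork/postfork pattern described above, not the actual svc.startd change. It assumes a plain pthread mutex as a stand-in for utmpx_lock and uses fork() rather than fork1() for portability; fork_quiesced() and child_sees_unlocked_mutex() are hypothetical helpers added only for illustration.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for svc.startd's utmpx_lock (assumption: a plain pthread mutex). */
static pthread_mutex_t utmpx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock so that no other thread can be inside the utmpx routines. */
void
utmpx_prefork(void)
{
	(void) pthread_mutex_lock(&utmpx_lock);
}

/* Drop the lock again; called in both the parent and the child. */
void
utmpx_postfork(void)
{
	(void) pthread_mutex_unlock(&utmpx_lock);
}

/* Hypothetical wrapper: fork only while holding the utmpx lock. */
pid_t
fork_quiesced(void)
{
	pid_t pid;

	utmpx_prefork();
	pid = fork();
	utmpx_postfork();

	return (pid);
}

/*
 * Hypothetical check: the child exits 0 if it can take the lock, which
 * shows that it did not inherit a mutex owned by a thread that no longer
 * exists. Returns the child's exit status, or -1 on error.
 */
int
child_sees_unlocked_mutex(void)
{
	int status;
	pid_t pid = fork_quiesced();

	if (pid == -1)
		return (-1);

	if (pid == 0)
		_exit(pthread_mutex_trylock(&utmpx_lock) == 0 ? 0 : 1);

	if (waitpid(pid, &status, 0) != pid)
		return (-1);

	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Because the forking thread itself holds the lock across the fork, it is also the owner in the child, so the unlock in utmpx_postfork() is valid on both sides and the child never inherits a critical section that was abandoned halfway through.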

Actions #1

Updated by Robert Mustacchi almost 9 years ago

  • Status changed from New to Resolved

Resolved in 0d421f668cdfd7a53019f57234af254738038aa0.
