Project

General

Profile

Actions

Bug #9056

closed

dtrace_unregister()/fasttrap_pid_disable()/prgetmap() deadlock

Added by Robert Mustacchi almost 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
2018-02-06
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

We had a machine that – while we were debugging a database latency issue – became unresponsive to anything interactive (though the database continued to operate). The machine was NMI'd, resulting in a dump (thoth dump cf4c95ccca13b0f1).

From the dump, there are three threads of interest. The first owns mod_lock and is blocked on dtrace_lock:


> mod_lock::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
fffffffffbcfb0e8 adapt ffffd001ea957c40      -      -     yes
> ffffd001ea957c40::findstack
stack pointer for thread ffffd001ea957c40: ffffd001ea9578b0
[ ffffd001ea9578b0 _resume_from_idle+0x112() ]
  ffffd001ea9578e0 swtch+0x141()
  ffffd001ea957980 turnstile_block+0x21a()
  ffffd001ea9579f0 mutex_vector_enter+0x3a3()
  ffffd001ea957a50 dtrace_unregister+0x58()
  ffffd001ea957ab0 fasttrap_pid_cleanup_cb+0x118()
  ffffd001ea957b00 callout_list_expire+0x98()
  ffffd001ea957b30 callout_expire+0x3b()
  ffffd001ea957b60 callout_execute+0x20()
  ffffd001ea957c20 taskq_thread+0x2d0()
  ffffd001ea957c30 thread_start+8()

The second owns dtrace_lock and is blocked on the P_PR_LOCK for proc ffffe2cdb87f6028:

> dtrace_lock::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffffffc012a438 adapt ffffd06675a510c0      -      -     yes
> ffffd06675a510c0::findstack
stack pointer for thread ffffd06675a510c0: ffffd001ebda1a50
[ ffffd001ebda1a50 _resume_from_idle+0x112() ]
  ffffd001ebda1a80 swtch+0x141()
  ffffd001ebda1ac0 cv_wait+0x70()
  ffffd001ebda1af0 sprwaitlock_proc+0x34()
  ffffd001ebda1b30 sprlock_zone+0x65()
  ffffd001ebda1b60 sprlock+0x43()
  ffffd001ebda1bc0 fasttrap_pid_disable+0x37()
  ffffd001ebda1bf0 dtrace_ecb_disable+0xc9()
  ffffd001ebda1c60 dtrace_state_destroy+0xaa()
  ffffd001ebda1ca0 dtrace_close+0x77()
  ffffd001ebda1cd0 dev_close+0x31()
  ffffd001ebda1d20 device_close+0xd8()
  ffffd001ebda1db0 spec_close+0x17b()
  ffffd001ebda1e30 fop_close+0x61()
  ffffd001ebda1e70 closef+0x5e()
  ffffd001ebda1ee0 closeandsetf+0x398()
  ffffd001ebda1f00 close+0x13()
  ffffd001ebda1f10 sys_syscall+0x19f()
> ffffd06675a510c0::print kthread_t t_procp->p_user.u_psargs
t_procp->p_user.u_psargs = [ "dtrace -Cs /var/tmp/pgtxtimes.35726" ]

And the final thread (not surprisingly) owns the P_PR_LOCK for proc ffffe2cdb87f6028 and is blocked on mod_lock:

> ffffd06677e52080::findstack
stack pointer for thread ffffd06677e52080: ffffd001ed1e48f0
[ ffffd001ed1e48f0 _resume_from_idle+0x112() ]
  ffffd001ed1e4920 swtch+0x141()
  ffffd001ed1e49c0 turnstile_block+0x21a()
  ffffd001ed1e4a30 mutex_vector_enter+0x3a3()
  ffffd001ed1e4a70 mod_hold_stub+0x2b()
  ffffd001ed1e4ac0 stubs_common_code+0x1f()
  ffffd001ed1e4c00 prgetmap+0x109()
  ffffd001ed1e4c90 pr_read_map_common+0x108()
  ffffd001ed1e4cb0 pr_read_map+0x18()
  ffffd001ed1e4cf0 prread+0x5c()
  ffffd001ed1e4d90 fop_read+0xf3()
  ffffd001ed1e4f00 pread+0x1a3()
  ffffd001ed1e4f10 sys_syscall+0x19f()
> ffffd06677e52080::print kthread_t t_procp->p_user.u_psargs
t_procp->p_user.u_psargs = [ "dtrace -n buffer-read-done{@[ustack()] = count()} " ]

So we have one DTrace invocation de-instrumenting a process while another DTrace invocation is trying to read from the address space of the same process – all while an unrelated DTrace USDT provider is being unregistered.

On the one hand, all three of these threads are involved with DTrace, but on the other it's hard to see how this is purely a DTrace deadlock; DTrace very much assumes that mod_lock is sufficiently coarse-grained that mod_lock can be held when dtrace_lock is acquired.

Why is prgetmap acquiring mod_lock? It's not doing so explicitly, but rather implicitly:

> prgetmap+109::dis
prgetmap+0xda:                  leaq   -0x70(%rbp),%r15
prgetmap+0xde:                  movq   %rbx,%r12
prgetmap+0xe1:                  jmp    +0x53    <prgetmap+0x136>
prgetmap+0xe3:                  nopl   0x0(%rax,%rax)
prgetmap+0xe8:                  movq   0xfffffffffffffef0(%rbp),%rax
prgetmap+0xef:                  cmpq   $0x0,0xf0(%rax)
prgetmap+0xf7:                  je     +0x16c   <prgetmap+0x269>
prgetmap+0xfd:                  movq   (%r12),%rsi
prgetmap+0x101:                 movq   %rax,%rdi
prgetmap+0x104:                 call   +0x407a757       <unix`shmgetid>
prgetmap+0x109:                 cmpl   $-0x1,%eax       <0xffffffff>
prgetmap+0x10c:                 movl   %eax,0x60(%rbx)
prgetmap+0x10f:                 je     +0x154   <prgetmap+0x269>
prgetmap+0x115:                 cmpl   $-0x2,%eax       <0xfffffffe>
prgetmap+0x118:                 je     +0x26a   <prgetmap+0x388>
prgetmap+0x11e:                 orl    $0x200,0x58(%rbx)
prgetmap+0x125:                 movq   -0x68(%rbp),%rax
prgetmap+0x129:                 cmpq   %rax,%r13
prgetmap+0x12c:                 movq   %rax,-0x60(%rbp)
prgetmap+0x130:                 jbe    +0x14b   <prgetmap+0x281>
prgetmap+0x136:                 leaq   -0x68(%rbp),%r8

How is the call to shmgetid() acquiring mod_lock? This is because shmgetid is defined to be a WSTUB – a weak stub for an unloadable module:

/*
 * Stubs for shmem routines. shm.c
 */
#ifndef SHMSYS_MODULE
        MODULE(shmsys,sys);
        WSTUB(shmsys, shmexit,          nomod_zero);
        WSTUB(shmsys, shmfork,          nomod_zero);
        WSTUB(shmsys, shmgetid,         nomod_minus_one);
        END_MODULE(shmsys);
#endif

Thus the call to shmgetid() is winding up in mod_hold_stub:

/*
 * used by the stubs themselves to load and hold a module.
 * Returns  0 if the module is successfully held;
 *          the stub needs to call mod_release_stub().
 *          -1 if the stub should just call the err_fcn.
 * Note that this code is stretched out so that we avoid subroutine calls
 * and optimize for the most likely case.  That is, the case where the
 * module is loaded and installed and not held.  In that case we just inc
 * the mod_ref count and continue.
 */
int
mod_hold_stub(struct mod_stub_info *stub)
{
        struct modctl *mp;
        struct mod_modinfo *mip;

        mip = stub->mods_modinfo;

        mutex_enter(&mod_lock);
        ...

This, however, is (tragi-)comical, because shmsys is not, in fact, unloadable at all:


> shmsys`_fini::dis
shmsys`_fini:                   pushq  %rbp
shmsys`_fini+1:                 movl   $0x10,%eax
shmsys`_fini+6:                 movq   %rsp,%rbp
shmsys`_fini+9:                 leave  
shmsys`_fini+0xa:               ret   

(0x10 is EBUSY.) So the only purpose of this serves to acquire mod_lock in a path in which it is illegal to do so.

As to why this farce of even having a module that is tiny (7K of text and less than 1K of data), unloadable, and yet clearly required by the system at a basic level, one is referred to the Kernel Binary Interface Wars that raged within SunSoft in the mid-1990s and (especially) to Conway's Law.

The fix for this is to prevent this code path from acquiring mod_lock by changing the stubs for shmsys to what they should have been all along – instances of NO_UNLOAD_WSTUB (which does not grab mod_lock).

Actions #1

Updated by Electric Monk over 5 years ago

  • Status changed from New to Closed

git commit 49714e869ee3a25c38d504fe4ae224833811ca45

commit  49714e869ee3a25c38d504fe4ae224833811ca45
Author: Bryan Cantrill <bryan@joyent.com>
Date:   2018-07-25T05:07:44.000Z

    9056 dtrace_unregister()/fasttrap_pid_disable()/prgetmap() deadlock
    9057 unloadable modules should use NO_UNLOAD stubs
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Tim Kordas <tim.kordas@joyent.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Actions

Also available in: Atom PDF