Bug #3599

dtrace_dynvar tail calls can blow stack

Added by Rich Lowe almost 6 years ago. Updated almost 4 years ago.

Status:ClosedStart date:2013-03-01
Priority:NormalDue date:
Assignee:Bryan Cantrill% Done:


Target version:-
Difficulty:Medium Tags:


The implementation of dtrace_dynvar() assumes that the compiler will eliminate tail calls

1878    /*
1879     * The cas has failed.  Either another CPU is adding an element to
1880     * this hash chain, or another CPU is deleting an element from this
1881     * hash chain.  The simplest way to deal with both of these cases
1882     * (though not necessarily the most efficient) is to free our
1883     * allocated block and tail-call ourselves.  Note that the free is
1884     * to the dirty list and _not_ to the free list.  This is to prevent
1885     * races with allocators, above.
1886     */
1887    dvar->dtdv_hashval = DTRACE_DYNHASH_FREE;
1889    dtrace_membar_producer();
1891    do {
1892        free = dcpu->dtdsc_dirty;
1893        dvar->dtdv_next = free;
1894    } while (dtrace_casptr(&dcpu->dtdsc_dirty, free, dvar) != free);
1896    return (dtrace_dynvar(dstate, nkeys, key, dsize, op, mstate, vstate));

If the compiler does not eliminate that call (and it seems that, at least with smartos it does not), and the system is sufficiently busy, we will blow the stack

A smartos user reported a double fault:

[root@smartosn1 /var/crash/volatile]# mdb 0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs lofs idm crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs sata sd ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from smartosn1
operating system: 5.11 joyent_20130125T031721Z (i86pc)
image uuid: (not set)
panic message: BAD TRAP: type=8 (#df Double fault) rp=ffffff046a429f10 addr=0
dump content: kernel pages only
> $C
ffffff001dbf3090 dtrace_dynvar+0x45(ffffff04a5798330, 3, ffffff001dbf7b40, 8, 0, ffffff001dbf7d50, ffffff04a57982f8)
ffffff001dbf3190 dtrace_dynvar+0x7c0(ffffff04a5798330, 3, ffffff001dbf7b40, 8, 0, ffffff001dbf7d50, ffffff04a57982f8)

Notice immediately the recursive call with the same arguments.

I enquired as to the output of ::stackinfo, and found it to be greatly truncated (likely a bug in itself), and so asked for t_stk and t_stkbase of the problem thread

> *panic_thread::print kthread_t t_stkbase
t_stkbase = 0xffffff001dbf3000
> *panic_thread::print kthread_t t_stk
t_stk = 0xffffff001dbf7f10

Notice how close our top frame is to t_stkbase.

The instruction at +45 is:

dtrace_dynvar+0x45:             movq   %rcx,-0xc0(%rbp)

ffffff001dbf3090 - c0 == ffffff001dbf2fd0, 48 bytes over the stack. Hence the double fault, etc.

An obvious solution is rather than tail-calling to loop (while initializing our locals within the loop, or even a straight up 'goto retry' (to preserve the somewhat perilous 80 columns)

Related issues

Related to illumos gate - Bug #3911: mdb is unable to display recursive stack Resolved 2013-07-26


#1 Updated by Bryan Cantrill almost 6 years ago

Ugh -- sorry about that. And yes, very very unlucky. I would like to take a look at the dump, if possible -- not because the tail-call optimization being broken isn't a bug (it clearly is) but just because I wonder how and what we're racing with to fail as many times as we clearly did. In terms of the fix, it looks like we can use the extant "top" label with only modest changes. I'm happy to make that change, or happy to leave that to Rich (or anyone else who wished to fix this) -- but it (obviously) must be done carefully, and I would at least like to code review the fix.

#2 Updated by Rich Lowe almost 6 years ago

For access to the dump, you want 'trentster', in #smartos.

The script they were using when they hit this was a bit silly, with a thread-specific associative array recording syscall latency.

syscall:::entry {
        self->syscall_entry_ts[probefunc] = vtimestamp;
syscall:::return {
        @time[probefunc] = lquantize((vtimestamp - self->syscall_entry_ts[probefunc]) / 1000, 0, 62, 2);
        self->syscall_entry_ts[probefunc] = 0;

#3 Updated by Rich Lowe almost 6 years ago

So, what's driving me crazy about this is that while I can see we're blowing the stack, and there's only one reason we could be, logically, I can neither:

1) Discover how we're colliding with operations for given a hash bucket with that frequency, or even close to that frequency

2) using a similar script to that being used here, collide on operations for a given hash bucket even once, never mind the 70+ times in rapid succession necessary to blow the stack.

Fixing blowing the stack is obviously worth doing, but I would vastly prefer to actually understand the rest of the bug, and I don't.

#4 Updated by Robert Mustacchi almost 4 years ago

  • Assignee changed from Rich Lowe to Bryan Cantrill
  • Subject changed from dtrace_dynvar can blow the stack to dtrace_dynvar tail calls can blow stack
  • Tags deleted (needs-triage)
  • % Done changed from 20 to 100

#5 Updated by Electric Monk almost 4 years ago

  • Status changed from New to Closed

git commit d47448f09aae3aa1a87fc450a0c44638e7ce7b51

commit  d47448f09aae3aa1a87fc450a0c44638e7ce7b51
Author: Bryan Cantrill <bryan@joyent.com>
Date:   2015-05-19T00:21:27.000Z

    3599 dtrace_dynvar tail calls can blow stack
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Also available in: Atom