Bug #9058

postmortem DTrace frequently broken under vmware

Added by Robert Mustacchi over 1 year ago. Updated 9 months ago.

Status: Closed
Priority: Normal
Category: DTrace
Start date: 2018-02-06
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

I don't know how often this happens outside of vmware, but I hit this fairly often on my smartos VM under vmware. I am hitting this a lot right now, so I just stopped my VM (to try to get back to a base state), started it again, and ran a test which still hits this problem. I have a 2-CPU VM with DTrace probes enabled when I panic. I am running a DEBUG build. I should have filed this long ago, but since I am hitting this a lot right now, I'm filing it for when someone (maybe me) has a chance to investigate.

> ::dtrace_state
            ADDR MINOR             PROC NAME                         FILE
ffffff035a577180     2 ffffff036bc24028 dtrace           ffffff03653941e0
> ffffff035a577180::dtrace
bash 103786 1 <- lx_io_cleanup
aio 103786 1 -> lx_io_destroy(1))
aio 103786 1 -> lx_io_destroy_common(1))
Assertion failed: timestamp >= dtp->dt_last_timestamp, file ../common/dt_consume.c, line 3014

*** mdb: received signal ABRT at:
    [1] libc.so.1`_lwp_kill+0xa()
    [2] libc.so.1`raise+0x20()
    [3] libc.so.1`abort+0x60()
    [4] libc.so.1`__assert+0x8a()
    [5] libdtrace.so.1`dtrace_consume+0x3d7()
    [6] dtrace.so`dtrace+0x2c4()
    [7] mdb`dcmd_invoke+0x7c()
    [8] mdb`mdb_call_idcmd+0x112()
    [9] mdb`mdb_call+0x3e1()
    [10] mdb`yyparse+0xdb4()
    [11] mdb`mdb_run+0x2cd()
    [12] mdb`main+0xc9d()
    [13] mdb`_start+0x6c()

mdb: (c)ore dump, (q)uit, (r)ecover, or (s)top for debugger [cqrs]? 

mdb is basically dead at this point and I need to exit and start again.

--

If I recall correctly, from quite a long time ago when I hit something like this, this is a result of our having temporal sorting of DTrace probe firings by default. That is: each entry in the buffer has a timestamp (from hrtime? the TSC?) along with the collected data. We assert that those timestamps only go up, but on VMware either they don't always increase monotonically under some conditions, or the differences between the VCPUs are large enough that the delta exceeds the buffer switch rate.

--

Opening the VM's core for postmortem DTracing results in mdb crashing at the same assert in dt_consume.c:

> ::dtrace_state
            ADDR MINOR             PROC NAME                         FILE
ffffff1d5aded0c0     2 ffffff1d5af28018 dtrace           ffffff1d5d526c90
> ffffff1d5aded0c0::dtrace
...
vmx_run:return 19 495109155683
vmx_run:return 19 495109157080
Assertion failed: timestamp >= dtp->dt_last_timestamp, file ../common/dt_consume.c, line 3016

*** mdb: received signal ABRT at:
    [1] libc.so.1`_lwp_kill+0xa()
    [2] libc.so.1`raise+0x20()
    [3] libc.so.1`abort+0x60()
    [4] libc.so.1`__assert+0x8a()
    [5] libdtrace.so.1`dtrace_consume+0x3d7()
    [6] dtrace.so`dtrace+0x2c4()
    [7] mdb`dcmd_invoke+0x7c()
    [8] mdb`mdb_call_idcmd+0x112()
    [9] mdb`mdb_call+0x3e1()
    [10] mdb`yyparse+0xdb4()
    [11] mdb`mdb_run+0x2cd()
    [12] mdb`main+0xc9d()
    [13] mdb`_start_crt+0x83()
    [14] mdb`_start+0x18()

I chose to take a core from this mdb session and then went home for the day.

Coming back to the problem later, I opened the original core again in mdb and the behavior had changed: the bug was no longer reproducible! Try as I might, the core refused to kill mdb. At some point my machine panicked, so I had a freshly restarted VM. Upon reboot I opened the core in mdb again for postmortem tracing, and mdb crashed. I tried again: crash. After some time the core no longer killed mdb and worked fine. I restarted my machine to test my theory and it checked out: the core could be traced just fine after the VM had been up for about 10 minutes.

Opening dt_consume.c in one tab and my mdb core in another, I went to work.

Here is our assert and some context.

int
dtrace_consume(dtrace_hdl_t *dtp, FILE *fp,
    dtrace_consume_probe_f *pf, dtrace_consume_rec_f *rf, void *arg)
{

...

                        dtrace_bufdesc_t *buf = dt_pq_pop(dtp->dt_bufq);
                        uint64_t timestamp;

...

                        timestamp = dt_buf_oldest(buf, dtp);
                        assert(timestamp >= dtp->dt_last_timestamp);
                        dtp->dt_last_timestamp = timestamp;

[root@00-0c-29-58-5d-8d /zones/home]# mdb /zones/global/cores/core.mdb.3844
> ::stack
libc.so.1`_lwp_kill+0xa()
libc.so.1`raise+0x20(6)
libc.so.1`abort+0x60()
0xfffffd7fef20107a()
libdtrace.so.1`dtrace_consume+0x3d7(c5e590, fffffd7fef305f40, fffffd7fee0181f0, fffffd7fee018190, fffffd7fffdfe8d0)
dtrace.so`dtrace+0x2c4(ffffff1d5aded0c0, 1, 0, 0)
dcmd_invoke+0x7c(c49ba0, ffffff1d5aded0c0, 1, 0, 0, 0)
mdb_call_idcmd+0x112(c49ba0, ffffff1d5aded0c0, 1, 1, 518d48, 518d60)
mdb_call+0x3e1(ffffff1d5aded0c0, 1, 1)
yyparse+0xdb4()
mdb_run+0x2cd()
main+0xc9d(2, fffffd7fffdffb98, fffffd7fffdffbb0)
_start_crt+0x83()
_start+0x18()

We're interested in how timestamp is set. dt_buf_oldest takes a buf and returns a timestamp. That buf in turn is popped off a priority queue pointed to by dtp->dt_bufq. dtp is the first argument to dtrace_consume, and its address is 0xc5e590, as shown in our stack trace.

While we're in dtrace_hdl_t we can also grab an important piece of our assert, dt_last_timestamp.

> 0xc5e590::print dtrace_hdl_t
...
dt_bufq = 0x11610d0
...
dt_last_timestamp = 0x7346ce30d8

dtp->dt_bufq contains the DTrace probe data used in postmortem tracing; it is a priority queue (dt_pq_t) of dtrace_bufdesc_t buffers.

> 0x11610d0::print dt_pq_t
{
    dtpq_hdl = 0xc5e590
    dtpq_items = 0xc5d3d0
    dtpq_size = 0x200
    dtpq_last = 0x17
    dtpq_value = libdtrace.so.1`dt_buf_oldest
    dtpq_arg = 0
}

dtpq_items points at an array of pointers to the dtrace_bufdesc_t's we're after. We can print them all out at once:

> 0xc5d3d0,0x17/Kn | ::print dtrace_bufdesc_t
{
    dtbd_size = 0xb800
    dtbd_cpu = 0x12
    dtbd_errors = 0
    dtbd_drops = 0
    dtbd_data = 0x1259010
    dtbd_oldest = 0
    dtbd_timestamp = 0x128dd61f5f
}
...

Let's pull out the dtbd_timestamp values and sort them to find the largest:

> 0xc5d3d0,0x17/Kn | ::print dtrace_bufdesc_t ! grep dtbd_timestamp | awk '{ print $3 }' | sort -n
0x128328dd8e
0x1284044f10
0x12853fc2d6
0x12862c4edb
0x1286de0d51
0x1287569e10
0x1287cbe830
0x128873f879
0x1288f3e0bf
0x1289342beb
0x12899a0beb
0x128a47c5b8
0x128a8340da
0x128b1d865f
0x128ba4924a
0x128c4f3316
0x128cbdbcf3
0x128d4b7b44
0x128dd61f5f
0x128ebc42ed
0x128f0255a5
0x128f7799e9

When dt_buf_oldest() is out of records it returns the time the buffer was retrieved, dtbd_timestamp. Earlier we grabbed dt_last_timestamp = 0x7346ce30d8; even the largest dtbd_timestamp here, 0x128f7799e9, is far smaller than that, so the moment we hit the end of a buffer and dt_buf_oldest() falls back to dtbd_timestamp, we blow our assert and crash mdb. dt_last_timestamp is an hrtime_t measuring nanoseconds since boot: 0x7346ce30d8 is 495,109,157,080 ns, which is exactly the timestamp of the last vmx_run:return record printed above, and works out to 8 minutes and 15.11 seconds. The dtbd_timestamp values, on the other hand, reflect the hrtime of the machine running mdb at the moment it retrieved each buffer. That is the timestamp we need to pass for mdb to stop crashing, which explains the mysterious behavior Patrick saw: the core only became readable once the VM had been up long enough for its hrtime to exceed dt_last_timestamp, roughly the ten minutes observed earlier.
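
As a quick sanity check on that conversion, here is a standalone snippet (not part of libdtrace, just arithmetic on the value pulled from the core):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* dt_last_timestamp pulled from the core, in nanoseconds since boot. */
	uint64_t ns = 0x7346ce30d8ULL;
	double secs = (double)ns / 1e9;
	int mins = (int)(secs / 60);

	(void) printf("%llu ns = %.3f s = %dm %.2fs\n",
	    (unsigned long long)ns, secs, mins, secs - mins * 60);
	return (0);
}

This prints 495109157080 ns = 495.109 s = 8m 15.11s.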

In the end the problem with the logic was fairly simple: we assert that timestamp is greater than or equal to dt_last_timestamp before checking whether we've hit the end of the records in a buffer. Moving the assert after that check is all that is needed to solve the issue. The fix itself was small, but mdb was invaluable in root-causing the problem.
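
For illustration, the reordered logic looks roughly like the following. This is only a sketch of the shape of the change based on the excerpt quoted above, not the verbatim committed diff, and the end-of-buffer handling is elided:

                        timestamp = dt_buf_oldest(buf, dtp);
                        if (timestamp == buf->dtbd_timestamp) {
                                /*
                                 * dt_buf_oldest() ran out of records and fell
                                 * back to the buffer's retrieval time, so this
                                 * value must not be held to the monotonicity
                                 * assertion.
                                 */
                                ...
                        } else {
                                assert(timestamp >= dtp->dt_last_timestamp);
                                dtp->dt_last_timestamp = timestamp;
                        }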

History

#1

Updated by Electric Monk 9 months ago

  • Status changed from New to Closed

git commit 793bd7e3617ae7d3d24e8c6b7d6befe35f07ec1f

commit  793bd7e3617ae7d3d24e8c6b7d6befe35f07ec1f
Author: Sam Gwydir <sam.gwydir@joyent.com>
Date:   2019-01-11T17:09:26.000Z

    9058 postmortem DTrace frequently broken under vmware
    Reviewed by: Tim Kordas <tim.kordas@joyent.com>
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: Yuri Pankov <yuripv@yuripv.net>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Approved by: Matthew Ahrens <mahrens@delphix.com>
