Bug #14643


::dtrace dcmd can truncate output

Added by Jason King 4 months ago. Updated 3 months ago.

mdb - modular debugger
Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:


Attempting to retrieve dtrace data from a crash dump, the ::dtrace dcmd would return no output.
Baffled, I attempted to run a simpler dtrace script, force a panic and try to recreate a simpler setup. Upon running the same ::dtrace command again, I then saw the expected output.

At first, I suspected perhaps an uninitialized variable causing the strange behavior.
After trying to build libtrace without suppressing uninitialized variable warnings, nothing stood out.
Looking at the source, I used dtrace on mdb while running the ::dtrace dcmd:

    self->trace = 1

    self->trace = 0;

From this, I was able to observe that:
  1. dtrace_consume is called in both cases
  2. dtrace_consume returns from the same place
  3. In the 'no output' case, dt_consume_cpu is not called

Since (from the source) dt_consume_cpu is what generates the output, so this make sense.

The next question is why isn't dt_consume_cpu being called.
Since both the 'output' and 'no output' cases are returning from dtrace_consume using the same offset (the temporal option was not set), this left the following code:

        /* Retrieve data from each CPU. */
        (void) dtrace_getopt(dtp, "bufsize", &size);
        for (i = 0; i < max_ncpus; i++) {
            dtrace_bufdesc_t *buf;

            if (dt_get_buf(dtp, i, &buf) != 0)
                return (-1);
            if (buf != NULL) {
                if (first_timestamp == 0)
                    first_timestamp = buf->dtbd_timestamp;
                assert(buf->dtbd_timestamp >= first_timestamp);

                dt_pq_insert(dtp->dt_bufq, buf);
                drops[i] = buf->dtbd_drops;
                buf->dtbd_drops = 0;

        /* Consume records. */
        for (;;) {
            dtrace_bufdesc_t *buf = dt_pq_pop(dtp->dt_bufq);
            uint64_t timestamp;

            if (buf == NULL)

            timestamp = dt_buf_oldest(buf, dtp);
            if (timestamp == buf->dtbd_timestamp) {
                 * We've reached the end of the time covered
                 * by this buffer.  If this is the oldest
                 * buffer, we must do another pass
                 * to retrieve more data.
                dt_put_buf(dtp, buf);
                if (timestamp == first_timestamp &&
            assert(timestamp >= dtp->dt_last_timestamp);
            dtp->dt_last_timestamp = timestamp;

            if ((rval = dt_consume_cpu(dtp, fp,
                buf->dtbd_cpu, buf, B_TRUE, pf, rf, arg)) != 0)
                return (rval);
            dt_pq_insert(dtp->dt_bufq, buf);

Therefore, we must be hitting the timestamp==buf->dtbd_timestamp condition. How is this possible?
dtp->dt_bufq is a priority queue, ordered by the timestamps of the records in each buf.

Within each buffer, the records are sorted by timestamp (oldest first), after a record is consumed, the offset of the oldest record is incremented and re-added to the queue (as records in another buf may now be the 'oldest'). When a buf is empty or all it's entries processed, dtbd_timestamp is returned by dt_buf_oldest(). first_timestamp is the value of dtbd_timestamp from the first buf -- typically CPU 0. This is acting as a marker to signal when more bufs must be retrieved.

As noted earlier in dtrace_consume(), this is assuming that dtbd_timestamp is larger than any of the timestamps of the records in the buf. How is dtbd_timestamp being set? On a live kernel, the DTRACEIOC_BUFSNAP ioctl sets dtbd_timestamp to the value of gethrtime(), which preserves the invariant about dtbd_timestamp since it is set after the entries in buf have been written. When libdtrace is used by mdb, when it mocks DTRACEIOC_BUFSNAP, it also sets dtbd_timestamp to the value of gethrtime(). However, there is no guarantee the value of gethrtime() on the system running mdb will be larger than the values in the dtrace bufs in the crash dump! In fact, the following D script confirmed it:

#!/usr/sbin/dtrace -s
        /*print(*args[0]) */
        self->trace = 1;
        self->dtp = args[0];
        self->ts = 0;

        buf = (userland dtrace_bufdesc_t *)args[1];
        ts = (unsigned long long)buf->dtbd_timestamp;
        sz = (unsigned long long)buf->dtbd_size;
        printf("buf: %#16p  ts: %16llu size: %llu\n", buf, ts, sz);
        buf = 0;
        ts = 0;
        sz = 0;

        printf("buf: %#16p\n", args[1]);
        self->buf = (userland dtrace_bufdesc_t *)args[1];

/self->trace && self->buf != 0/
        oldest = (unsigned long long)args[1];
        ts = (unsigned long long)self->buf->dtbd_timestamp;
        printf(" ts: %16llu old: %16llu%s\n", ts, oldest,
            (oldest == ts) ? " BREAK" : "");

        oldest = 0;
        ts = 0;

        self->trace = 0;
        self->buf = 0;
        self->ts = 0;


This also explains why the ::dtrace dcmd could start working as expected when the command was re-run. Once the system running mdb has been running long enough, it will return gethrtime() values greater than in the crash dump, and things will work as expected.

Actions #1

Updated by Jason King 4 months ago

The fix for this is relatively simple:

We can read the value of panic_hrtime from the crash dump, and add the value to gethrtime() in dtracemdb_bufsnap() when setting dtbd_timestamp. As no valid probe entry in a buf can have a value greater than panic_hrtime, it guarantees that dtbd_timestamp will preserve the expected invariant.

Actions #2

Updated by Jason King 4 months ago

As a test of the fix, the following dtrace script was used:

#!/usr/sbin/dtrace -ws

        offset = (uint64_t)$1;
        printf("offset: %llu\n", (unsigned long long)offset);

        self->doit = 1

        self->doit = 0

        this->before = (unsigned long long)uregs[R_RAX];
        uregs[R_RAX] += offset;
        this->after = (unsigned long long)uregs[R_RAX];
        this->before = this->after = 0;

The invoking with -p $(pgrep mdb) <value of 'panic_hrtime/E' from dump>, the ::dtrace dcmd worked as expected. When the script was stopped, the same ::dtrace dcmd produced no output.

Actions #3

Updated by Electric Monk 4 months ago

  • Gerrit CR set to 2124
Actions #4

Updated by Jason King 3 months ago

For additional testing, a dump was created (in a VM, but it had an uptime of several weeks).
Upon the VM restarting, attempting to use the ::dtrace dcmd revealed no output.
Rebooting into a BE with the fix, the same ::dtrace dcmd showed the expected output.

Actions #5

Updated by Electric Monk 3 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 9f7719e79a5c2ccdc9ee6a14147ba723fb8c0e36

commit  9f7719e79a5c2ccdc9ee6a14147ba723fb8c0e36
Author: Jason King <>
Date:   2022-05-18T14:52:17.000Z

    14643 ::dtrace dcmd can truncate output
    Reviewed by: Gordon Ross <>
    Reviewed by: Andy Fiddaman <>
    Reviewed by: Toomas Soome <>
    Approved by: Dan McDonald <>


Also available in: Atom PDF