Bug #758


dtrace async timing handling issues

Added by Roland Mainz over 11 years ago. Updated over 11 years ago.

Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:


The following issue came up while writing a demo today.

Originally I tried to run a test application and let "dtrace" do some tracing what's happening in the kernel.

I tried this:
/usr/sbin/dtrace -s '/dev/stdin' <<<"$dtrace_script" >'xxx' 2>&1 &
integer dt_pid=$!

Now that dtrace runs as async child process of the current shell we needed a way to detect when dtrace is ready to capture.
I tried this code:
# spin-wait until dtrace becomes ready
while [[ ! -f 'xxx' || "$( < 'xxx' )" != ~(E)dtrace:\ script.+matched.+probes ]] ; do
sleep 1
printf '.'
printf '\n'

Here comes the issue: This only works properly for 20% of the lauches on my four-way laptop. Most of the time nothing is captured because dtrace needs multiple seconds to become ready... more than the test application runs (this problem becomes even larger when using the fbt provider which can easily match 60000+ probes).

Workaround: Add a sleep 30 to the code before the process which should be probed starts.

The next issue comes from figuring out when dtrace is done writing the log data. It turns out that if I just run /usr/bin/kill $pid will immediately stop dtrace from writing data which may cause it to loose data which already have been captured.
The workaround was again a sleep between the time the process being probed exists and the use of kill.

IMO there should be better ways to do this.
AFAIK we need two things:
  1. A way to figure out when dtrace is really ready
  2. A way to figure out when dtrace is done writing data

AFAIK both issues could be avoided if "dtrace" would print it's status message 'dtrace: script ... matched ... probes' only after it has set-up all probes and continue writing all already captured data before exiting.

Files (1.91 KB) Roland Mainz, 2011-03-01 02:37 PM
Actions #1

Updated by Roland Mainz over 11 years ago

Attached (as "") is the fbt demo script I used. Note the extra sections labelled with "grrr"... ;-(

Actions #2

Updated by Roland Mainz over 11 years ago

More data:
It seems the trouble is the number of probes being used. The fbt provider can quickly provide 60000+ probes. If more than 5000 are enabled in one step some kind of "timing gap" > 1 second occurs between the point when dtrace claims to be ready (e.g. BEGIN { printf("dtrace_ready\n"); } writes it's output) and the actual point when the probes become active.
For 60000 probes the gap is then >= 12 seconds (on a very large and very fast machine... which AFAIK rules out issues with paging/swapping or other resource shortage).


Also available in: Atom PDF