Project

General

Profile

Bug #758

dtrace async timing handling issues

Added by Roland Mainz over 8 years ago. Updated over 8 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2011-03-01
Due date:
% Done:

0%

Estimated time:
Difficulty:
Tags:

Description

The following issue came up while writing a demo today.

Originally I tried to run a test application and let "dtrace" do some tracing what's happening in the kernel.

I tried this:
/usr/sbin/dtrace -s '/dev/stdin' <<<"$dtrace_script" >'xxx' 2>&1 &
integer dt_pid=$!

Now that dtrace runs as async child process of the current shell we needed a way to detect when dtrace is ready to capture.
I tried this code:
# spin-wait until dtrace becomes ready
while [[ ! -f 'xxx' || "$( < 'xxx' )" != ~(E)dtrace:\ script.+matched.+probes ]] ; do
sleep 1
printf '.'
done
printf '\n'

Here comes the issue: This only works properly for 20% of the lauches on my four-way laptop. Most of the time nothing is captured because dtrace needs multiple seconds to become ready... more than the test application runs (this problem becomes even larger when using the fbt provider which can easily match 60000+ probes).

Workaround: Add a sleep 30 to the code before the process which should be probed starts.

The next issue comes from figuring out when dtrace is done writing the log data. It turns out that if I just run /usr/bin/kill $pid will immediately stop dtrace from writing data which may cause it to loose data which already have been captured.
The workaround was again a sleep between the time the process being probed exists and the use of kill.

IMO there should be better ways to do this.
AFAIK we need two things:
  1. A way to figure out when dtrace is really ready
  2. A way to figure out when dtrace is done writing data

AFAIK both issues could be avoided if "dtrace" would print it's status message 'dtrace: script ... matched ... probes' only after it has set-up all probes and continue writing all already captured data before exiting.


Files

fbt_demo001.sh.txt (1.91 KB) fbt_demo001.sh.txt Roland Mainz, 2011-03-01 02:37 PM

History

#1

Updated by Roland Mainz over 8 years ago

Attached (as "fbt_demo001.sh") is the fbt demo script I used. Note the extra sections labelled with "grrr"... ;-(

#2

Updated by Roland Mainz over 8 years ago

More data:
It seems the trouble is the number of probes being used. The fbt provider can quickly provide 60000+ probes. If more than 5000 are enabled in one step some kind of "timing gap" > 1 second occurs between the point when dtrace claims to be ready (e.g. BEGIN { printf("dtrace_ready\n"); } writes it's output) and the actual point when the probes become active.
For 60000 probes the gap is then >= 12 seconds (on a very large and very fast machine... which AFAIK rules out issues with paging/swapping or other resource shortage).

Also available in: Atom PDF