Bug #758
dtrace async timing handling issues
0%
Description
The following issue came up while writing a demo today.
Originally I tried to run a test application and let "dtrace" do some tracing what's happening in the kernel.
I tried this:/usr/sbin/dtrace -s '/dev/stdin' <<<"$dtrace_script" >'xxx' 2>&1 &
integer dt_pid=$!
Now that dtrace runs as async child process of the current shell we needed a way to detect when dtrace is ready to capture.
I tried this code:# spin-wait until dtrace becomes ready
while [[ ! -f 'xxx' || "$( < 'xxx' )" != ~(E)dtrace:\ script.+matched.+probes ]] ; do
sleep 1
printf '.'
done
printf '\n'
Here comes the issue: This only works properly for 20% of the lauches on my four-way laptop. Most of the time nothing is captured because dtrace needs multiple seconds to become ready... more than the test application runs (this problem becomes even larger when using the fbt provider which can easily match 60000+ probes).
Workaround: Add a sleep 30
to the code before the process which should be probed starts.
The next issue comes from figuring out when dtrace is done writing the log data. It turns out that if I just run /usr/bin/kill $pid
will immediately stop dtrace from writing data which may cause it to loose data which already have been captured.
The workaround was again a sleep
between the time the process being probed exists and the use of kill
.
AFAIK we need two things:
- A way to figure out when dtrace is really ready
- A way to figure out when dtrace is done writing data
AFAIK both issues could be avoided if "dtrace" would print it's status message 'dtrace: script ... matched ... probes' only after it has set-up all probes and continue writing all already captured data before exiting.
Files
Updated by Roland Mainz almost 10 years ago
- File fbt_demo001.sh.txt fbt_demo001.sh.txt added
Attached (as "fbt_demo001.sh") is the fbt demo script I used. Note the extra sections labelled with "grrr"... ;-(
Updated by Roland Mainz almost 10 years ago
More data:
It seems the trouble is the number of probes being used. The fbt provider can quickly provide 60000+ probes. If more than 5000 are enabled in one step some kind of "timing gap" > 1 second occurs between the point when dtrace claims to be ready (e.g. BEGIN { printf("dtrace_ready\n"); }
writes it's output) and the actual point when the probes become active.
For 60000 probes the gap is then >= 12 seconds (on a very large and very fast machine... which AFAIK rules out issues with paging/swapping or other resource shortage).