Bug #5596
closedtar doesn't properly wait for its children
100%
Description
From SmartOS OS-3810:
This repro's for me every time:
[root@headnode (nightly-1) /var/tmp]# cp -PR /usr/vm foo [root@headnode (nightly-1) /var/tmp]# du -sh foo 2.7M foo [root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify) Modify: 2015-02-05 19:50:56.054611432 +0000 Modify: 2015-02-05 19:50:56.167486237 +0000 [root@headnode (nightly-1) /var/tmp]# [root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify) Modify: 2015-02-05 19:51:17.623433348 +0000 Modify: 2015-02-05 19:51:17.735709880 +0000
vs gtar:
[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && gtar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify) Modify: 2015-02-05 19:52:45.217087707 +0000 Modify: 2015-02-05 19:52:45.217087707 +0000
----
Showing that the size is different too:
[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | egrep '(Size|Modify)') && sleep 2 && (stat foo.tar.bz2 | egrep '(Size|Modify)') Size: 262144 Blocks: 1 IO Block: 131072 regular file Modify: 2015-02-05 20:06:13.373405262 +0000 Size: 423163 Blocks: 1 IO Block: 131072 regular file Modify: 2015-02-05 20:06:13.486401229 +0000
Digging into the problem, we appear to have tar exiting before its bzip2 child. If we DTrace the return value of waitpid, it always seems to get ECHILD, which seems supicious. If we look at the source code, the problem is actually a combination of a few toxic lines. The first problem that we have is a pair of missing parens in fork():
9332 /* Set the archive for writing and then compress the archive */ 9333 pid_t 9334 compress_file(void) 9335 { ... 9347 if (pid = fork() > 0) { 9348 mt = fd[1]; 9349 (void) close(fd[0]); 9350 return (pid); 9351 } ... 9360 }
Note, because of operator precedence, we end up storing pid as 1 in the parent, which when we call waitpid causes us to get ECHILD. Of course, this leads to another very obvious problem, that we're not actually waiting for things to finish to call waitpid, but we actually call it immediately after we fork(). Because we always were trying to wait on init, we always just blew on. So if we just fix the parens, we instead deadlock forever. Our child will never terminate as it's happily waiting on stdin for something to finish. This cleans up the entire path, and we now properly wait for the child process to terminate before terminating if we're terminating successfully.
Updated by Gary Mills over 8 years ago
Your change to getresponse.c doesn't look quite right to me. Isn't getchar() supposed to return the next character as an int or EOF? The code tests for a nul character twice, and did so before this change. It's a minor point I know, but should be fixed. I'd put the `b = getchar()' statement at the top of the loop.
Updated by Dan McDonald about 8 years ago
The uncompress_file() path has the same problem starting at line 9376. This fails reproducibly by piping any .tar.gz file into "tar -xzf -".
Updated by Electric Monk about 8 years ago
- Status changed from New to Closed
git commit c536b1f93b31eba539c635a91be8ee2ab0fcb15a
commit c536b1f93b31eba539c635a91be8ee2ab0fcb15a Author: Robert Mustacchi <rm@joyent.com> Date: 2015-04-30T04:32:53.000Z 5596 tar doesn't properly wait for its children Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org>