Project

General

Profile

Actions

Bug #5596

closed

tar doesn't properly wait for its children

Added by Robert Mustacchi over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
High
Category:
cmd - userland programs
Start date:
2015-02-06
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

From SmartOS OS-3810:

This repro's for me every time:

[root@headnode (nightly-1) /var/tmp]# cp -PR /usr/vm foo
[root@headnode (nightly-1) /var/tmp]# du -sh foo
 2.7M    foo
[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify)
Modify: 2015-02-05 19:50:56.054611432 +0000
Modify: 2015-02-05 19:50:56.167486237 +0000
[root@headnode (nightly-1) /var/tmp]#
[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify)
Modify: 2015-02-05 19:51:17.623433348 +0000
Modify: 2015-02-05 19:51:17.735709880 +0000

vs gtar:

[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && gtar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify)
Modify: 2015-02-05 19:52:45.217087707 +0000
Modify: 2015-02-05 19:52:45.217087707 +0000

----
Showing that the size is different too:

[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | egrep '(Size|Modify)') && sleep 2 && (stat foo.tar.bz2 | egrep '(Size|Modify)')
  Size: 262144          Blocks: 1          IO Block: 131072 regular file
Modify: 2015-02-05 20:06:13.373405262 +0000
  Size: 423163          Blocks: 1          IO Block: 131072 regular file
Modify: 2015-02-05 20:06:13.486401229 +0000

Digging into the problem, we appear to have tar exiting before its bzip2 child. If we DTrace the return value of waitpid, it always seems to get ECHILD, which seems supicious. If we look at the source code, the problem is actually a combination of a few toxic lines. The first problem that we have is a pair of missing parens in fork():

9332 /* Set the archive for writing and then compress the archive */
9333 pid_t
9334 compress_file(void)
9335 {
...
9347         if (pid = fork() > 0) {
9348                 mt = fd[1];
9349                 (void) close(fd[0]);
9350                 return (pid);
9351         }
...
9360 }

Note, because of operator precedence, we end up storing pid as 1 in the parent, which when we call waitpid causes us to get ECHILD. Of course, this leads to another very obvious problem, that we're not actually waiting for things to finish to call waitpid, but we actually call it immediately after we fork(). Because we always were trying to wait on init, we always just blew on. So if we just fix the parens, we instead deadlock forever. Our child will never terminate as it's happily waiting on stdin for something to finish. This cleans up the entire path, and we now properly wait for the child process to terminate before terminating if we're terminating successfully.

Actions

Also available in: Atom PDF