Project

General

Profile

Actions

Bug #5596

closed

tar doesn't properly wait for its children

Added by Robert Mustacchi over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
High
Category:
cmd - userland programs
Start date:
2015-02-06
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

From SmartOS OS-3810:

This repro's for me every time:

[root@headnode (nightly-1) /var/tmp]# cp -PR /usr/vm foo
[root@headnode (nightly-1) /var/tmp]# du -sh foo
 2.7M    foo
[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify)
Modify: 2015-02-05 19:50:56.054611432 +0000
Modify: 2015-02-05 19:50:56.167486237 +0000
[root@headnode (nightly-1) /var/tmp]#
[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify)
Modify: 2015-02-05 19:51:17.623433348 +0000
Modify: 2015-02-05 19:51:17.735709880 +0000

vs gtar:

[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && gtar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | grep Modify) && sleep 2 && (stat foo.tar.bz2 | grep Modify)
Modify: 2015-02-05 19:52:45.217087707 +0000
Modify: 2015-02-05 19:52:45.217087707 +0000

----
Showing that the size is different too:

[root@headnode (nightly-1) /var/tmp]# rm -f foo.tar.bz2 && tar cjf foo.tar.bz2 foo && (stat foo.tar.bz2 | egrep '(Size|Modify)') && sleep 2 && (stat foo.tar.bz2 | egrep '(Size|Modify)')
  Size: 262144          Blocks: 1          IO Block: 131072 regular file
Modify: 2015-02-05 20:06:13.373405262 +0000
  Size: 423163          Blocks: 1          IO Block: 131072 regular file
Modify: 2015-02-05 20:06:13.486401229 +0000

Digging into the problem, we appear to have tar exiting before its bzip2 child. If we DTrace the return value of waitpid, it always seems to get ECHILD, which seems supicious. If we look at the source code, the problem is actually a combination of a few toxic lines. The first problem that we have is a pair of missing parens in fork():

9332 /* Set the archive for writing and then compress the archive */
9333 pid_t
9334 compress_file(void)
9335 {
...
9347         if (pid = fork() > 0) {
9348                 mt = fd[1];
9349                 (void) close(fd[0]);
9350                 return (pid);
9351         }
...
9360 }

Note, because of operator precedence, we end up storing pid as 1 in the parent, which when we call waitpid causes us to get ECHILD. Of course, this leads to another very obvious problem, that we're not actually waiting for things to finish to call waitpid, but we actually call it immediately after we fork(). Because we always were trying to wait on init, we always just blew on. So if we just fix the parens, we instead deadlock forever. Our child will never terminate as it's happily waiting on stdin for something to finish. This cleans up the entire path, and we now properly wait for the child process to terminate before terminating if we're terminating successfully.

Actions #1

Updated by Gary Mills over 6 years ago

Your change to getresponse.c doesn't look quite right to me. Isn't getchar() supposed to return the next character as an int or EOF? The code tests for a nul character twice, and did so before this change. It's a minor point I know, but should be fixed. I'd put the `b = getchar()' statement at the top of the loop.

Actions #2

Updated by Dan McDonald over 6 years ago

The uncompress_file() path has the same problem starting at line 9376. This fails reproducibly by piping any .tar.gz file into "tar -xzf -".

Actions #3

Updated by Electric Monk over 6 years ago

  • Status changed from New to Closed

git commit c536b1f93b31eba539c635a91be8ee2ab0fcb15a

commit  c536b1f93b31eba539c635a91be8ee2ab0fcb15a
Author: Robert Mustacchi <rm@joyent.com>
Date:   2015-04-30T04:32:53.000Z

    5596 tar doesn't properly wait for its children
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Richard Lowe <richlowe@richlowe.net>
    Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
    Approved by: Garrett D'Amore <garrett@damore.org>

Actions

Also available in: Atom PDF