Bug #10535

zfs send gives up too easily on progress reporting

Added by Joshua Clulow 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2019-03-11
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:

Description

On an extremely busy system, we observed a zfs send process that stopped printing ongoing send progress, even though the -v and -P flags were used.

Looking at the ongoing send:

[root@HA7SBLND2 (ap-southeast-1a) ~]# pstack 92809
92809:  /usr/sbin/zfs send -v -P zones/e964f422-c4ea-47f1-b4d6-41e99004e1e7/da
-----------------  lwp# 1 / thread# 1  --------------------
 feec3885 ioctl    (813e548, 5a1c, 8044b98, fed9c3df) + 15
 fed9c576 dump_ioctl (8142548, 80473b8, 0, 0, 0, 1) + 1a8
 fed9df3d dump_snapshot (8142548, 80473b0, 1, 4, fedacfa3, 8045e4c) + 3c4
 fed8bba0 zfs_iter_snapshots_sorted (8142cc8, fed9db79, 80473b0, fedb0000, fedacfa3, 813e548) + 65
 fed9ca14 dump_filesystem (8142cc8, 80473b0, 8047098, fed9cb53, 814d9e8, 0) + 180
 fed9cb72 dump_filesystems (8142cc8, 80473b0, 3eef15f8, 522, 8c9185, 0) + 2d
 fed9d8fc zfs_send (8142cc8, 0, 8047e59, 8047c40, 1, 0) + 917
 0805baa6 zfs_do_send (4, 8047d18, 80778e0, 801, 0, 3) + 72f
 0805f2b2 main     (8047ccc, fef3f348, 8047d08, 8055cbf, 5, 8047d14) + 22c
 08055cbf _start   (5, 8047e08, 8047e16, 8047e1b, 8047e1e, 8047e21) + 83
-----------------  lwp# 2 / thread# 2  --------------------
 fed9dfa6 send_progress_thread(), exit value = 0xffffffff
        ** zombie (exited, not detached, not yet joined) **

A cursory inspection of send_progress_thread() suggests that any failure of the progress reporting ioctl() causes the progress thread to exit (with no diagnostic message!). No further progress reports are printed, even though the send continues unabated:

static void * 
send_progress_thread(void *arg)
{
...
        if (!pa->pa_parsable)
                (void) fprintf(stderr, "TIME        SENT   SNAPSHOT\n");

        /*
         * Print the progress from ZFS_IOC_SEND_PROGRESS every second.
         */
        for (;;) {
                (void) sleep(1);

                zc.zc_cookie = pa->pa_fd;
                if (zfs_ioctl(hdl, ZFS_IOC_SEND_PROGRESS, &zc) != 0)
                        return ((void *)-1);
...

We should arrange for at least a slightly more sophisticated mechanism in which the progress reporting thread continues to operate even if some ioctl() calls fail. Rather than exiting on the first error, the thread should coordinate with the main thread so that it exits at the correct moment.
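
One possible shape for this is sketched below. It is not the libzfs code and not a proposed patch, only a self-contained illustration using POSIX threads. All names here (progress_state_t, progress_start(), progress_stop(), flaky_poll()) are hypothetical, and the poll callback stands in for the real zfs_ioctl(hdl, ZFS_IOC_SEND_PROGRESS, &zc) call. The thread waits on a condition variable with a timeout instead of calling sleep(), so the main thread can wake it promptly; a failed poll is counted and reported, and the loop carries on.

```c
#define	_POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the ZFS_IOC_SEND_PROGRESS ioctl; 0 on success. */
typedef int (*progress_fn_t)(void *cookie, unsigned long long *bytes);

typedef struct progress_state {
	pthread_mutex_t ps_lock;
	pthread_cond_t ps_cv;
	bool ps_stop;		/* set by the main thread when the send ends */
	progress_fn_t ps_poll;
	void *ps_cookie;
	unsigned ps_interval_ms;
	unsigned ps_failures;	/* failed polls that did NOT end the thread */
	unsigned ps_reports;	/* successful progress reports */
} progress_state_t;

static void *
progress_thread(void *arg)
{
	progress_state_t *ps = arg;

	pthread_mutex_lock(&ps->ps_lock);
	while (!ps->ps_stop) {
		struct timespec deadline;
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += ps->ps_interval_ms / 1000;
		deadline.tv_nsec += (long)(ps->ps_interval_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}

		/* Wait out the interval, or wake early if asked to stop. */
		int rc = 0;
		while (!ps->ps_stop && rc == 0)
			rc = pthread_cond_timedwait(&ps->ps_cv, &ps->ps_lock,
			    &deadline);
		if (ps->ps_stop)
			break;

		/* Poll without holding the lock; the call may be slow. */
		unsigned long long bytes = 0;
		pthread_mutex_unlock(&ps->ps_lock);
		int err = ps->ps_poll(ps->ps_cookie, &bytes);
		pthread_mutex_lock(&ps->ps_lock);

		if (err != 0) {
			/* Report the failure and keep going. */
			ps->ps_failures++;
			(void) fprintf(stderr,
			    "progress poll failed (%d); will retry\n", err);
			continue;
		}
		ps->ps_reports++;
		(void) fprintf(stderr, "%llu\n", bytes);
	}
	pthread_mutex_unlock(&ps->ps_lock);
	return (NULL);
}

int
progress_start(progress_state_t *ps, pthread_t *tid, progress_fn_t poll,
    void *cookie, unsigned interval_ms)
{
	pthread_mutex_init(&ps->ps_lock, NULL);
	pthread_cond_init(&ps->ps_cv, NULL);
	ps->ps_stop = false;
	ps->ps_poll = poll;
	ps->ps_cookie = cookie;
	ps->ps_interval_ms = interval_ms;
	ps->ps_failures = ps->ps_reports = 0;
	return (pthread_create(tid, NULL, progress_thread, ps));
}

/* Called by the main thread once the send itself has finished. */
void
progress_stop(progress_state_t *ps, pthread_t tid)
{
	pthread_mutex_lock(&ps->ps_lock);
	ps->ps_stop = true;
	pthread_cond_signal(&ps->ps_cv);
	pthread_mutex_unlock(&ps->ps_lock);
	(void) pthread_join(tid, NULL);
}

/* Demo poll function: every odd-numbered call fails with EBUSY. */
int
flaky_poll(void *cookie, unsigned long long *bytes)
{
	unsigned *calls = cookie;

	if (++(*calls) % 2 == 1)
		return (EBUSY);
	*bytes = (unsigned long long)*calls * 1024;
	return (0);
}
```

In this arrangement the main thread would call progress_start() before issuing the send and progress_stop() after it returns, so the thread's lifetime is tied to the send rather than to the outcome of any single poll. Whether transient errors should be retried indefinitely or only a bounded number of times is left open here.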
