Bug #8574

Fix for "illumos#3821 Race in rollback, zil close, and zil flush" is incomplete

Added by Jerry Jelinek 12 days ago.

Status: New
Start date: 2017-08-11
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: zfs - Zettabyte File System
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

In our SmartOS fork of illumos, we merged the fix for illumos#3821 on Nov. 7, 2016. We recently got a panic dump from one of our customers that hit the VERIFY added by that change.

The zfs_dbgmsg that was logged right before the VERIFY is:

zil (ffffd0df6df4f1c0) is dirty, txg 10361846

Here is some relevant info from the dump.
> ::status
debugging crash dump 
/manta/thoth/stor/thoth/00d5048cdb91779b47ce252023febee7/vmcore.0 (64-bit) from CMHMANTAS1007
operating system: 5.11 joyent_20170608T172457Z (i86pc)
image uuid: (not set)
panic message: 
assertion failed: !zilog_is_dirty(zilog), file: ../../common/fs/zfs/zil.c, line: 1860
dump content: kernel pages only
> $C
ffffd003d9698a80 vpanic()
ffffd003d9698ab0 0xfffffffffba7bc38()
ffffd003d9698af0 zil_close+0x13c(ffffd0df6df4f1c0)
ffffd003d9698b50 zfsvfs_teardown+0x5c(ffffd0c56fe26800, 0)
ffffd003d9698b70 zfs_suspend_fs+0x13(ffffd0c56fe26800)
ffffd003d9698bd0 zfs_ioc_rollback+0x50(ffffd0d83e73b000, ffffd0c9b95e5b40, ffffd0c8f3c33880)
ffffd003d9698c70 zfsdev_ioctl+0x1a4(5a00000000, 5a19, 8046948, 100003, ffffd0c9fcd10af8, ffffd003d9698e58)
ffffd003d9698cb0 cdev_ioctl+0x39(5a00000000, 5a19, 8046948, 100003, ffffd0c9fcd10af8, ffffd003d9698e58)
ffffd003d9698d00 spec_ioctl+0x60(ffffd0c570cc2a00, 5a19, 8046948, 100003, ffffd0c9fcd10af8, ffffd003d9698e58, 0)
ffffd003d9698d90 fop_ioctl+0x55(ffffd0c570cc2a00, 5a19, 8046948, 100003, ffffd0c9fcd10af8, ffffd003d9698e58, 0)
ffffd003d9698eb0 ioctl+0x9b(6, 5a19, 8046948)
ffffd003d9698f10 _sys_sysenter_post_swapgs+0x153()
> ffffd0df6df4f1c0::print zilog_t zl_dmu_pool
zl_dmu_pool = 0xffffd0c57616f700
> ffffd0c57616f700::print dsl_pool_t dp_dirty_zilogs
dp_dirty_zilogs = {
    dp_dirty_zilogs.tl_lock = {
        _opaque = [ 0 ]
    }
    dp_dirty_zilogs.tl_offset = 0x2e8
    dp_dirty_zilogs.tl_spa = 0xffffd0c572466000
    dp_dirty_zilogs.tl_head = [ 0xffffd0df6df4f4a8, 0xffffd0da92d427e8, 0, 0 ]
}
> ffffd0df6df4f1c0+0x2e8=J
                ffffd0df6df4f4a8 
> ffffd0df6df4f4a8::print txg_node_t
{
    tn_next = [ 0xffffd0c582c429e8, 0, 0, 0 ]
    tn_member = [ 0x1, 0, 0, 0 ]
}
> ffffd0c57616f700::print dsl_pool_t dp_tx
dp_tx = {
    dp_tx.tx_cpu = 0xffffd0c577179000
    dp_tx.tx_sync_lock = {
        _opaque = [ 0 ]
    }
    dp_tx.tx_open_txg = 0x9e204d
    dp_tx.tx_quiesced_txg = 0
    dp_tx.tx_syncing_txg = 0x9e204c
    dp_tx.tx_synced_txg = 0x9e204b
    dp_tx.tx_open_time = 0x86e9f12177f01
    dp_tx.tx_sync_txg_waiting = 0x9e204c
    dp_tx.tx_quiesce_txg_waiting = 0x9e204d
    dp_tx.tx_sync_more_cv = {
        _opaque = 0
    }
    dp_tx.tx_sync_done_cv = {
        _opaque = 0x1
    }
    dp_tx.tx_quiesce_more_cv = {
        _opaque = 0x1
    }
    dp_tx.tx_quiesce_done_cv = {
        _opaque = 0
    }
    dp_tx.tx_timeout_cv = {
        _opaque = 0
    }
    dp_tx.tx_exit_cv = {
        _opaque = 0
    }
    dp_tx.tx_threads = 0x2
    dp_tx.tx_exiting = 0
    dp_tx.tx_sync_thread = 0xffffd003d69d4c40
    dp_tx.tx_quiesce_thread = 0xffffd003d5c92c40
    dp_tx.tx_commit_cb_taskq = 0
}

We can see that list_tail() would currently return NULL (see the list_empty macro), although the log message prints a non-zero txg value (10361846).
> ffffd0df6df4f1c0::print -a zilog_t zl_lwb_list
ffffd0df6df4f310 zl_lwb_list = {
    ffffd0df6df4f310 zl_lwb_list.list_size = 0xc8
    ffffd0df6df4f318 zl_lwb_list.list_offset = 0xb8
    ffffd0df6df4f320 zl_lwb_list.list_head = {
        ffffd0df6df4f320 list_next = 0xffffd0c9cb19a578
        ffffd0df6df4f328 list_prev = 0xffffd0c9cb19a578
    }
}

Prior to integrating illumos#3821, we had been using the following diff in zil_close() for two years as the fix for this bug.
@@ -1869,8 +1869,15 @@ zil_close(zilog_t *zilog)
        if (lwb != NULL)
                txg = lwb->lwb_max_txg;
        mutex_exit(&zilog->zl_lock);
-       if (txg)
+       if (zilog_is_dirty(zilog)) {
+               /*
+                * If we're dirty, always wait for the current transaction --
+                * our lwb_max_txg may be in the past.
+                */
+               txg_wait_synced(zilog->zl_dmu_pool, 0);
+       } else if (txg) {
                txg_wait_synced(zilog->zl_dmu_pool, txg);
+       }

It appears that this fix would still be valid in conjunction with the changes for illumos#3821, but I am not sure whether there is a better way to fix this.
