Bug #3821

Race in rollback, zil close, and zil flush

Added by Robert Mustacchi about 6 years ago. Updated almost 3 years ago.

Status: Closed
Priority: High
Assignee:
Category: zfs - Zettabyte File System
Start date: 2013-06-17
Due date:
% Done: 100%
Estimated time:
Difficulty: Hard
Tags:

Description

We recently had nodes with some of the latest zfs bits panic on us in a rollback-heavy environment. The following is from my preliminary analysis:

Let's look at where we died:

> $C
ffffff01ea6b9a10 taskq_dispatch+0x3a(0, fffffffff7d20450, ffffff5551dea920, 1)
ffffff01ea6b9a60 zil_clean+0xce(ffffff4b7106c080, 7e0f1)
ffffff01ea6b9aa0 dsl_pool_sync_done+0x47(ffffff4313065680, 7e0f1)
ffffff01ea6b9b70 spa_sync+0x55f(ffffff4310c1d040, 7e0f1)
ffffff01ea6b9c20 txg_sync_thread+0x20f(ffffff4313065680)
ffffff01ea6b9c30 thread_start+8()

If we dig in we can find that this dataset corresponds to a zone:

> ffffff4b7106c080::print zilog_t zl_os->os_dsl_dataset->ds_dir->dd_myname
zl_os->os_dsl_dataset->ds_dir->dd_myname = [ "8ffce16a-13c2-4efa-a233-9e378e89877b" ]

Okay, so we have a null taskq pointer. That only happens during the calls to zil_open and zil_close. If we poke around we can see that we're actually in the midst of a rollback:

> ::pgrep zfs | ::printf "0x%x %s\\n" proc_t . p_user.u_psargs
0xffffff43262800a0 zfs rollback zones/15714eb6-f5ea-469f-ac6d-4b8ab06213c2@marlin_init
0xffffff54e22a1028 zfs rollback zones/8ffce16a-13c2-4efa-a233-9e378e89877b@marlin_init
0xffffff4362f3a058 zfs rollback zones/0ddb8e49-ca7e-42e1-8fdc-4ac4ba8fe9f8@marlin_init
0xffffff5748e8d020 zfs rollback zones/426357b5-832d-4430-953e-10cd45ff8e9f@marlin_init
0xffffff436b867008 zfs rollback zones/8f36bf37-8a9c-4a44-995c-6d1b2751e6f5@marlin_init
0xffffff4381ad4090 zfs rollback zones/6c8eca18-fbd6-46dd-ac24-2ed45cd0da70@marlin_init
0xffffff4370726060 zfs rollback zones/192e1461-b3a5-4b0b-8805-8ea5f2c79d7d@marlin_init
0xffffff4377170030 zfs rollback zones/42c8851f-587e-40f4-a28f-3c1a243e3622@marlin_init

That second one is our problem child. Let's see where it is.

> 0xffffff54e22a1028::walk thread | ::findstack -v
stack pointer for thread ffffff437342ac20: ffffff01f45916b0
  ffffff01f4591710 remove_reference+0x3e(ffffff01f45917d0, ffffffffc001b030, fffffffff78dd92e)
  ffffff01f4591760 0xffffff42ed2f7580()
  ffffff01f45917c0 apix_do_interrupt+0xfe(ffffff01f45917d0, ffffff4395669440)
  ffffff01f45917d0 _interrupt+0xba()
  ffffff01f4591940 arc_buf_evict+0x88(ffffff4787370198)
  ffffff01f4591980 dbuf_clear+0x8a(ffffff5154292b50)
  ffffff01f4591ab0 dnode_evict_dbufs+0x86(ffffff54f44a45d8)
  ffffff01f4591af0 dmu_objset_evict_dbufs+0xb6(ffffff438af0b640)
  ffffff01f4591b50 zfsvfs_teardown+0x138(ffffff4317ad9940, 0)
  ffffff01f4591b90 zfs_suspend_fs+0x1a(ffffff4317ad9940)
  ffffff01f4591bd0 zfs_ioc_rollback+0x2d(ffffff46d2b89000)
  ffffff01f4591c80 zfsdev_ioctl+0x4a7(5a00000000, 5a19, 8046a28, 100003, ffffff4376f029a0, ffffff01f4591e68)
  ffffff01f4591cc0 cdev_ioctl+0x39(5a00000000, 5a19, 8046a28, 100003, ffffff4376f029a0, ffffff01f4591e68)
  ffffff01f4591d10 spec_ioctl+0x60(ffffff4310796040, 5a19, 8046a28, 100003, ffffff4376f029a0, ffffff01f4591e68, 0)
  ffffff01f4591da0 fop_ioctl+0x55(ffffff4310796040, 5a19, 8046a28, 100003, ffffff4376f029a0, ffffff01f4591e68, 0)
  ffffff01f4591ec0 ioctl+0x9b(3, 5a19, 8046a28)
  ffffff01f4591f10 _sys_sysenter_post_swapgs+0x149()
> ffffff4317ad9940::print zfsvfs_t z_log
z_log = 0

Well... looks like we are in a bad spot. We're tearing down this zfsvfs_t. We've gotten past the point where we can clean up the zil, and in fact we've probably already synced it out, yet we could still find it and tried to operate on it after it's been closed. Ugh.

So, we got this zil to clean from dsl_pool_sync_done, which has a list of dirty zilogs. This raises the question: did the zil_commit() call in zil_close() miss something, e.g. does the DSL still think we have dirty data?

> ffffff4b7106c080::print zilog_t zl_itxg[1]
zl_itxg[1] = {
    zl_itxg[1].itxg_lock = {
        _opaque = [ 0 ]
    }
    zl_itxg[1].itxg_txg = 0
    zl_itxg[1].itxg_sod = 0
    zl_itxg[1].itxg_itxs = 0
}

Hmm, so with that zil_clean should have returned... though maybe if we're racing somewhere this could have happened. So let's also look at whether or not we think this objset is dirty. There are two lists that we care about, os_dirty_dnodes and os_free_dnodes:

> ffffff4b7106c080::print -at zilog_t zl_os->os_dirty_dnodes[1]
ffffff438af0b7f0 list_t zl_os->os_dirty_dnodes[1] = {
    ffffff438af0b7f0 size_t zl_os->os_dirty_dnodes[1].list_size = 0x2e8
    ffffff438af0b7f8 size_t zl_os->os_dirty_dnodes[1].list_offset = 0xa0
    ffffff438af0b800 struct list_node zl_os->os_dirty_dnodes[1].list_head = {
        ffffff438af0b800 struct list_node *list_next = 0xffffff438af0b800
        ffffff438af0b808 struct list_node *list_prev = 0xffffff438af0b800
    }
}
> ffffff4b7106c080::print -at zilog_t zl_os->os_free_dnodes[1]
ffffff438af0b870 list_t zl_os->os_free_dnodes[1] = {
    ffffff438af0b870 size_t zl_os->os_free_dnodes[1].list_size = 0x2e8
    ffffff438af0b878 size_t zl_os->os_free_dnodes[1].list_offset = 0xa0
    ffffff438af0b880 struct list_node zl_os->os_free_dnodes[1].list_head = {
        ffffff438af0b880 struct list_node *list_next = 0xffffff438af0b880
        ffffff438af0b888 struct list_node *list_prev = 0xffffff438af0b880
    }
}

Both of these lists are empty, so this zil was already clean. Okay, so if that was the case, why was it on the list to clean? Another related question is why we did not return out of zil_clean() at the check against the validity of the itxg. Unfortunately we zero this out; however, it looks like we passed the value of itxg_itxs down to taskq_dispatch. So at least that tells us that we did in fact have a valid entry at that point: the argument 'ffffff5551dea920' was the itxs.

> ffffff5551dea920::print itxs_t       
{
    i_sync_list = {
        list_size = 0x50
        list_offset = 0
        list_head = {
            list_next = 0xffffff4b3628b010
            list_prev = 0xffffff574cfecee0
        }
    }
    i_async_tree = {
        avl_root = 0xffffff50efe6ea68
        avl_compar = zil_aitx_compare
        avl_offset = 0x28
        avl_numnodes = 0x3
        avl_size = 0x40
    }
}

So here we have both synchronous and asynchronous transactions left. This makes me suspect that somehow even though we called zil_close() and are in the process of rolling back, other transactions for this objset were added. How exactly that happened, I'm not entirely sure yet, but it seems like there's a race in the rollback, the closing of the zil, and our ability to add additional transactions to it.

History

#1

Updated by George Wilson about 6 years ago

  • Assignee set to George Wilson
#2

Updated by George Wilson about 6 years ago

The thread doing the teardown would have called zil_close() executing the following logic:

	mutex_enter(&zilog->zl_lock);
	lwb = list_tail(&zilog->zl_lwb_list);
	if (lwb != NULL)
		txg = lwb->lwb_max_txg;
	mutex_exit(&zilog->zl_lock);
	if (txg)
		txg_wait_synced(zilog->zl_dmu_pool, txg);

If this had worked as expected, we should have waited for txg 0x7e0f1 to sync. Since the current syncing txg is 0x7e0f1, we know that this thread must have found lwb_max_txg to be something less than 0x7e0f1.

Looking at the zil header block we see that the current header block has a birth time of 0x7da7d:

> ffffff4b7106c080::print zilog_t zl_header[]
zl_header = {
    zl_header->zh_claim_txg = 0
    zl_header->zh_replay_seq = 0
    zl_header->zh_log = {
        blk_dva = [
            {
                dva_word = [ 0x300000048, 0x71ff8 ]
            },
            {
                dva_word = [ 0, 0 ]
            },
            {
                dva_word = [ 0, 0 ]
            },
        ]
        blk_prop = 0x8009090200470047
        blk_pad = [ 0, 0 ]
        blk_phys_birth = 0
        blk_birth = 0x7da7d
        blk_fill = 0
        blk_cksum = {
            zc_word = [ 0xc46769a86a35d93d, 0x67da00163e4a7acd, 0xb8, 0x66 ]
        }
    }
    zl_header->zh_claim_blk_seq = 0
    zl_header->zh_flags = 0
    zl_header->zh_claim_lr_seq = 0
    zl_header->zh_pad = [ 0, 0, 0 ]
}

This would explain why the rollback thread continued to drive on, destroyed the taskq, and nulled out zl_clean_taskq. The next mystery is why the header block is so old given that we have itx records currently in the process of being cleaned up.

#3

Updated by Robert Mustacchi about 6 years ago

We have another dump that seems highly related. Here we are in the midst of doing a rollback, and the thread in question is in an infinite loop in taskq_destroy. As a result we panicked the box. Specifically, here's our thread in the rollback and taskq_destroy:

> ffffff0c29a0cb60::findstack -v
stack pointer for thread ffffff0c29a0cb60: ffffff00495fb370
...
  ffffff00495fba60 taskq_ent_free+0x6d(ffffff0c2e06c028, ffffff090a975598)
  ffffff00495fbab0 taskq_destroy+0x10d(ffffff0c2e06c028)
  ffffff00495fbaf0 zil_close+0x6e(ffffff0bb80dc080)
  ffffff00495fbb50 zfsvfs_teardown+0x5c(ffffff097acd4280, 0)
  ffffff00495fbb90 zfs_suspend_fs+0x1a(ffffff097acd4280)
  ffffff00495fbbd0 zfs_ioc_rollback+0x2d(ffffff0b9aab1000)
  ffffff00495fbc80 zfsdev_ioctl+0x4a7(5a00000000, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68)
  ffffff00495fbcc0 cdev_ioctl+0x39(5a00000000, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68)
  ffffff00495fbd10 spec_ioctl+0x60(ffffff090a686c40, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68, 0)
  ffffff00495fbda0 fop_ioctl+0x55(ffffff090a686c40, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68, 0)
  ffffff00495fbec0 ioctl+0x9b(3, 5a19, 8046a28)
  ffffff00495fbf10 _sys_sysenter_post_swapgs+0x149()

Due to the panic and an interrupt there are some additional stack frames which we've elided. So why is the taskq looping here? Basically, our code for it looks like:

...
2077         tq->tq_minalloc = 0;
2078         while (tq->tq_nalloc != 0)
2079                 taskq_ent_free(tq, taskq_ent_alloc(tq, TQ_SLEEP));
...

Normally the taskq code expects to have no tasks around when we get here, but that's not the case here. Let's get a bit of information out of our taskq.

> ffffff0c2e06c028::taskq         
ADDR             NAME                             ACT/THDS Q'ED  MAXQ INST
ffffff0c2e06c028 zil_clean                          0/   0    1     1    -
> ffffff0c2e06c028::print taskq_t tq_task
tq_task = {
    tq_task.tqent_next = 0xffffff0c325e5b40
    tq_task.tqent_prev = 0xffffff0c325e5b40
    tq_task.tqent_func = 0
    tq_task.tqent_arg = 0
    tq_task.tqent_un = {
        tqent_bucket = 0
        tqent_flags = 0
    }
    tq_task.tqent_thread = 0
    tq_task.tqent_cv = {
        _opaque = 0
    }
}
> 0xffffff0c325e5b40::taskq_entry 
ENTRY            ARG              FUNCTION
ffffff0c325e5b40 ffffff0c0e624748 zil_itxg_clean

Well, that's weird. We have an extra entry in this taskq. Because of that, the bit of code above will loop forever: tq_nalloc will never reach zero, as the taskq will never run the task it has. That leads us to ask where this task comes from. Generally, this task is added via zil_clean(), which comes in via dsl_pool_sync_done(). Because we have the argument to zil_itxg_clean we can determine which transaction group it was for. It looks like we only have a single async itx sitting around. If we dump that structure we find this:

> ffffff0c0e624748::print -at itxs_t
ffffff0c0e624748 itxs_t {
    ffffff0c0e624748 list_t i_sync_list = {
        ffffff0c0e624748 size_t list_size = 0x50
        ffffff0c0e624750 size_t list_offset = 0
        ffffff0c0e624758 struct list_node list_head = {
            ffffff0c0e624758 struct list_node *list_next = 0xffffff0c0e624758
            ffffff0c0e624760 struct list_node *list_prev = 0xffffff0c0e624758
        }
    }
    ffffff0c0e624768 avl_tree_t i_async_tree = {
        ffffff0c0e624768 struct avl_node *avl_root = 0xffffff0acc8c4768
        ffffff0c0e624770 int (*)() avl_compar = zil_aitx_compare
        ffffff0c0e624778 size_t avl_offset = 0x28
        ffffff0c0e624780 ulong_t avl_numnodes = 0x1
        ffffff0c0e624788 size_t avl_size = 0x40
    }
}
> ffffff0c0e624768::walk avl | ::print itx_async_node_t ia_list | ::walk list | ::print itx_t
{
    itx_node = {
        list_next = 0xffffff0acc8c4758
        list_prev = 0xffffff0acc8c4758
    }
    itx_private = 0
    itx_wr_state = 0 (WR_INDIRECT)
    itx_sync = 0
    itx_sod = 0x70
    itx_oid = 0x2
    itx_lr = {
        lrc_txtype = 0xb
        lrc_reclen = 0x70
        lrc_txg = 0x1c27
        lrc_seq = 0
    }
}

So now we need to analyze how we get into this race situation a little bit more. First, it makes sense to look at the taskq portion and then work backwards to see if we can establish what might have happened in zfs for us to get here.

Adding something to the taskq via taskq_dispatch requires grabbing the taskq's lock. Our thread in question is currently holding that lock as it does its adding-and-removing loop; it specifically grabbed the lock after taskq_wait finished. taskq_wait does a cv_wait to make sure there are no entries, so we know that if this entry had been added before the taskq_wait call, it would have been run. Therefore, it must have come in after the call to taskq_wait but before we ran the next line, which grabs tq_lock. Because we're holding the mutex, the racing taskq_dispatch() would block on it until we destroy it, and we would have ended up in this state. In other words, specifically here:

2053 
2054         /*
2055          * Wait for any pending entries to complete.
2056          */
2057         taskq_wait(tq);
2058        XXX RM BELIEVES THIS IS WHERE THE TASKQ BIT HAPPENED
2059         mutex_enter(&tq->tq_lock);
2060         ASSERT((tq->tq_task.tqent_next == &tq->tq_task) &&
2061             (tq->tq_active == 0));

If we were on debug bits, we would have blown the assert. Similarly, if we hadn't hit this race, then we most likely would have executed this task and blown up later on in a way similar to the first time this happened. Given this, the big open question is where the zfs race occurred.

#4

Updated by Robert Mustacchi about 6 years ago

So, continuing on, we still have the more important question of where zfs went awry. If we look again at our header block, we'll find a birth time that is again before the txg we're cleaning:

> ffffff0bb80dc080::print zilog_t zl_header[] blk_birth
zl_header = {
    zl_header->zh_claim_txg = 0
    zl_header->zh_replay_seq = 0
    zl_header->zh_log = {
        blk_dva = [
            {
                dva_word = [ 0x48, 0xbda0fbd ]
            },
            {
                dva_word = [ 0, 0 ]
            },
            {
                dva_word = [ 0, 0 ]
            },
        ]
        blk_prop = 0x8009090200470047
        blk_pad = [ 0, 0 ]
        blk_phys_birth = 0
        blk_birth = 0x1c18
        blk_fill = 0
        blk_cksum = {
            zc_word = [ 0x3b5e941bcd501777, 0xc1707377cdd5067b, 0x1174, 0x61 ]
        }
    }
    zl_header->zh_claim_blk_seq = 0
    zl_header->zh_flags = 0
    zl_header->zh_claim_lr_seq = 0
    zl_header->zh_pad = [ 0, 0, 0 ]
}

The birth time of the header block is 0x1c18, and the block we're cleaning has a txg of 0x1c27, so this is definitely a bit confusing. Our zilog only has a single entry in its list of lwbs, which again corresponds to that same birth time of 0x1c18.

When we assign an itx we mark the zilog dirty and append it to the dataset. This leads us, during dsl_pool_sync_done(), to walk the dirty zilogs and clean them. This always happens as a part of zil_itx_assign(), which comes from any of the entry points that still have data on them. The one in this dump had a type of 0xb, which means it was a setattr... things look a bit suspicious, so let's check out the other dumps of this that we've seen so far to corroborate...

###

From 87ecf8d878fed3954a2b9cde5eb11fbe:

> ffffff0c0e624768::walk avl | ::print itx_async_node_t ia_list | ::walk list | ::print -at itx_t
ffffff0b72cecdd8 itx_t {
    ffffff0b72cecdd8 list_node_t itx_node = {
        ffffff0b72cecdd8 struct list_node *list_next = 0xffffff0acc8c4758
        ffffff0b72cecde0 struct list_node *list_prev = 0xffffff0acc8c4758
    }
    ffffff0b72cecde8 void *itx_private = 0
    ffffff0b72cecdf0 itx_wr_state_t itx_wr_state = 0 (WR_INDIRECT)
    ffffff0b72cecdf4 uint8_t itx_sync = 0
    ffffff0b72cecdf8 uint64_t itx_sod = 0x70
    ffffff0b72cece00 uint64_t itx_oid = 0x2
    ffffff0b72cece08 lr_t itx_lr = {
        ffffff0b72cece08 uint64_t lrc_txtype = 0xb
        ffffff0b72cece10 uint64_t lrc_reclen = 0x70
        ffffff0b72cece18 uint64_t lrc_txg = 0x1c27
        ffffff0b72cece20 uint64_t lrc_seq = 0
    }
}
> ffffff0b72cece08::print lr_setattr_t
{
    lr_common = {
        lrc_txtype = 0xb
        lrc_reclen = 0x70
        lrc_txg = 0x1c27
        lrc_seq = 0
    }
    lr_foid = 0x2dd
    lr_mask = 0x300
    lr_mode = 0x75e2b000
    lr_uid = 0
    lr_gid = 0x1a211928
    lr_size = 0xffffff0c162e8030
    lr_atime = [ 0x51dee662, 0 ]
    lr_mtime = [ 0x51dee66a, 0 ]
}

From d025142bbc436f31270ac1b399cae572:

> ffffff56d110b088::walk avl | ::print itx_async_node_t ia_list | ::walk list | ::print itx_t
{
    itx_node = {
        list_next = 0xffffff61b045d958
        list_prev = 0xffffff61b045d958
    }
    itx_private = 0
    itx_wr_state = 0 (WR_INDIRECT)
    itx_sync = 0
    itx_sod = 0x70
    itx_oid = 0x2
    itx_lr = {
        lrc_txtype = 0xb
        lrc_reclen = 0x70
        lrc_txg = 0xb83f4
        lrc_seq = 0
    }
}
{
    itx_node = {
        list_next = 0xffffff48c9cee940
        list_prev = 0xffffff4d222e3498
    }
    itx_private = 0xffffff431b1c4900
    itx_wr_state = 2 (WR_NEED_COPY)
    itx_sync = 0
    itx_sod = 0x189
    itx_oid = 0xffff
    itx_lr = {
        lrc_txtype = 0x9
        lrc_reclen = 0xc0
        lrc_txg = 0xb83f4
        lrc_seq = 0
    }
}
{
    itx_node = {
        list_next = 0xffffff4d222e3498
        list_prev = 0xffffff4943353080
    }
    itx_private = 0xffffff431b1c4900
    itx_wr_state = 2 (WR_NEED_COPY)
    itx_sync = 0
    itx_sod = 0x264
    itx_oid = 0xffff
    itx_lr = {
        lrc_txtype = 0x9
        lrc_reclen = 0xc0
        lrc_txg = 0xb83f4
        lrc_seq = 0
    }
}

From 2ac5d84b4e9c980b5f99aad53775dc3d:

> ffffff5425a68c90::print itxs_t i_async_tree | ::walk avl | ::print itx_async_node_t ia_list | ::walk list | ::print itx_t
{
    itx_node = {
        list_next = 0xffffff505c0f1518
        list_prev = 0xffffff505c0f1518
    }
    itx_private = 0x2f62696c2f73746e
    itx_wr_state = 0t1701080942 (???)
    itx_sync = 0
    itx_sod = 0x70
    itx_oid = 0x6f7a2f092f6e696c
    itx_lr = {
        lrc_txtype = 0xb
        lrc_reclen = 0x70
        lrc_txg = 0xd1913
        lrc_seq = 0
    }
}
{
    itx_node = {
        list_next = 0xffffff460d4d7e80
        list_prev = 0xffffff438d5b0058
    }
    itx_private = 0xffffff4318954d80
    itx_wr_state = 2 (WR_NEED_COPY)
    itx_sync = 0
    itx_sod = 0x189
    itx_oid = 0xffff
    itx_lr = {
        lrc_txtype = 0x9
        lrc_reclen = 0xc0
        lrc_txg = 0xd1913
        lrc_seq = 0
    }
}
{
    itx_node = {
        list_next = 0xffffff438d5b0058
        list_prev = 0xffffff55dc3c3880
    }
    itx_private = 0xffffff4318954d80
    itx_wr_state = 2 (WR_NEED_COPY)
    itx_sync = 0
    itx_sod = 0x264
    itx_oid = 0xffff
    itx_lr = {
        lrc_txtype = 0x9
        lrc_reclen = 0xc0
        lrc_txg = 0xd1913
        lrc_seq = 0
    }
}

Hmm, that's a rather similar set of itx_t entries. Now I'm inclined to track down the foids and map them back to something else. Let's go back to our original dump. Well, the znode from the first dump is gone. Here's what we did to try to find it:

> ffffff0c29a0cb60::findstack -v
stack pointer for thread ffffff0c29a0cb60: ffffff00495fb370
  ffffff00495fb3e0 dbuf_find+0x110(ffffff091e2b5920, 0, 0)
  ffffff00495fb470 dbuf_hold_impl+0x15e(ffffff090a0ef700, ffffff0c032975c0, 0, ffffff00495fb7b8, 2, ffffff00495fb4e0)
  ffffff00495fb4b0 apic_intr_exit+0x3f(b, f0)
  ffffff00495fb510 hilevel_intr_epilog+0xc9(ffffff090a0f3a80, ffffff00495fb560, ffffff00495fb570, ffffff00495fb580)
  ffffff00495fb5b0 0xffffff090a0f0540()
  ffffff00495fb610 0xffffff090a0edb00()
  ffffff00495fb6b0 0xffffff090a0ec500()
  ffffff00495fb6c0 _sys_rtt_ints_disabled+8()
  ffffff00495fb7f0 fakesoftint_return()
  ffffff00495fb830 apic_intr_exit+0x3f(fffffffffb82954d, ffffff00495fb880)
  ffffff00495fb890 0xffffff090a0ec500()
  ffffff00495fb8f0 0xf0fb82a32f()
  ffffff00495fba20 mutex_enter+0x10()
  ffffff00495fba60 taskq_ent_free+0x6d(ffffff0c2e06c028, ffffff090a975598)
  ffffff00495fbab0 taskq_destroy+0x10d(ffffff0c2e06c028)
  ffffff00495fbaf0 zil_close+0x6e(ffffff0bb80dc080)
  ffffff00495fbb50 zfsvfs_teardown+0x5c(ffffff097acd4280, 0)
  ffffff00495fbb90 zfs_suspend_fs+0x1a(ffffff097acd4280)
  ffffff00495fbbd0 zfs_ioc_rollback+0x2d(ffffff0b9aab1000)
  ffffff00495fbc80 zfsdev_ioctl+0x4a7(5a00000000, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68)
  ffffff00495fbcc0 cdev_ioctl+0x39(5a00000000, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68)
  ffffff00495fbd10 spec_ioctl+0x60(ffffff090a686c40, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68, 0)
  ffffff00495fbda0 fop_ioctl+0x55(ffffff090a686c40, 5a19, 8046a28, 100003, ffffff095af83be0, ffffff00495fbe68, 0)
  ffffff00495fbec0 ioctl+0x9b(3, 5a19, 8046a28)
  ffffff00495fbf10 _sys_sysenter_post_swapgs+0x149()
> ::walk zfs_znode_cache | ::printf "0x%x zfsvfs_t: %p, z_vnode: %p, z_id: %x\\n" znode_t . z_zfsvfs z_vnode z_id ! egrep 'z_id: 2dd$'
0xffffff0b751558f0 zfsvfs_t: ffffff0b72e36940, z_vnode: ffffff0b75154800, z_id: 2dd
0xffffff0b71795bc8 zfsvfs_t: ffffff0b6cf9c700, z_vnode: ffffff0b71794e80, z_id: 2dd
0xffffff0b6d1949d8 zfsvfs_t: ffffff0b6c9f9680, z_vnode: ffffff0b6d193b40, z_id: 2dd
0xffffff0b698a05f8 zfsvfs_t: ffffff0b692106c0, z_vnode: ffffff0b6989fa00, z_id: 2dd
0xffffff0b6968d4e0 zfsvfs_t: ffffff0b6b282b40, z_vnode: ffffff0b694df780, z_id: 2dd
0xffffff0b654de6e0 zfsvfs_t: ffffff0b5f0ccb80, z_vnode: ffffff0b654df540, z_id: 2dd
0xffffff0b325d37d0 zfsvfs_t: ffffff0b5bc254c0, z_vnode: ffffff0b325d5000, z_id: 2dd
0xffffff09fe4e2228 zfsvfs_t: ffffff0ba7521040, z_vnode: ffffff09fe4e1180, z_id: 2dd
0xffffff09fe3771f8 zfsvfs_t: ffffff090a52a980, z_vnode: ffffff09fe375480, z_id: 2dd
0xffffff0a2155c7f8 zfsvfs_t: ffffff0a10a20940, z_vnode: ffffff0a2155d580, z_id: 2dd
0xffffff0a0b4d5000 zfsvfs_t: ffffff0b6b26fd80, z_vnode: ffffff0a0a80c440, z_id: 2dd
0xffffff09b776eec0 zfsvfs_t: ffffff0b6acf1040, z_vnode: ffffff09b7770280, z_id: 2dd
0xffffff099cf552e8 zfsvfs_t: ffffff0bf2061500, z_vnode: ffffff099cf51a00, z_id: 2dd
0xffffff09a0e8d8d8 zfsvfs_t: ffffff0b6abefe00, z_vnode: ffffff09a0e8cd80, z_id: 2dd
0xffffff0993369220 zfsvfs_t: ffffff09791ad4c0, z_vnode: ffffff098c15fd40, z_id: 2dd
0xffffff098b7300f8 zfsvfs_t: ffffff0b6a83ce00, z_vnode: ffffff098b72f080, z_id: 2dd
0xffffff0979224db8 zfsvfs_t: ffffff0b6b035000, z_vnode: ffffff0979227780, z_id: 2dd
0xffffff098481a2e8 zfsvfs_t: ffffff09687a0dc0, z_vnode: ffffff0984823040, z_id: 2dd
0xffffff0983fd8210 zfsvfs_t: ffffff097acbe240, z_vnode: ffffff0983fd4780, z_id: 2dd
0xffffff0983e1e1f8 zfsvfs_t: ffffff0b6af394c0, z_vnode: ffffff0983e1d340, z_id: 2dd
0xffffff09754b3bb8 zfsvfs_t: ffffff0ba5595040, z_vnode: ffffff09754b5140, z_id: 2dd
0xffffff09504b68d8 zfsvfs_t: ffffff0b6acf14c0, z_vnode: ffffff0957901e80, z_id: 2dd
0xffffff095a1aedb0 zfsvfs_t: ffffff0ba11c8980, z_vnode: ffffff095a16f980, z_id: 2dd
0xffffff09559ccbb8 zfsvfs_t: ffffff0bb399c980, z_vnode: ffffff0954bc1140, z_id: 2dd
0xffffff09521cc108 zfsvfs_t: ffffff091d81edc0, z_vnode: ffffff09521bfc40, z_id: 2dd
> ::walk zfs_znode_cache | ::printf "0x%x zfsvfs_t: %p, z_vnode: %p, z_id: %x\\n" znode_t . z_zfsvfs z_vnode z_id ! egrep 'z_id: 2dd$' | grep 'zvfsvfs_t: ffffff097acd4280'

The fact that we can't find one of these is a bit surprising, though if we're tearing down this vfs_t via rollback, it's quite possible that we've already closed all the related vnodes we care about.

#5

Updated by Andriy Gapon almost 3 years ago

Possibly related issue observed on FreeBSD: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200592

#6

Updated by Electric Monk almost 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 43297f973a3543e7403ac27076490ab958a94b15

commit  43297f973a3543e7403ac27076490ab958a94b15
Author: George Wilson <george.wilson@delphix.com>
Date:   2016-11-06T17:24:58.000Z

    3821 Race in rollback, zil close, and zil flush
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
    Reviewed by: Andriy Gapon <avg@FreeBSD.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>
