Bug #3875

panic in zfs_root() after failed rollback

Added by Keith Wesolowski over 7 years ago. Updated over 7 years ago.

zfs - Zettabyte File System
Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:


A system panicked with the following stack:

ffffff01ecedf880 dnode_hold_impl+0x90(0, 4, 1, fffffffff7d633c1, ffffff01ecedf8c8)
ffffff01ecedf8b0 dnode_hold+0x28(0, 4, fffffffff7d633c1, ffffff01ecedf8c8)
ffffff01ecedf910 dmu_bonus_hold+0x37(0, 4, 0, ffffff01ecedf988)
ffffff01ecedf940 sa_buf_hold+0x1d(0, 4, 0, ffffff01ecedf988)
ffffff01ecedf9e0 zfs_zget+0x64(ffffff431a72a500, 4, ffffff01ecedfa08)
ffffff01ecedfa40 zfs_root+0x78(ffffff431a87db78, ffffff01ecedfa98)
ffffff01ecedfa80 fsop_root+0x2d(ffffff431a87db78, ffffff01ecedfa98)
ffffff01ecedfad0 traverse+0x87(ffffff01ecedfcc8)
ffffff01ecedfd30 lookuppnvp+0x229(ffffff01ecedfe50, ffffff01ecedfe30, 1, 0, 0, ffffff42e2b6d840, ffffff42e2b6d840, ffffff437f9083b0)
ffffff01ecedfdd0 lookuppnatcred+0x15e(ffffff01ecedfe50, ffffff01ecedfe30, 1, 0, 0, 0, ffffff437f9083b0)
ffffff01ecedfe20 lookuppn+0x38(ffffff01ecedfe50, ffffff01ecedfe30, 1, 0, 0)
ffffff01ecedfec0 resolvepath+0x86(8045580, 8045980, 400)
ffffff01ecedff10 _sys_sysenter_post_swapgs+0x149()

The proximate cause is that z_os is NULL. We are looking at:

> ffffff431a72a500::print zfsvfs_t z_vfs | ::print vfs_t vfs_mntpt | ::print ref
str_t rs_string | /s
0xffffff4317f2f290:             /zones/d4658531-a2ae-4dce-9382-54b9a0d5fa82

> ffffff01ecedfe50::print struct pathname
    pn_buf = 0xffffff4373a91b80 "/zones/d4658531-a2ae-4dce-9382-54b9a0d5fa82" 
    pn_path = 0xffffff4373a91bab "" 
    pn_pathlen = 0
    pn_bufsize = 0x400

The broader context here is that dozens of zones are being started up, for which the relevant ZFS activity consists of two general actions: rolling back filesystem datasets to a known snapshot, and checking the zone's root against all other zones by calling resolvepath() on them as seen in zoneadm.c's crosscheck_zonepaths(). These threads are looking at:

> ::stacks -a -c zfs_root | ::findstack -v  ! grep zfs_root | awk '{ print $2 }' | sed -e 's/.*(//' -e 's/,.*//' | sort | uniq
> ffffff431a87d838::print zfsvfs_t z_vfs->vfs_vnodecovered | ::vnode2path
> ffffff431a87db78
> ffffff431ac435d0

So we have a number of threads trying to resolvepath() this same path that we panicked in, but blocked. The thread we actually raced with is:

> 0t10431::pid2proc | ::walk thread | ::findstack -v
stack pointer for thread ffffff4360ce1b80: ffffff01ec6f1940
  ffffff01ec6f1990 bitset_find+0x6a(fffffffffbc7e018)
  ffffff01ec6f19d0 cpu_wakeup_mwait+0x70(ffffff42ed090080, 0)
  ffffff01ec6f1a40 setbackdq+0x200(ffffff42ed09eb00)
  ffffff01ec6f1a80 rwst_enter_common+0x2c5(fffffffff78dd92e, ffffffff00000000, 1fbcf4d98)
  ffffff01ec6f1ac0 0xffffff01ec6f1b50()
  ffffff01ec6f1c90 socket_vop_getattr+0x56(ffffff4358c84a80, ffffff01ec6f1db0, 0, ffffff43128860e0, 0)
  ffffff01ec6f1d10 fop_getattr+0xa8(ffffff4358c84a80, ffffff01ec6f1db0, 0, ffffff43128860e0, 0)
  ffffff01ec6f1e80 cstat64_32+0x4a(ffffff4358c84a80, 8045a00, 0, ffffff43128860e0)
  ffffff01ec6f1ec0 fstat64_32+0x43(2, 8045a00)
  ffffff01ec6f1f10 _sys_sysenter_post_swapgs+0x149()

This thread is part of the process doing a rollback of our toxic dataset, and in userland is at:

> 0t10431::pid2proc | ::walk thread | ::print kthread_t t_lwp->lwp_regs | ::print struct regs r_rip
r_rip = 0xfeef7ae3

From looking at a similar process (since our dump does not have this process's pages in it), we are at:

> $C
08046b98`fstat64+0xc(2, 8046bb0, fefc9d86, fef80183, fefe6e90, 4)
08046c58`isseekable+0x4d(80770a0, fefd2a7a, fef90300, 11, 80470c0, fef90348)
08046c88`_setbufend+0x37(80770a0, fef6db58, feffb0a4, fef80086, 8047214, fef901b5)
08046d58`_findbuf+0x138(80770a0, fefd04f4, feffc480, fef80086, feffb0a4, fefd04cf)
08047ad8`_ndoprnt+0x55(8064d57, 8047b28, 80770a0, 0, 8047b10, fefc37a0)
08047b18`fprintf+0xb0(80770a0, 8064d57, 8064e46, 8047dd4, 8047b50, feffb0a4)
08047c78 zfs_do_rollback+0xe2(1, 8047d04, 8076860, 801, 0, 3)
08047cc8 main+0x212(8060dca, feffb0a4, 8047cf4, 805558b, 2, 8047d00)
08047cf4 _start+0x83(2, 8047dd0, 8047dd4, 0, 8047ddd, 8047df1)
> 0xfeef7ae3::dis`syscall:       popl   %edx`syscall+1:     popl   %eax`syscall+2:     pushl  %edx`syscall+3:     call   +0x0     <`syscall+8>`syscall+8:     popl   %edx`syscall+9:     movl   %esp,%ecx`syscall+0xb:   addl   $0xb,%edx`syscall+0x11:  sysenter`syscall+0x13:  movl   (%esp),%edx`syscall+0x16:  pushl  %edx`syscall+0x17:  jb     -0x9bb0d <`__cerror>`syscall+0x1d:  ret    
0xfeef7aee:                     nop`__systemcall:  popl   %edx`__systemcall+1:popl   %ecx`__systemcall+2:popl   %eax`__systemcall+3:pushl  %edx`__systemcall+4:int    $0x91`__systemcall+6:jae    +0xf     <`__systemcall+0x17>

So we are in the midst of trying to fprintf() to stderr after returning an error from the ZFS rollback ioctl. We therefore know that the ioctl failed. Looking at zfs_resume_fs(), the only way for z_os to be NULL is for us to fail at the top of this function. If we do so, we will:

- clear z_os
- drop our locks to allow others in ZFS_ENTER() to proceed
- return an error that is propagated to userland

This is entirely consistent with everything that we see, so we presume we ended up in this error path. The exact means by which we came to be in it is not entirely clear but we presumably ended up getting EBUSY somehow.

The problem here is that if we ever end up in this error path, we drop the locks protecting access to the zfsvfs_t prior to forcibly unmounting the filesystem. Because z_os is NULL, any thread that had already picked up the zfsvfs_t and was sitting in ZFS_ENTER() when we dropped our locks in zfs_resume_fs() will now acquire the lock, attempt to use z_os, and panic.

There are two plausible solutions here: one is to forcibly unmount the filesystem while still holding the locks. It is difficult to know whether this is safe to do. The other is to simply set z_umounted prior to dropping the locks. The zfs_umount() path does not examine this flag and will happily allow unmounting to proceed asynchronously, while everyone in ZFS_ENTER() will see z_umounted and fail. It may be that there's an even better solution that would prevent the seemingly spurious vfsop failures that this would entail, but it's not clear what that is. In any case, the rollback itself is going to fail.

Related issues

Related to illumos gate - Bug #9324: zfs-recv can be broken by some ZFS-ioctls New2018-03-21


Updated by Keith Wesolowski over 7 years ago

One possible way to reproduce this:

Part 1 (race exploder):

dtrace -wn 'fbt::zfs_resume_fs:entry { self->interest = 1; }' -n 'fbt::zfs_suspend_fs:entry { self->interest = 1; }' -n 'fbt::dsl_dataset_disown:entry/self->interest/ { print(args[0]); print(args[0]->ds_owner); self->ds = args[0]; }' -n 'fbt::dsl_dataset_disown:return/self->interest/ { print(self->ds); print(self->ds->ds_owner); }' -n 'fbt::zfs_suspend_fs:return { self->interest = 0; }' -n 'fbt::zfs_domount:entry { self->interest = 1; }' -n 'fbt::dsl_dataset_tryown:entry/self->interest/ { stack(); print(args[0]); print(args[0]->ds_owner); chill(200000000); print(args[0]->ds_owner); }' -n 'fbt::zfs_domount:return { self->interest = 0; }'

Part 2 (rollbacks):

while :; do
    zfs rollback zones/foo@init;
done &

Part 3 (race partner):

while :; do 
    if mount -F zfs zones/foo /mnt 2>/dev/null; then
        echo got it; break; 
done &

Part 4 (victim):

while :; do
    /var/tmp/resolvepath /zones/foo & /var/tmp/resolvepath /zones/foo & /var/tmp/resolvepath /zones/foo

Where the victim is the trivial program:

#include <unistd.h>
#include <limits.h>

main(int argc, char *argv[])
        char buf[PATH_MAX];

        return (resolvepath(argv[1], buf, PATH_MAX));

Updated by Keith Wesolowski over 7 years ago

Unfortunately, the unmount path does in fact use ZFS_ENTER(), specifically in:

  8  63579        zfsctl_root_lookup:return 

So the approach of simply setting z_unmounted and then dropping the locks in zfs_resume_fs()'s error path is unworkable.


Updated by Keith Wesolowski over 7 years ago

Attempting to force-unmount the stale filesystem doesn't work, so avoiding the dmu_objset_own() EBUSY is of little value:

 18  35523               vn_vfswlock:return 

We're getting EBUSY from vn_vfswlock() and never even calling dounmount().


Updated by Robert Mustacchi over 7 years ago

  • Status changed from New to Resolved
  • Assignee set to Keith Wesolowski
  • % Done changed from 0 to 100

Resolved in 91948b51b8e978ddc88a36b2bc3ae83c20cdc9aa.


Updated by Roman Strashkin almost 3 years ago

  • Related to Bug #9324: zfs-recv can be broken by some ZFS-ioctls added

Also available in: Atom PDF