Bug #6676

Race between unique_insert() and unique_remove() causes ZFS fsid change

Added by Marcel Telka over 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: High
Assignee:
Category: zfs - Zettabyte File System
Start date: 2016-02-25
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

The fsid of a ZFS filesystem might change after a reboot or remount. The problem seems to be caused by a race between unique_insert() and unique_remove(). unique_remove() is called from dsl_dataset_evict(), which now runs asynchronously in a separate thread. If the dsl_dataset_evict() thread is very slow and calls unique_remove() too late, the filesystem ends up with a changed fsid on the next zfs mount.

This problem is very likely caused by #5056.
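
The suspected ordering problem can be illustrated with a small Python model. This is only a sketch under assumptions, not the kernel code: UniqueTable, remount() and the 56-bit random width are hypothetical stand-ins, and the model assumes that an insert of a value that is still registered hands back a freshly generated value instead.

```python
import random


class UniqueTable:
    """Toy stand-in for the kernel's table of fsid values currently in use."""

    def __init__(self, seed=0):
        self._in_use = set()
        self._rng = random.Random(seed)

    def insert(self, value):
        # Assumed behaviour: a zero or already registered value is replaced
        # by a new random one (the width is arbitrary in this sketch).
        while value == 0 or value in self._in_use:
            value = self._rng.getrandbits(56)
        self._in_use.add(value)
        return value

    def remove(self, value):
        self._in_use.discard(value)


def remount(table, on_disk_fsid, evict_before_mount):
    """Return (fsid at first mount, fsid after an unmount/mount cycle)."""
    first = table.insert(on_disk_fsid)        # initial mount
    if evict_before_mount:
        table.remove(first)                   # eviction finished in time
        second = table.insert(on_disk_fsid)   # remount gets the old value
    else:
        second = table.insert(on_disk_fsid)   # remount wins the race
        table.remove(first)                   # eviction runs too late
    return first, second
```

In this model, remount(UniqueTable(), 0x1234, True) returns the same value twice, while passing False yields a different second value, which matches the symptom described above.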

Steps to Reproduce

Note: I can always reproduce this on a single-core (virtual) machine. On multicore machines it is harder to reproduce.

# uname -a
SunOS openindiana 5.11 illumos-633aa80 i86pc i386 i86pc Solaris
# zfs create rpool/TEST
# FS=$(echo ::fsinfo | mdb -k | grep TEST | awk '{print $1}')
# echo $FS::print vfs_t vfs_fsid | mdb -k
vfs_fsid = {
    vfs_fsid.val = [ 0x54d7028a, 0x70311508 ]
}
# zfs umount rpool/TEST
# zfs mount rpool/TEST
# FS=$(echo ::fsinfo | mdb -k | grep TEST | awk '{print $1}')
# echo $FS::print vfs_t vfs_fsid | mdb -k
vfs_fsid = {
    vfs_fsid.val = [ 0xd9454e49, 0x6b36d08 ]
}
#
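
When the steps above are repeated many times, comparing the two mdb outputs by eye gets tedious. A minimal Python sketch for extracting the fsid from captured output follows; parse_fsid() is a hypothetical helper, and it assumes the output has the same shape as the transcript above.

```python
import re

# Matches a line such as: vfs_fsid.val = [ 0x54d7028a, 0x70311508 ]
_FSID_RE = re.compile(
    r"vfs_fsid\.val\s*=\s*\[\s*(0x[0-9a-fA-F]+)\s*,\s*(0x[0-9a-fA-F]+)\s*\]"
)


def parse_fsid(mdb_output):
    """Return the two fsid words from '::print vfs_t vfs_fsid' output."""
    match = _FSID_RE.search(mdb_output)
    if match is None:
        raise ValueError("no vfs_fsid.val line found")
    return int(match.group(1), 16), int(match.group(2), 16)
```

Calling parse_fsid() on the output captured before and after the remount and comparing the two tuples shows whether the fsid changed.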

Impact

A persistent fsid (filesystem id) is essential for proper NFS functionality. If the fsid of a filesystem changes on remount (or after reboot), NFS clients might not be able to recover automatically, and a manual remount of the NFS filesystems on every NFS client might be needed.
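
Why a changed fsid hurts clients can be sketched with a toy Python model. It assumes only that a file handle given to a client embeds the server-side fsid; ToyNfsServer, StaleFileHandle and the handle layout are hypothetical and much simpler than a real NFS server.

```python
class StaleFileHandle(Exception):
    """Raised when a handle refers to an fsid the server no longer knows."""


class ToyNfsServer:
    def __init__(self):
        self._by_fsid = {}  # fsid -> filesystem name

    def mount(self, name, fsid):
        # Forget any fsid previously registered for this filesystem.
        self._by_fsid = {f: n for f, n in self._by_fsid.items() if n != name}
        self._by_fsid[fsid] = name

    def make_handle(self, name, file_id):
        # The handle kept by the client embeds the current fsid.
        for fsid, mounted in self._by_fsid.items():
            if mounted == name:
                return (fsid, file_id)
        raise KeyError(name)

    def resolve(self, handle):
        fsid, file_id = handle
        if fsid not in self._by_fsid:
            raise StaleFileHandle(hex(fsid[0]))
        return self._by_fsid[fsid], file_id
```

A handle created before a remount keeps resolving as long as the remount registers the same fsid. Once the filesystem comes back with a different fsid, resolve() raises StaleFileHandle for every handle the client still holds.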


Related issues

Related to illumos gate - Bug #5056: ZFS deadlock on db_mtx and dn_holds (Closed, 2015-01-13)

History

#1

Updated by Marcel Telka over 3 years ago

  • Related to Bug #5056: ZFS deadlock on db_mtx and dn_holds added
#2

Updated by Josef Sipek over 3 years ago

  • Assignee set to Josef Sipek
  • % Done changed from 0 to 80
#3

Updated by Vitaliy Gusev almost 3 years ago

An fsid change causes NFSv3 and NFSv4 clients to receive NFS3ERR_STALE / NFS4ERR_STALE errors:

# ls nfsmnt
ls: cannot access nfsmnt: Stale file handle

#5

Updated by Electric Monk almost 3 years ago

  • Status changed from New to Closed
  • % Done changed from 80 to 100

commit  40510e8eba18690b9a9843b26393725eeb0f1dac
Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Date:   2016-11-12T01:44:35.000Z

    6676 Race between unique_insert() and unique_remove() causes ZFS fsid change
    Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
    Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
    Reviewed by: Dan Vatca <dan.vatca@gmail.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
