Panic destroying the pool using file backing store on FS with nbmand=on
There is a specific scenario that works with nbmand=off but panics a debug kernel with nbmand=on.
> ::status
debugging crash dump vmcore.1 (64-bit) from atomic1
operating system: 5.11 NexentaStor_184.108.40.206:fcb4e039c5:debug (i86pc)
image uuid: ae03b3b7-dfaa-6b91-8144-ac70c0d8a6fe
panic message: assertion failed: vp->v_shrlocks == NULL, file: ../../common/fs/vnode.c, line: 2420
dump content: kernel pages only
::stack
vpanic()
0xfffffffffbe0c7c8()
vn_free+0x36(ffffff2d37654e00)
zfs_znode_cache_destructor+0x4b(ffffff23f4befde0, 0)
kmem_cache_free_debug+0x214(ffffff2312cd8648, ffffff23f4befde0, fffffffff78d6edb)
kmem_cache_free+0x153(ffffff2312cd8648, ffffff23f4befde0)
zfs_znode_free+0x8b(ffffff23f4befde0)
zfs_zinactive+0xd6(ffffff23f4befde0)
zfs_inactive+0x75(ffffff2d37654e00, ffffff235f2252b0, 0)
fop_inactive+0x76(ffffff2d37654e00, ffffff235f2252b0, 0)
vn_rele_dnlc+0xa2(ffffff2d37654e00)
dnlc_purge_vfsp+0x1a2(ffffff2357995678, 0)
dounmount+0x50(ffffff2357995678, 400, ffffff235f2252b0)
umount2_engine+0x96(ffffff2357995678, 400, ffffff235f2252b0, 1)
umount2+0x163(9829608, 400)
_sys_sysenter_post_swapgs+0x237()
Steps to reproduce:
zpool create -O nbmand=on tpool1 c2t5000C50020E4989Bd0
mkfile 128M /tpool1/a
mkfile 128M /tpool1/b
zpool create tpool1_file mirror /tpool1/a /tpool1/b
zpool export tpool1_file
zpool destroy tpool1
Updated by Jan Kryl almost 5 years ago
- Tags deleted (
- % Done changed from 0 to 100
- Status changed from New to In Progress
When a filesystem is mounted with the nbmand=on option, each open of a file creates a so-called "share reservation" on that file. Share reservations enforce Windows-like filesystem semantics, which use mandatory rather than advisory locking by default. Each open therefore places a reservation on the file that prevents conflicting mandatory locks from being acquired, and the reservation is removed when the file is closed. On close, the system finds the reservation to remove by looking it up by PID and SYSID. The SYSID is used when the lock is acquired on behalf of a remote host; for local processes it is set to zero. When a vnode is destroyed in the kernel (its reference count drops to zero), its list of reservations is expected to be empty: a v_count of zero implies that the file is not open, and a file that is not open should have no reservations.
When a file on a filesystem with nbmand=on is used as a vdev, a new share reservation is created on it. Because the underlying file of the vdev is opened on behalf of a ZFS taskq thread, the PID recorded in the share reservation is zero. When the pool is destroyed, the file vdev is destroyed too, which results in calls to fop_close() and vn_rele() on the vdev's underlying vnode. From fop_close() we reach zfs_close(), which finally calls cleanshares() to remove the reservation. In this case, however, zfs_close() runs on behalf of the user process that issued zpool destroy, so the PID is not zero. No matching share reservation is found on the vnode, and the orphaned reservation triggers the assertion panic when the vnode is destroyed later.
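The mismatch described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the model_* names and structures are hypothetical stand-ins for the vnode, its v_shrlocks list, and the PID/SYSID lookup done at close time, and PID 1234 is an arbitrary example value for the zpool process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified share reservation: only the owner identity. */
struct model_shr {
	int pid;
	int sysid;
	struct model_shr *next;
};

/* Hypothetical vnode stand-in holding the reservation list. */
struct model_vnode {
	struct model_shr *shrlocks;
};

/* Open: record a reservation owned by (pid, sysid). */
bool
model_open(struct model_vnode *vp, int pid, int sysid)
{
	struct model_shr *s;

	assert(vp != NULL);
	s = malloc(sizeof (*s));
	if (s == NULL)
		return (false);
	s->pid = pid;
	s->sysid = sysid;
	s->next = vp->shrlocks;
	vp->shrlocks = s;
	return (true);
}

/* Close: remove only reservations whose owner matches; return the count. */
int
model_close(struct model_vnode *vp, int pid, int sysid)
{
	struct model_shr **pp = &vp->shrlocks;
	int removed = 0;

	while (*pp != NULL) {
		struct model_shr *s = *pp;

		if (s->pid == pid && s->sysid == sysid) {
			*pp = s->next;
			free(s);
			removed++;
		} else {
			pp = &s->next;
		}
	}
	return (removed);
}

/* True when freeing the vnode would not trip the empty-list assertion. */
bool
model_can_free(const struct model_vnode *vp)
{
	return (vp->shrlocks == NULL);
}
```

In this model, model_open(&vp, 0, 0) followed by model_close(&vp, 1234, 0) removes nothing, so model_can_free(&vp) stays false. That leftover entry corresponds to the orphaned reservation behind the vp->v_shrlocks == NULL assertion.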
A possible solution, which I have tested successfully, is to pass a caller context structure with the PID set to zero to fop_close() from vdev_file_close(). zfs_close() then reads this structure and overrides the PID of the current process with the PID from the caller context.
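The PID selection in the proposed fix could look roughly like the following user-space sketch. It is not the actual patch: struct model_caller_ctx and model_close_pid() are hypothetical names, and the rule "use the context's PID whenever a context is passed" is an assumption about how the override might be written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a caller context: only the PID. */
struct model_caller_ctx {
	int cc_pid;
};

/*
 * Choose the PID used to look up the share reservation at close time:
 * the PID from the caller context if one was passed, otherwise the PID
 * of the current process.
 */
int
model_close_pid(const struct model_caller_ctx *ct, int curproc_pid)
{
	assert(curproc_pid >= 0);
	return ((ct != NULL) ? ct->cc_pid : curproc_pid);
}
```

With a context whose cc_pid is zero, the lookup uses PID 0 even when the close runs in the zpool process, so it matches the reservation created by the taskq thread. Callers that pass no context keep the current behaviour.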