Panic on zil/slog replay when TX_REMOVE followed by TX_CREATE
panic[cpu7]/thread=fffffe16f0f4fb40: assertion failed: dmu_object_claim_dnsize(zfsvfs->z_os, obj, DMU_OT_PLAIN_FILE_CONTENTS, 0, obj_type, bonuslen, dnodesize, tx) == 0 (0x1c == 0x0), file: ../../common/fs/zfs/zfs_znode.c, line: 861

fffffe002258a280 genunix:process_type+153649 ()
fffffe002258a400 zfs:zfs_mknode+7e0 ()
fffffe002258a540 zfs:zfs_create+6fa ()
fffffe002258a5e0 genunix:fop_create+cf ()
fffffe002258a7a0 zfs:zfs_replay_create+2b8 ()
fffffe002258a800 zfs:zil_replay_log_record+f2 ()
fffffe002258a9d0 zfs:zil_parse+1f8 ()
fffffe002258aa50 zfs:zil_replay+bc ()
fffffe002258aa90 zfs:zfsvfs_setup+bd ()
fffffe002258ab10 zfs:zfs_domount+171 ()
fffffe002258ac30 zfs:zfs_mount+2a7 ()
fffffe002258ac60 genunix:fsop_mount+14 ()
fffffe002258add0 genunix:domount+952 ()
fffffe002258ae70 genunix:mount+fe ()
fffffe002258aeb0 genunix:syscall_ap+98 ()
fffffe002258af10 unix:brand_sys_sysenter+1dc ()
This problem was introduced with the large dnodes feature back in March; I've seen only two reports of it in the wild.
The following commit from zfsonlinux fixes this (and adds a test to guard against future regressions):
commit 035e96118bc9a7cbf435dd17dda507b870fcf6e6
Author: Chunwei Chen <email@example.com>
Date:   Wed Aug 28 10:42:02 2019 -0700

    Fix zil replay panic when TX_REMOVE followed by TX_CREATE

    If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
    make sure the object removal is completely finished before creation.
    The current implementation relies on dnode_hold_impl with
    DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
    fine before, in current version it does not guarantee the object removal
    is completed.

    We fix this by checking if DNODE_MUST_BE_FREE returns successful instead.
    Also add test and remove dead code in dnode_hold_impl.
Updated by Andy Fiddaman about 1 year ago
With the ZoL fix in place, the new test passes and does not cause a panic.
Test: /opt/zfs-tests/tests/functional/slog/slog_replay_fs_001 (run as root) [00:11] [PASS]
Test: /opt/zfs-tests/tests/functional/slog/slog_replay_fs_002 (run as root) [00:22] [PASS]
Updated by Electric Monk about 1 year ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
commit d8849d7dee03b84a3fa281ec65eb9e3d86d3756b
Author: Chunwei Chen <firstname.lastname@example.org>
Date: 2019-11-11T20:21:17.000Z

11943 Fix out-of-order ZIL txtype lost on hardlinked files
11942 Panic on zil/slog replay when TX_REMOVE followed by TX_CREATE
Portions contributed by: Ryan Moeller <email@example.com>
Portions contributed by: Andy Fiddaman <firstname.lastname@example.org>
Reviewed by: Jerry Jelinek <email@example.com>
Reviewed by: Toomas Soome <firstname.lastname@example.org>
Approved by: Dan McDonald <email@example.com>