ZFS assertion failed in dsl_scan.c:1697 on snapshot destruction
I just switched one of my OI machines to DEBUG bits to do some kernel work there. A few minutes after booting, time-slider destroyed an old snapshot, which triggered this assertion failure. It happened again while I was typing this, so it's easily reproducible.
panic[cpu1]/thread=ffffd000402d2c40: assertion failed: dsl_dir_phys(dp->dp_free_dir)->dd_used_bytes == 0 (0x14000 == 0x0), file: ../../common/fs/zfs/dsl_scan.c, line: 1697

> $C
ffffd000402d2970 vpanic()
ffffd000402d29c0 0xfffffffffbe0c59d()
ffffd000402d2a20 dsl_process_async_destroys+0x4da(ffffd00d5fa4a600, ffffd00df0047a00)
ffffd000402d2a80 dsl_scan_sync+0x116(ffffd00d5fa4a600, ffffd00df0047a00)
ffffd000402d2b50 spa_sync+0x533(ffffd00d5ac7e000, 236c191)
ffffd000402d2c20 txg_sync_thread+0x297(ffffd00d5fa4a600)
ffffd000402d2c30 thread_start+8()

> ::ps ! grep zfs
R 103615 101329 101329 101329 0 0x5a006000 ffffd00ebc451040 zfs

> ffffd00ebc451040::ps -f
S PID PPID PGID SID UID FLAGS ADDR NAME
R 103615 101329 101329 101329 0 0x5a006000 ffffd00ebc451040 /usr/sbin/zfs destroy -d export/home/woodstoc@zfs-auto-snap_frequent-2019-02-24
Updated by Toomas Soome 11 months ago
Igor Kozhukhov wrote:
I see the same issue on DilOS with a DEBUG build when destroying an older BE.
I can't reproduce this issue with the ZFS tests (about 1200).
I thought it was related to my updates from ZoL, but right now I think it may be related to the 'large dnode' feature after its integration.
I just tried the snapshot destroy via beadm destroy:
root@uefi-oi:/home/tsoome# beadm list
BE Active Mountpoint Space Policy Created
oi-20 - - 15,0M static 2018-10-06 10:36
oi-21 - - 14,2M static 2018-11-10 13:12
oi-22 - - 14,4M static 2018-11-18 16:48
oi-23 - - 14,7M static 2018-12-02 11:10
oi-24 - - 14,9M static 2018-12-13 09:01
oi-25 - - 13,8M static 2019-01-08 12:41
oi-26 - - 14,5M static 2019-01-09 17:51
oi-27 - - 14,2M static 2019-01-20 12:25
oi-28 - - 14,1M static 2019-02-17 19:04
oi-29 NR / 7,70G static 2019-02-24 18:44
And I did beadm destroy them all without a panic. That VM was updated to a DEBUG build before this test, which of course only suggests that something more is needed to trigger this.
Updated by Hans Rosenfeld 11 months ago
Igor reported an interesting panic that looks close:
panic[cpu3]/thread=fffffe000fd2ec20: assertion failed: dsl_dir_phys(dp->dp_free_dir)->dd_used_bytes == 0 (0xffffffffff9b1000 == 0x0), file: ../../common/fs/zfs/dsl_scan.c, line: 3219

fffffe000fd2e990 genunix:process_type+193699 ()
fffffe000fd2e9f0 zfs:dsl_process_async_destroys+49a ()
fffffe000fd2ea70 zfs:dsl_scan_sync+1cb ()
fffffe000fd2eb50 zfs:spa_sync+603 ()
fffffe000fd2ec00 zfs:txg_sync_thread+2a7 ()
fffffe000fd2ec10 unix:thread_start+8 ()
The value for dd_used_bytes looks particularly interesting. I should add that Igor runs with a load of patches backported from all over the place that only he knows about, so it's hard to say whether this is truly relevant to this bug.
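To make the two reported dd_used_bytes values easier to compare, here is a small sketch that reinterprets each hex value from the panic messages as a two's-complement signed 64-bit number. It assumes the counter is 64 bits wide. The helper name as_signed64 is made up for illustration and is not part of ZFS or mdb.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value & (1 << 63):           # sign bit set: value is negative
        return value - (1 << 64)
    return value


# Values copied from the two assertion messages in this issue.
first_panic = as_signed64(0x14000)
second_panic = as_signed64(0xffffffffff9b1000)

print(first_panic)    # 81920 bytes, i.e. 80 KiB still accounted to the free dir
print(second_panic)   # -6615040 bytes, i.e. the counter went below zero
```

Read this way, the first panic shows a small positive leftover, while the second looks like the counter was decremented past zero. That may or may not be the same underlying accounting problem.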