ZFS assertion failed in dsl_scan.c:1697 on snapshot destruction
I just switched one of my OI machines to DEBUG bits to do some kernel work there. A few minutes after booting, time slider destroyed an old snapshot, which triggered this assertion failure. While I was typing this it happened again, so it's easily reproducible.
panic[cpu1]/thread=ffffd000402d2c40: assertion failed: dsl_dir_phys(dp->dp_free_dir)->dd_used_bytes == 0 (0x14000 == 0x0), file: ../../common/fs/zfs/dsl_scan.c, line: 1697

> $C
ffffd000402d2970 vpanic()
ffffd000402d29c0 0xfffffffffbe0c59d()
ffffd000402d2a20 dsl_process_async_destroys+0x4da(ffffd00d5fa4a600, ffffd00df0047a00)
ffffd000402d2a80 dsl_scan_sync+0x116(ffffd00d5fa4a600, ffffd00df0047a00)
ffffd000402d2b50 spa_sync+0x533(ffffd00d5ac7e000, 236c191)
ffffd000402d2c20 txg_sync_thread+0x297(ffffd00d5fa4a600)
ffffd000402d2c30 thread_start+8()
> ::ps ! grep zfs
R 103615 101329 101329 101329 0 0x5a006000 ffffd00ebc451040 zfs
> ffffd00ebc451040::ps -f
S PID PPID PGID SID UID FLAGS ADDR NAME
R 103615 101329 101329 101329 0 0x5a006000 ffffd00ebc451040 /usr/sbin/zfs destroy -d export/home/woodstoc@zfs-auto-snap_frequent-2019-02-24
Updated by Toomas Soome 8 months ago
Igor Kozhukhov wrote:
I see the same issue on DilOS on a DEBUG build when destroying an older BE.
I can't reproduce this issue with the ZFS tests (about 1200).
I thought it was related to my updates from ZoL, but right now I think it may be related to the 'large dnode' feature after its integration.
I just tried out the snapshot destroy via beadm destroy:
root@uefi-oi:/home/tsoome# beadm list
BE Active Mountpoint Space Policy Created
oi-20 - - 15,0M static 2018-10-06 10:36
oi-21 - - 14,2M static 2018-11-10 13:12
oi-22 - - 14,4M static 2018-11-18 16:48
oi-23 - - 14,7M static 2018-12-02 11:10
oi-24 - - 14,9M static 2018-12-13 09:01
oi-25 - - 13,8M static 2019-01-08 12:41
oi-26 - - 14,5M static 2019-01-09 17:51
oi-27 - - 14,2M static 2019-01-20 12:25
oi-28 - - 14,1M static 2019-02-17 19:04
oi-29 NR / 7,70G static 2019-02-24 18:44
I then ran beadm destroy on all of them without a panic. That VM was updated to a DEBUG build before this test, which of course only hints that there must be more context.
Updated by Hans Rosenfeld 8 months ago
Igor reported an interesting panic that looks close:
panic[cpu3]/thread=fffffe000fd2ec20: assertion failed: dsl_dir_phys(dp->dp_free_dir)->dd_used_bytes == 0 (0xffffffffff9b1000 == 0x0), file: ../../common/fs/zfs/dsl_scan.c, line: 3219

fffffe000fd2e990 genunix:process_type+193699 ()
fffffe000fd2e9f0 zfs:dsl_process_async_destroys+49a ()
fffffe000fd2ea70 zfs:dsl_scan_sync+1cb ()
fffffe000fd2eb50 zfs:spa_sync+603 ()
fffffe000fd2ec00 zfs:txg_sync_thread+2a7 ()
fffffe000fd2ec10 unix:thread_start+8 ()
The value for dd_used_bytes looks particularly interesting. I should add that Igor runs with a load of patches backported from everywhere that only he knows, so it's hard to say whether this is truly relevant to this bug.
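
One possible reading of the two values, assuming dd_used_bytes is a 64-bit counter and that the second value is the result of the counter being decremented past zero: reinterpreted as a two's-complement signed number, the first panic shows a small positive leftover and the second a negative one. The helper name as_signed64 below is made up for illustration; this is only arithmetic on the numbers printed in the panic messages, not a diagnosis.

```python
def as_signed64(value: int) -> int:
    """Reinterpret a 64-bit unsigned value as two's-complement signed."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value & (1 << 63):           # sign bit set: value is negative
        return value - (1 << 64)
    return value


# Values copied from the two assertion messages in this issue.
first_panic = as_signed64(0x14000)
second_panic = as_signed64(0xFFFFFFFFFF9B1000)

print(f"first panic:  {first_panic} bytes ({first_panic / 1024:.0f} KiB)")
print(f"second panic: {second_panic} bytes ({second_panic / 2**20:.2f} MiB)")
```

Under that assumption, the first panic left 81920 bytes (80 KiB) accounted to the free dir, while the second would correspond to roughly 6.3 MiB more being subtracted than was ever added.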