dsl_dataset_rollback_sync may try to free already free blocks
dsl_dataset_rollback_sync may try to free already freed blocks when it calls
dsl_destroy_head_sync_impl to destroy a temporary clone.
This happens if the snapshot we are rolling back to, from which the clone is created, has some ZIL records.
Updated by Andriy Gapon over 3 years ago
I based this bug report on the following panic that I observed on FreeBSD.
panic: freeing free block; rs=0xfffff800288dba40
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper+0x27
kdb_backtrace+0x39
vpanic+0x11f
range_tree_verify+0x5a
metaslab_check_free+0xc5
zio_free+0x26
zio_free_zil+0x71
zil_free_log_block+0x25
zil_parse+0x159
zil_destroy_sync+0x55
dsl_destroy_head_sync_impl+0x50b
dsl_dataset_rollback_sync+0x155
dsl_sync_task_sync+0x11b
dsl_pool_sync+0x45f
spa_sync+0x56e
txg_sync_thread+0x303
fork_exit+0xad
Updated by Andriy Gapon about 3 years ago
It seems that this problem occurs only in a very low probability scenario where two (or more) rollback sync-tasks for the same dataset are executed in the same txg sync pass.
As an aside, it would be nice if the kernel-side code were aware of the desired target snapshot and refused to perform the rollback if that snapshot doesn't exist or is not the latest. At present, the check is done in userland, which is insufficient when rollback requests can be concurrent.
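To make the aside concrete, here is a minimal sketch of what such a kernel-side check might look like. It is a toy model only: toy_dataset_t, toy_rollback_check and the choice of ESRCH / EEXIST as error codes are hypothetical names and conventions picked for illustration, not existing ZFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a dataset: snapshot names ordered oldest to newest. */
typedef struct toy_dataset {
	const char *const *snaps;
	size_t nsnaps;
} toy_dataset_t;

/*
 * Hypothetical check, meant to run in the same context as the rollback
 * itself so that a concurrent rollback cannot invalidate it.
 * Returns 0 if 'target' is the latest snapshot, ESRCH if it does not
 * exist, and EEXIST if it exists but newer snapshots follow it.
 * The error codes are an assumption made for this sketch.
 */
static int
toy_rollback_check(const toy_dataset_t *ds, const char *target)
{
	size_t i;

	assert(ds != NULL && target != NULL);
	for (i = 0; i < ds->nsnaps; i++) {
		if (strcmp(ds->snaps[i], target) == 0)
			return (i == ds->nsnaps - 1 ? 0 : EEXIST);
	}
	return (ESRCH);
}
```

With a check like this, the second of two racing rollback requests would re-validate its target when it actually runs instead of relying on what userland saw earlier.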
So, my theory is this. The first rollback sync-task replaces the dataset with a clone created from the previous snapshot. As a result, the dataset points to the snapshot's physical (on-disk) objset. A new objset_t is created in memory and its os_zil_header is zeroed out, but os_phys->os_zil_header is left intact (obviously). When the second rollback is done, the dataset created by the previous rollback gets destroyed after the clone-swap operation. The clone-swap operation evicts the objsets of both datasets. So, when the destroy path calls dmu_objset_from_ds, that function creates a new objset instance. As a result, its os_zil is based on os_phys->os_zil_header, which could be non-zero. Subsequently, zil_destroy_sync acts on a stale ZIL chain that happened to be stored in the snapshot's on-disk objset.
In other words, because both rollbacks are performed back-to-back, dmu_objset_sync is never called after os_zil_header of the intermediate dataset is zeroed out in memory. Thus, the snapshot's objset is never "forked", so the dataset keeps pointing to it instead of to a new copy. The second rollback then tries to walk the snapshot's log chain instead of seeing an empty ZIL header.
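The sequence I have in mind can be mimicked with a small toy model. Nothing below is real ZFS code: every toy_* type and function is a hypothetical stand-in, and the ZIL header is reduced to a single integer where 0 means "no log chain". The model only illustrates how the in-memory and on-disk headers diverge when no sync happens between the two rollbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins only; these are NOT the real ZFS structures. */
typedef struct toy_objset_phys {
	uint64_t zil_log_blk;		/* on-disk ZIL header; 0 = no log */
} toy_objset_phys_t;

typedef struct toy_dataset {
	toy_objset_phys_t *phys;	/* on-disk objset the dataset points to */
	bool cached;			/* is an in-memory objset instantiated? */
	uint64_t mem_zil_log_blk;	/* in-memory ZIL header */
} toy_dataset_t;

/* Instantiate the in-memory objset, seeding it from the on-disk header. */
static void
toy_objset_from_ds(toy_dataset_t *ds)
{
	assert(ds != NULL && ds->phys != NULL);
	if (!ds->cached) {
		ds->mem_zil_log_blk = ds->phys->zil_log_blk;
		ds->cached = true;
	}
}

/* Rollback: point at the snapshot's objset, zero the in-memory header only. */
static void
toy_rollback(toy_dataset_t *ds, toy_objset_phys_t *snap_phys)
{
	ds->phys = snap_phys;
	ds->cached = true;
	ds->mem_zil_log_blk = 0;
}

/* Sync: "fork" the objset by writing the in-memory header to a new copy. */
static void
toy_objset_sync(toy_dataset_t *ds, toy_objset_phys_t *new_phys)
{
	toy_objset_from_ds(ds);
	new_phys->zil_log_blk = ds->mem_zil_log_blk;
	ds->phys = new_phys;
}

/* The clone swap evicts the in-memory objset. */
static void
toy_evict(toy_dataset_t *ds)
{
	ds->cached = false;
}

/* Returns the log block the destroy step would walk (0 = nothing to free). */
static uint64_t
toy_destroy_log_to_walk(toy_dataset_t *ds)
{
	toy_objset_from_ds(ds);
	return (ds->mem_zil_log_blk);
}

/* Run two rollbacks, optionally with a sync between them. */
static uint64_t
toy_two_rollbacks(bool sync_between)
{
	toy_objset_phys_t snap = { .zil_log_blk = 0x1234 };	/* stale chain */
	toy_objset_phys_t forked = { .zil_log_blk = 0 };
	toy_dataset_t ds = { .phys = NULL, .cached = false, .mem_zil_log_blk = 0 };

	toy_rollback(&ds, &snap);		/* first rollback */
	if (sync_between)
		toy_objset_sync(&ds, &forked);	/* objset gets forked */
	toy_evict(&ds);				/* second rollback's clone swap */
	return (toy_destroy_log_to_walk(&ds));
}
```

In this model, toy_two_rollbacks(false) hands the snapshot's stale log block to the destroy step, while toy_two_rollbacks(true) hands it an empty header, which matches the difference between the back-to-back case and the normal case described above.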
Does this sound plausible?
Updated by Electric Monk about 3 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit bfaed0b91e57062c38bc16b4f89db3c8f0052a9b
Author: Andriy Gapon <email@example.com>
Date: 2016-11-22T00:10:08.000Z

7199 dsl_dataset_rollback_sync may try to free already free blocks
7200 no blocks must be born in a txg after a snaphot is created
Reviewed by: Matthew Ahrens <firstname.lastname@example.org>
Reviewed by: Brad Lewis <email@example.com>
Approved by: Gordon Ross <firstname.lastname@example.org>