Bug #7199
dsl_dataset_rollback_sync may try to free already free blocks
Description
dsl_dataset_rollback_sync may try to free already freed blocks when it calls dsl_destroy_head_sync_impl to destroy a temporary clone.
That happens if the snapshot to which we are rolling back, and from which the clone is created, has some ZIL records.
Related issues
Updated by Andriy Gapon over 4 years ago
- Related to Feature #7197: ztest should test rollback added
Updated by Andriy Gapon over 4 years ago
I based this bug report on the following panic that I observed on FreeBSD.
panic: freeing free block; rs=0xfffff800288dba40
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper+0x27
kdb_backtrace+0x39
vpanic+0x11f
range_tree_verify+0x5a
metaslab_check_free+0xc5
zio_free+0x26
zio_free_zil+0x71
zil_free_log_block+0x25
zil_parse+0x159
zil_destroy_sync+0x55
dsl_destroy_head_sync_impl+0x50b
dsl_dataset_rollback_sync+0x155
dsl_sync_task_sync+0x11b
dsl_pool_sync+0x45f
spa_sync+0x56e
txg_sync_thread+0x303
fork_exit+0xad
Updated by Andriy Gapon about 4 years ago
It seems that this problem occurs only in a very low probability scenario where two (or more) rollback sync-tasks for the same dataset are executed in the same txg sync pass.
As an aside, it would be nice if the kernel-side code were aware of the desired target snapshot and refused to perform the rollback if that snapshot doesn't exist or is not the latest. At present, the check is done in userland, but that is insufficient when rollback requests can run concurrently.
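To make the aside concrete, here is a minimal sketch of what such a kernel-side check might look like. Everything in it is hypothetical: toy_rollback_check, its string-based arguments, and the choice of error codes are illustrative assumptions and are not taken from the ZFS sources.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check, NOT an existing ZFS function.
 * latest_snap: name of the dataset's most recent snapshot, or NULL if
 *              the dataset has no snapshots.
 * target_snap: snapshot the caller expects to roll back to, or NULL if
 *              the caller did not name one.
 * Returns 0 if the rollback may proceed, an errno value otherwise.
 */
int
toy_rollback_check(const char *latest_snap, const char *target_snap)
{
	if (latest_snap == NULL)
		return (ENOENT);	/* nothing to roll back to */
	if (target_snap != NULL && strcmp(latest_snap, target_snap) != 0)
		return (EINVAL);	/* target is gone or is not the latest */
	return (0);
}
```

With a check of this kind performed in the sync task itself, a second concurrent request that names a snapshot which is no longer the latest would be refused instead of relying on the earlier userland check.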
So, my theory is this. The first rollback sync-task replaces the dataset with a clone created from the previous snapshot. The dataset points to the snapshot's physical / on-disk objset as a result. A new objset_t is created in memory, its os_zil_header is zeroed out, but os_phys->os_zil_header is left intact (obviously). When the second rollback is done, the dataset created by the previous rollback gets destroyed after the clone-swap operation. The clone-swap operation evicts objsets of both datasets. So, when dsl_destroy_head_sync_impl calls dmu_objset_from_ds, the latter creates a new objset instance. As a result, its os_zil_header and os_zil are based off os_phys->os_zil_header, which could be non-zero. Subsequently, zil_destroy_sync acts on a stale ZIL chain that happened to be stored in the snapshot's on-disk objset.
In other words, because both rollbacks are performed back-to-back, dmu_objset_sync is never called after os_zil_header of the intermediate dataset is zeroed out in memory. Thus, the snapshot's objset is never "forked", so the dataset keeps pointing to it instead of a new copy. And the second rollback tries to walk the snapshot's log chain instead of seeing an empty ZIL header.
Does this sound plausible?
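The theory above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the toy_* types and functions are invented stand-ins, not the real ZFS structures or routines, and they keep just the one behavior the theory relies on, namely that a freshly instantiated objset copies its ZIL header from the on-disk objset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; these are NOT the real ZFS definitions. */
typedef struct toy_zil_header {
	uint64_t zh_log;		/* 0 means "no log chain" */
} toy_zil_header_t;

typedef struct toy_objset_phys {
	toy_zil_header_t os_zil_header;	/* on-disk copy */
} toy_objset_phys_t;

typedef struct toy_objset {
	toy_objset_phys_t *os_phys;
	toy_zil_header_t os_zil_header;	/* in-memory copy */
} toy_objset_t;

typedef struct toy_dataset {
	toy_objset_phys_t *ds_phys;	/* on-disk objset the dataset points to */
	toy_objset_t ds_os;		/* cached in-memory objset */
	int ds_os_valid;		/* 0 once the objset has been evicted */
} toy_dataset_t;

/* Models dmu_objset_from_ds: a new instance copies the on-disk header. */
toy_objset_t *
toy_objset_from_ds(toy_dataset_t *ds)
{
	assert(ds->ds_phys != NULL);
	if (!ds->ds_os_valid) {
		ds->ds_os.os_phys = ds->ds_phys;
		ds->ds_os.os_zil_header = ds->ds_phys->os_zil_header;
		ds->ds_os_valid = 1;
	}
	return (&ds->ds_os);
}

/* Models a rollback: point at the snapshot's objset, zero the header in memory only. */
void
toy_rollback(toy_dataset_t *ds, toy_objset_phys_t *snap_phys)
{
	ds->ds_phys = snap_phys;
	ds->ds_os_valid = 0;
	toy_objset_from_ds(ds)->os_zil_header.zh_log = 0;
}

/* Models the eviction done by the clone-swap operation. */
void
toy_evict(toy_dataset_t *ds)
{
	ds->ds_os_valid = 0;
}

/* Models dmu_objset_sync: "fork" the objset by writing the in-memory header to a new copy. */
void
toy_objset_sync(toy_dataset_t *ds, toy_objset_phys_t *new_phys)
{
	toy_objset_t *os = toy_objset_from_ds(ds);

	new_phys->os_zil_header = os->os_zil_header;
	os->os_phys = new_phys;
	ds->ds_phys = new_phys;
}

/* The log pointer that a zil_destroy_sync-like step would start walking from. */
uint64_t
toy_log_seen_by_destroy(toy_dataset_t *ds)
{
	return (toy_objset_from_ds(ds)->os_zil_header.zh_log);
}
```

In this model, calling toy_rollback against a snapshot whose zh_log is non-zero, then toy_evict, then toy_log_seen_by_destroy yields the snapshot's non-zero log pointer, which corresponds to the stale chain in the panic. If toy_objset_sync runs between the rollback and the eviction, the destroy step sees 0 and the snapshot's own header is left untouched.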
Updated by Andriy Gapon about 4 years ago
- Related to Bug #7200: no blocks must be born in a txg after a snapshot is created added
Updated by Electric Monk about 4 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit bfaed0b91e57062c38bc16b4f89db3c8f0052a9b
commit bfaed0b91e57062c38bc16b4f89db3c8f0052a9b
Author: Andriy Gapon <andriy.gapon@clusterhq.com>
Date: 2016-11-22T00:10:08.000Z

    7199 dsl_dataset_rollback_sync may try to free already free blocks
    7200 no blocks must be born in a txg after a snaphot is created
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Brad Lewis <brad.lewis@delphix.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>