Feature #2021

ZDB capability to overwrite blocks by DVA (including compression and raidzN parities as needed)

Added by Jim Klimov over 8 years ago.

zfs - Zettabyte File System

I had some unrecoverable corruption in a raidz2 pool, and examining the on-disk data with ZDB led me to blocks that were indeed invalid. I had another copy of the data on a different system, and could replace such blocks by "dd"ing them into files on the live dataset (or by rsyncing those files over).

This method failed, however, for corruption referenced in a snapshot of a dataset that is received incrementally from another ZFS system: rewriting parts of the "live" version of the file on the receiving system has no effect, because a subsequent zfs receive requires rolling the recipient back to the last common snapshot. Alternatively, the block could be rewritten on the sender and included in a subsequent zfs send replication, but in both cases the recipient's view of the file in older snapshots would remain invalid.

Finally, the recipient's tree of snapshots could be discarded and re-sent in full, resulting in lots of traffic to fix one broken block of data. This seems to be the only "proper" way now.

Instead, I would like the ability to replace the on-disk block with a known-good copy, preferably by overwriting the corrupt data on the leaf vdevs (at the same sector numbers where the corrupt data resides), since the data from the backups would match the existing checksums in the block-pointer tree. Alternatively, to preserve COW semantics, this could be done by creating a new block with the required contents and referencing it from the block-pointer tree, iteratively rewriting the whole branch (the DVAs would change, so the parent BP checksums would change too) while keeping the same old TXG as seen in the snapshot. Possibly this is what the system already does when it repairs data with recoverable CKSUM errors discovered during a scrub or read.
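To illustrate the "match the existing checksums" precondition: before any overwrite, a tool could confirm that the candidate block hashes to the value already stored in the parent block pointer. Below is a minimal Python sketch, assuming the block uses the fletcher4 checksum, that the words are read as little-endian, and that the candidate bytes are already in their on-disk (e.g. compressed) form. The helper names are hypothetical and are not part of ZDB.

```python
import struct

MASK64 = (1 << 64) - 1  # the four accumulators wrap at 64 bits


def fletcher4(data: bytes) -> tuple:
    """Fletcher-4 style checksum over little-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


def matches_block_pointer(candidate: bytes, expected: tuple) -> bool:
    """True only if the candidate hashes to the four words from the BP."""
    return fletcher4(candidate) == tuple(expected)
```

A tool would refuse to write unless `matches_block_pointer()` returns True; blocks using sha256 or another checksum would need the corresponding function instead.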

This feature proposes that ZFS-aware tools compatible with the current definitions of ZFS (such as ZDB) be augmented with the power to PROPERLY overwrite on-disk blocks and/or emulate recovery operations using data blocks provided by an admin, including the necessary transformations (compression, raidzN parities; maybe dedup info; possibly even encryption?), so as to minimize the human error and the lack of proper tools that are likely in this situation.
From some point of view, this might seem like the reverse of the existing "zdb -R" command.
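As a rough illustration of the bookkeeping such a "reverse zdb -R" would take over from the admin, the sketch below maps a DVA to a byte offset on the device. It assumes the simplest case only, a single-disk or mirror top-level vdev, where the DVA offset counts from the end of the 4 MiB reserved at the front of each leaf device (two 256 KiB labels plus the 3.5 MiB boot area). It also assumes the DVA is written as zdb prints it, with a decimal vdev id and hexadecimal offset and size. raidzN needs an additional column and parity mapping that is not shown here. The function names are hypothetical.

```python
# Space reserved at the front of each leaf device:
# 2 x 256 KiB labels + 3.5 MiB boot area (assumed layout, see lead-in).
LABEL_AND_BOOT_RESERVE = 4 * 1024 * 1024


def parse_dva(text: str) -> tuple:
    """Parse '<vdev:offset:asize>' into integers (offset and asize are hex)."""
    vdev, offset, asize = text.strip().strip("<>").split(":")
    return int(vdev, 10), int(offset, 16), int(asize, 16)


def device_byte_offset(dva_offset: int) -> int:
    """Byte position on a single-disk or mirror leaf device for a DVA offset."""
    return dva_offset + LABEL_AND_BOOT_RESERVE


vdev, offset, asize = parse_dva("<0:1c000:200>")
print(vdev, hex(device_byte_offset(offset)), asize)
```

Even in this simple case the arithmetic is easy to get wrong by hand, which is the kind of human error the proposal aims to remove.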

As an admin with root rights and access to the low-level hardware, I already have every opportunity to shoot myself in the foot and destroy ZFS by overwriting sectors on physical media with "dd" and other mechanisms. (The disk's ability to make a mistake and write garbage into a wrong location, which led to this corruption, effectively exercised the same "rights" to write anything anywhere.)
So the proposed solution would not worsen the situation, but it may help fix some problems by writing the proper bytes into the proper locations.
