Feature #2015
OpenZFS dedup=verify should trigger a CKSUM error if the on-disk block is corrupted
Description
NOTE: I am not sure whether this is already implemented. Perhaps the RFE can be closed quickly ;)
I did not get to check whether "dedup=verify" triggers a checksum-mismatch alarm if the preexisting on-disk data does not in fact match its stored checksum (a corrupted userdata block).
I think such an alarm should exist and do as much as a scrub, a normal read, or any other means of error detection and recovery would: try to recover the data and/or report an unrecoverable failure.
It should also invalidate, or require verification of, the existing DDT entry with the same checksum (see bug #1981).
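
To make the requested behavior concrete, below is a minimal, self-contained toy model (hypothetical names and a trivial stand-in checksum; this is not the real ZFS code) of what dedup=verify arguably should do when the existing on-disk copy fails its own checksum: count and report the error and fall back to a unique write (a real implementation would also attempt repair from redundancy) rather than silently ignoring the corruption.

/* Toy model of the requested dedup=verify behavior; all names and the
 * checksum are illustrative stand-ins, not actual ZFS functions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLKSZ 8

static uint64_t
toy_cksum(const uint8_t *p, size_t n)
{
	uint64_t c = 0;
	for (size_t i = 0; i < n; i++)
		c = c * 31 + p[i];
	return (c);
}

static unsigned cksum_errors;	/* stands in for the pool's CKSUM counter */

/*
 * dedup=verify comparison against the existing on-disk copy.  The point of
 * this RFE: if that copy fails its own checksum, the error should be counted
 * and reported (and repair attempted), not silently ignored.
 */
static bool
dedup_verify(const uint8_t *ondisk, uint64_t stored_cksum,
    const uint8_t *newdata)
{
	if (toy_cksum(ondisk, BLKSZ) != stored_cksum) {
		cksum_errors++;		/* surface the corruption */
		printf("CKSUM error on existing dedup copy\n");
		return (false);		/* fall back to a unique write */
	}
	return (memcmp(ondisk, newdata, BLKSZ) == 0);
}

int
main(void)
{
	uint8_t payload[BLKSZ] = "DEDUPME";
	uint8_t ondisk[BLKSZ];

	memcpy(ondisk, payload, BLKSZ);
	uint64_t stored = toy_cksum(ondisk, BLKSZ);
	ondisk[0] ^= 0xff;		/* simulate on-disk corruption */

	bool dedup = dedup_verify(ondisk, stored, payload);
	printf("deduped=%d cksum_errors=%u\n", dedup, cksum_errors);
	return (0);
}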
Updated by Jim Klimov over 10 years ago
Tested: with "dedup=verify" set, an on-disk block was corrupt (not matching its stored checksum), and the original block (not corrupt, matching the checksum) was then written again from another storage source. The pool's CKSUM counter was not increased. The new block was successfully stored (at another DVA), so the dedup-verify comparison did take place, but the checksum mismatch was not "discovered" and the counter was not bumped.
Detailed log of the test case and reproduction here: http://mail.opensolaris.org/pipermail/zfs-discuss/2012-January/050881.html
Updated by Jim Klimov about 10 years ago
So I wonder: HOW does the dedup-writing code read the existing block (in order to compare its contents) so that it bypasses the normal checksumming and alarming? And, lacking a checksum match, how could it have reconstructed the (corrupt) block from raidz2 satisfactorily?
In particular, if a single sector is broken (and the block can be recovered by raidzN), would this result in a contents-mismatch for dedup-verify and cause a separate unique write instead of proper repair (or at least signalling) of the partially or fully broken block?
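
For illustration only, here is a minimal, self-contained sketch of the two read behaviors this question distinguishes (everything is hypothetical: a toy checksum, two copies standing in for raidzN/mirror redundancy, and made-up function names; this is not the actual ZFS I/O pipeline). A verifying read notices the bad copy, bumps the error counter and returns the reconstructed data, so the dedup-verify comparison would match and the damage would be surfaced; a raw read returns whatever bytes are on disk, so the comparison would mismatch and silently force a unique write, which is consistent with the behavior observed in the test above.

/* Toy contrast of a checksum-verifying, error-counting read versus a raw
 * read of the on-disk bytes.  Hypothetical model only, not ZFS code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLKSZ 8

static uint64_t
toy_cksum(const uint8_t *p, size_t n)
{
	uint64_t c = 0;
	for (size_t i = 0; i < n; i++)
		c = c * 31 + p[i];
	return (c);
}

static unsigned cksum_errors;	/* stands in for the pool's CKSUM counter */

/* Two on-disk copies stand in for raidzN/mirror redundancy. */
static bool
verifying_read(const uint8_t copies[2][BLKSZ], uint64_t expected,
    uint8_t *out)
{
	for (int i = 0; i < 2; i++) {
		if (toy_cksum(copies[i], BLKSZ) == expected) {
			if (i > 0)
				cksum_errors++;	/* an earlier copy was bad: count it */
			memcpy(out, copies[i], BLKSZ);	/* return the good copy */
			return (true);
		}
	}
	cksum_errors++;
	return (false);			/* unrecoverable */
}

static void
raw_read(const uint8_t copies[2][BLKSZ], uint8_t *out)
{
	memcpy(out, copies[0], BLKSZ);	/* whatever is on disk, unchecked */
}

int
main(void)
{
	uint8_t payload[BLKSZ] = "DEDUPME";
	uint8_t copies[2][BLKSZ];
	uint8_t buf[BLKSZ];

	memcpy(copies[0], payload, BLKSZ);
	memcpy(copies[1], payload, BLKSZ);
	copies[0][0] ^= 0xff;		/* one copy has a bad sector */

	uint64_t expected = toy_cksum(payload, BLKSZ);

	raw_read(copies, buf);
	printf("raw read:       match=%d cksum_errors=%u\n",
	    memcmp(buf, payload, BLKSZ) == 0, cksum_errors);

	if (verifying_read(copies, expected, buf))
		printf("verifying read: match=%d cksum_errors=%u\n",
		    memcmp(buf, payload, BLKSZ) == 0, cksum_errors);
	return (0);
}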