Feature #2015

ZFS dedup=verify should trigger a CKSUM error if the on-disk block is corrupted

Added by Jim Klimov over 8 years ago. Updated about 8 years ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2012-01-21
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

NOTE: I am not sure whether this is already implemented. If it is, the RFE can be closed quickly ;)

I have not yet checked whether "dedup=verify" raises a checksum-mismatch alarm when the preexisting on-disk data does not in fact match its checksum (a corrupted userdata block).
I think such an alarm should exist and should do as much as a scrub, a read, or any other means of error detection and recovery would: try to recover the data and/or report an unrecoverable failure.
It should also invalidate, or require verification of, the existing DDT entry with the same checksum (see bug #1981).
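
To make the requested behaviour concrete, here is a minimal Python sketch of a toy dedup table. It is not ZFS code and says nothing about how the real write path is structured: `ToyPool`, `write_dedup_verify`, the `suspect` set and the use of SHA-256 are all assumptions made for illustration. The only point is the order of checks: before comparing contents, confirm that the existing copy still matches its own checksum, and if it does not, count a CKSUM error and flag the DDT entry.

```python
import hashlib


class ToyPool:
    """Toy model of a deduplicating block store (hypothetical, not ZFS)."""

    def __init__(self):
        self.blocks = {}       # address -> bytes as currently stored "on disk"
        self.ddt = {}          # checksum -> address of the deduplicated block
        self.suspect = set()   # checksums whose DDT entry failed verification
        self.cksum_errors = 0  # stands in for the pool's CKSUM counter
        self.next_addr = 0

    def _store(self, data):
        # Allocate a new address and store the data there.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        return addr

    def write_dedup_verify(self, data):
        cksum = hashlib.sha256(data).hexdigest()
        addr = self.ddt.get(cksum)
        if addr is None:
            # First block with this checksum: store it and record it in the DDT.
            addr = self._store(data)
            self.ddt[cksum] = addr
            return addr

        on_disk = self.blocks[addr]
        if hashlib.sha256(on_disk).hexdigest() != cksum:
            # The existing copy no longer matches its own checksum.
            # This is the case the RFE asks to be reported.
            self.cksum_errors += 1
            self.suspect.add(cksum)
            return self._store(data)

        if on_disk == data:
            return addr  # genuine duplicate, share the existing block

        # Checksum is fine but contents differ: a hash collision, store separately.
        return self._store(data)
```

In this model, corrupting a stored block by hand (`pool.blocks[addr] = ...`) and then rewriting the original data yields a new address, increments `cksum_errors`, and adds the checksum to `suspect`. What a real implementation should do with a flagged entry (repair, repoint, or drop it) is left open here.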

History

#1

Updated by Jim Klimov over 8 years ago

Tested with "dedup=verify": an on-disk block was corrupt (not matching its stored checksum), and the original block (not corrupt, matching the checksum) was written again from another storage source. The pool's CKSUM counter did not increase. The new block was stored successfully at another DVA, so the dedup-verify comparison did take place, but the checksum mismatch was not "discovered" and the counter was not bumped.

Detailed log of the test case and reproduction here: http://mail.opensolaris.org/pipermail/zfs-discuss/2012-January/050881.html

#2

Updated by Jim Klimov about 8 years ago

So I wonder HOW the dedup-writing code reads the block (in order to compare its contents) such that it bypasses the normal checksumming and alarming. And how, lacking the checksum match, did it try to reconstruct the (corrupt) block from raidz2 satisfactorily?

In particular, if a single sector is broken (and the block can be recovered by raidzN), would this result in a contents mismatch for dedup-verify and cause a separate unique write, instead of a proper repair (or at least signalling) of the partially or fully broken block?
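
The single-broken-sector case can be illustrated with a toy model. The sketch below uses plain single XOR parity over equal-sized sectors, which is much simpler than raidz2 and is only an assumption for illustration; `xor_parity`, `read_with_repair` and `dedup_verify_matches` are hypothetical names, not ZFS functions. It shows the ordering one might hope for: repair the on-disk block against its checksum first, and only then run the dedup-verify comparison.

```python
import hashlib
from functools import reduce


def xor_parity(sectors):
    """XOR equal-length sectors together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def read_with_repair(sectors, parity, expected_cksum):
    """Return (data, repaired); raise OSError if no single-sector repair works."""

    def matches(candidate):
        return hashlib.sha256(b"".join(candidate)).hexdigest() == expected_cksum

    if matches(sectors):
        return b"".join(sectors), False

    # Assume exactly one sector is bad: rebuild each in turn from the others
    # plus parity, and accept the first reconstruction that passes the checksum.
    for i in range(len(sectors)):
        others = sectors[:i] + sectors[i + 1:]
        rebuilt = xor_parity(others + [parity])
        candidate = sectors[:i] + [rebuilt] + sectors[i + 1:]
        if matches(candidate):
            return b"".join(candidate), True

    raise OSError("unrecoverable checksum error")


def dedup_verify_matches(new_data, sectors, parity, expected_cksum):
    """Compare incoming data with the on-disk block after any needed repair."""
    on_disk, repaired = read_with_repair(sectors, parity, expected_cksum)
    return on_disk == new_data, repaired
```

With one damaged sector, a raw byte comparison of the on-disk sectors against the incoming data fails, which would lead to a separate unique write. Going through `read_with_repair` first makes the comparison succeed and reports that a repair happened, which is the signal the question above is asking about.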
