ZFS dedup=on should not panic the system when two blocks with the same checksum exist in the DDT
While testing the interaction of dedup with corrupt on-disk data on my system during repairs, I found that oi_148a (LiveUSB) instantly reboots after the procedure listed below (tested twice during remote recovery).
Prerequisite: a system with a bad data block in a file (its contents do not match the checksum stored in the block pointer, and probably in the DDT), plus the original block that does match that checksum (I overwrite it into the corrupted file as part of the repairs). The original file (the one containing the now-corrupted block) was written onto a dataset with dedup enabled.
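The prerequisite state can be sketched in a few lines (illustrative Python, not ZFS code; the block sizes and the use of SHA-256 here are my assumptions for the sketch):

```python
import hashlib

def cksum(block: bytes) -> str:
    # ZFS records a strong checksum (e.g. SHA-256) in the block pointer;
    # with dedup enabled it also serves as the DDT key.
    return hashlib.sha256(block).hexdigest()

original = b"A" * 512             # the block that matches the stored checksum
stored = cksum(original)          # recorded in the block pointer (and DDT)
corrupted = b"B" + original[1:]   # on-disk bytes that rotted after the write

print(cksum(original) == stored)    # True: the "repair" copy is good
print(cksum(corrupted) == stored)   # False: reads of this block return I/O errors
```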
- with "dedup=on", rewriting the good block neither fixed the on-disk file nor logged an error, and subsequent reads still produced I/O errors (and increased the error counter). Probably only the DDT refcount was increased during the write (that's the "works as designed" part, according to Richard);
- "dedup=verify" doesn't log a checksum error when it finds a block whose assumed checksum matches the newly written block but whose contents differ during dedup verification; in fact, those contents do not match the checksum either (at least not the one in the block pointer). Reading the block afterwards produced no errors (bug #2015);
- what's worse, re-enabling "dedup=on" and writing the same block again crashes (reboots) the system instantly.
Possibly this happens because there are now two DDT entries for the same checksum pointing to different blocks, and no verification was explicitly requested in this dataset?
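The suspected sequence can be modeled with a toy in-memory DDT (illustrative Python, not ZFS code: the dictionary layout, function names, and panic-as-assertion are all my modeling assumptions):

```python
import hashlib

def cksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

# Toy DDT: checksum -> list of [block contents, refcount] entries.
# The real DDT assumes at most one entry per checksum; this model
# allows duplicates so the suspected corrupt state can be reproduced.
ddt = {}

def write_dedup_verify(block: bytes):
    # dedup=verify byte-compares against the stored copy; on a mismatch
    # it writes the data anew -- here, a second entry under the SAME key.
    entries = ddt.setdefault(cksum(block), [])
    for entry in entries:
        if entry[0] == block:
            entry[1] += 1
            return
    entries.append([block, 1])

def write_dedup_on(block: bytes):
    # dedup=on trusts the checksum alone and never reads the old data.
    entries = ddt.setdefault(cksum(block), [])
    # The code path effectively assumes a unique match per checksum;
    # two entries is a state it was never written to handle.
    assert len(entries) <= 1, "panic: two DDT entries share one checksum"
    if entries:
        entries[0][1] += 1
    else:
        entries.append([block, 1])

good = b"A" * 512
# Corruption after the original write: the DDT key still holds the good
# checksum, but the referenced bytes have rotted.
ddt[cksum(good)] = [[b"B" * 512, 1]]

write_dedup_verify(good)          # mismatch -> second entry, no error logged
print(len(ddt[cksum(good)]))      # 2 entries now share one checksum

try:
    write_dedup_on(good)          # checksum-only lookup hits both entries
except AssertionError as e:
    print("crash:", e)            # the modeled instant reboot
```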
If so, this basically means that once dedup=verify has been used on at least one dataset (and has created two DDT entries with the same checksum), it is unsafe to use plain dedup=on on any dataset in that pool. In particular, the user still has the file whose block matches the checksum, and that file is likely to be copied around and trigger the crash. Thus, a BUG report ;)
Workaround: if you use dedup, always use it with verification. That helps against corrupt on-disk sectors, too ;)
Details in mail list, here: http://mail.opensolaris.org/pipermail/zfs-discuss/2012-January/050881.html
See also bugs #1981 and #2015
PS: It might be nice to craft a test with two different blocks sharing the same checksum, written with dedup=verify (so that two DDT entries exist) and then rewritten with dedup=on in the same and/or a different dataset -- would that work or crash? I'm not sure how to create the hash-collision samples, though... =)