ztest fails due to checksum errors
ztest sometimes fails because zdb detects checksum errors, e.g.:
Traversing all blocks to verify checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 8000160> DVA0=<0:1cc2000:180000> [L0 other uint64] sha256 uncompressed LE contiguous unique single size=100000L/100000P birth=271L/271P fill=1 cksum=c5a3e27d1ed0f894:843bca3a5473c4bf:f76a19b6830a2e4:91292591613a12bf --
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 800000180> DVA0=<0:ce16800:180000> [L0 other uint64] sha256 uncompressed LE contiguous unique single size=100000L/100000P birth=840L/840P fill=1 cksum=5d018f3d061e17f3:6d1584784587bf63:2805a74a0ce37369:ba68a214806c7e75
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 1000000360> DVA0=<0:10d37400:180000> [L0 other uint64] sha256 uncompressed LE contiguous unique single size=100000L/100000P birth=904L/904P fill=1 cksum=fa1e11d4138bd14b:86c9488c444473e3:f31e43c72e72e46b:e3446472d1174dba -- skipping
zdb_blkptr_cb: Got error 50 reading <71, 47, 0, 400000002c0> DVA0=<0:127ef400:180000> [L0 other uint64] sha256 uncompressed LE contiguous dedup single size=100000L/100000P birth=549L/549P fill=1 cksum=30e14955ebf13522:66dc2ff8067e6810:4607e750abb9d3b3:6582b8af909fcb58 -- skipping
zdb_blkptr_cb: Got error 50 reading <657, 5, 0, 1c0> DVA0=<0:1a180400:180000> [L0 other uint64] fletcher4 uncompressed LE contiguous unique single size=100000L/100000P birth=1091L/1091P fill=1 cksum=a6cf1e50:29b3bd01c57e5:36779b914035db9a:db61cdcf6bec56f0 -- skippin
The cause is that ztest_fault_inject() can inject faults into the same block on more than one leaf. It is designed so that it can inject errors on all leaves of a RAID-Z or mirror, but for a given range of offsets it injects errors on only a single leaf. The idea is that any given block will have errors injected on only a single device, because this is the type of reconstruction we support. The injection-eligible ranges are separated by non-injectable ranges (DMZs), so that a single block is not supposed to straddle two injectable ranges.
This logic assumes that the maximum block size is 128KB, and therefore that each DMZ need be only 256KB. The DMZ size should have been increased when we introduced large blocks, because a large block can now span from one injectable range, across the 256KB DMZ, and into the next injectable range on a different leaf. As the example above shows, all the blocks with checksum errors are larger than 128KB.
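The straddling can be checked with a small model. The sketch below is not the ztest code: layout_t, owner_leaf(), leaves_hit() and hits_from_range_end() are hypothetical names, and the layout (one injectable range followed by one DMZ per leaf, repeating, with the block occupying the same offsets on every leaf as on a mirror) is a simplifying assumption. The 256KB injectable range width is also assumed for illustration; only the 256KB DMZ and the block sizes come from the report above (size=100000L is hexadecimal, i.e. 1MB).

```c
#include <assert.h>
#include <stdint.h>

#define	KB	1024ULL
#define	SECTOR	512ULL

/* Hypothetical model: per leaf, one injectable range followed by one DMZ. */
typedef struct {
	uint64_t leaves;	/* number of leaf vdevs (at most 64 here) */
	uint64_t range_size;	/* bytes in each injectable range */
	uint64_t dmz_size;	/* bytes in each DMZ */
} layout_t;

/* Return the leaf that may take faults at this offset, or -1 in a DMZ. */
static int
owner_leaf(const layout_t *l, uint64_t offset)
{
	uint64_t slot = l->range_size + l->dmz_size;
	uint64_t pos = offset % (l->leaves * slot);

	if (pos % slot >= l->range_size)
		return (-1);
	return ((int)(pos / slot));
}

/*
 * Count the distinct leaves whose injectable ranges overlap
 * [start, start + size). Sampling every sector is exact as long as
 * start and the range boundaries are sector-aligned.
 */
static int
leaves_hit(const layout_t *l, uint64_t start, uint64_t size)
{
	uint64_t seen = 0;	/* bitmask of leaves already counted */
	int count = 0;

	assert(l->leaves > 0 && l->leaves <= 64);
	for (uint64_t off = start; off < start + size; off += SECTOR) {
		int leaf = owner_leaf(l, off);

		if (leaf >= 0 && !(seen & (1ULL << leaf))) {
			seen |= 1ULL << leaf;
			count++;
		}
	}
	return (count);
}

/* A block that begins in the last sector of leaf 0's injectable range. */
static int
hits_from_range_end(const layout_t *l, uint64_t block_size)
{
	return (leaves_hit(l, l->range_size - SECTOR, block_size));
}
```

Under these assumptions, with 3 leaves, a 256KB range and a 256KB DMZ, hits_from_range_end() gives 1 for a 128KB block but 3 for a 1MB block, so every copy of the large block can be damaged. Widening the DMZ in the model to at least the block size (for example 1MB) brings the 1MB case back to 1.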
Updated by Electric Monk almost 7 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit f9eb9fdf196b6ed476e4ffc69cecd8b0da3cb7e7
Author: Matthew Ahrens <email@example.com>
Date: 2015-11-16T17:44:23.000Z
6451 ztest fails due to checksum errors
Reviewed by: George Wilson <firstname.lastname@example.org>
Reviewed by: Prakash Surya <email@example.com>
Reviewed by: Jorgen Lundman <firstname.lastname@example.org>
Approved by: Dan McDonald <email@example.com>