Bug #8614
Checksum errors on a mirrored child of a raidz are incorrectly accounted
Status: open
Description
If a drive returns incorrect data, ZFS generates checksum errors. When a drive accumulates too many checksum errors, a fault management program may mark it DEGRADED. However, when an active spare of a raidz child returns incorrect data, the checksum errors accrue to the interior spare-0 vdev rather than to the leaf vdev that returned the bad data. That makes it impossible for the administrator to know which disk is faulty: the spare or the original.
$ truncate -s 1g /tmp/f0 /tmp/f1 /tmp/f2
$ sudo zpool create foo raidz1 /tmp/f0 /tmp/f1 spare /tmp/f2
$ sudo zpool replace foo /tmp/f0 /tmp/f2
$ sudo /bin/dd if=/dev/zero of=/foo/testfile bs=1024k count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 0.621678 secs (215895782 bytes/sec)
$ sudo dd if=/dev/urandom of=/tmp/f2 bs=1024k count=64 conv=notrunc
64+0 records in
64+0 records out
67108864 bytes transferred in 1.104980 secs (60733098 bytes/sec)
$ zpool status foo
  pool: foo
 state: ONLINE
  scan: scrub repaired 13.3M in 0h0m with 0 errors on Tue Sep 5 15:41:31 2017
config:

        NAME           STATE     READ WRITE CKSUM
        foo            ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            spare-0    ONLINE       0     0    70
              /tmp/f0  ONLINE       0     0     0
              /tmp/f2  ONLINE       0     0     0
            /tmp/f1    ONLINE       0     0     0
        spares
          /tmp/f2      INUSE     currently in use

errors: No known data errors
Notice how the 70 checksum errors are in the "spare-0" row. They should be in the "/tmp/f2" row.
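Until the accounting is fixed, a pool can at least be checked for this symptom. The following is only a minimal sketch: `flag_spare_cksum` is a hypothetical helper, not part of ZFS, and it assumes the five-column vdev rows (NAME STATE READ WRITE CKSUM) shown in the status output above. It reads `zpool status` output on stdin and prints every spare-N vdev whose CKSUM column is nonzero.

```shell
# Hypothetical helper (not part of ZFS): read `zpool status` output on stdin
# and print the name and CKSUM count of each spare-N vdev with a nonzero count.
# Assumes vdev rows have exactly five fields: NAME STATE READ WRITE CKSUM.
flag_spare_cksum() {
    awk '$1 ~ /^spare-[0-9]+$/ && NF == 5 && $5 + 0 > 0 { print $1, $5 }'
}

# Sample input copied from the config section of this report.
flag_spare_cksum <<'EOF'
        NAME           STATE     READ WRITE CKSUM
        foo            ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            spare-0    ONLINE       0     0    70
              /tmp/f0  ONLINE       0     0     0
              /tmp/f2  ONLINE       0     0     0
            /tmp/f1    ONLINE       0     0     0
EOF
```

On a live system the same function could be fed with `zpool status foo | flag_spare_cksum`. A hit only indicates that errors were charged to the spare vdev; it still cannot say which of its children returned the bad data, which is the problem this bug describes.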