Bug #8614

Checksum errors on a mirrored child of a raidz are incorrectly accounted

Added by Alan Somers about 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2017-09-05
Due date:
% Done: 0%
Estimated time:
Difficulty: Hard
Tags: needs-triage
Gerrit CR:

Description

If a drive returns incorrect data, ZFS records checksum errors against it. When a drive accumulates too many checksum errors, a fault management program may mark it DEGRADED. However, when an active spare of a raidz child returns incorrect data, the checksum errors accrue to the spare-0 vdev rather than to the leaf vdev that actually returned the bad data. That makes it impossible for the administrator to tell which disk is faulty: the spare or the original.

$ truncate -s 1g /tmp/f0 /tmp/f1 /tmp/f2
$ sudo zpool create foo raidz1 /tmp/f0 /tmp/f1 spare /tmp/f2
$ sudo zpool replace foo /tmp/f0 /tmp/f2
$ sudo /bin/dd if=/dev/zero of=/foo/testfile bs=1024k count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 0.621678 secs (215895782 bytes/sec)
$ sudo dd if=/dev/urandom of=/tmp/f2 bs=1024k count=64 conv=notrunc
64+0 records in
64+0 records out
67108864 bytes transferred in 1.104980 secs (60733098 bytes/sec)
$ sudo zpool scrub foo
$ zpool status foo
  pool: foo
 state: ONLINE
  scan: scrub repaired 13.3M in 0h0m with 0 errors on Tue Sep 5 15:41:31 2017
config:

        NAME           STATE     READ WRITE CKSUM
        foo            ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            spare-0    ONLINE       0     0    70
              /tmp/f0  ONLINE       0     0     0
              /tmp/f2  ONLINE       0     0     0
            /tmp/f1    ONLINE       0     0     0
        spares
          /tmp/f2      INUSE     currently in use

errors: No known data errors

Notice that all 70 checksum errors are charged to the "spare-0" row, even though only /tmp/f2 was overwritten. They should be charged to the "/tmp/f2" row.
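The misattribution can also be checked mechanically. Below is a minimal POSIX sh/awk sketch; the `cksum_rows` function is a hypothetical helper, not part of ZFS. It reads `zpool status` output on stdin and prints every vdev row with a non-zero CKSUM count, tagging interior vdevs (spare-N, mirror-N, raidzN-N, replacing-N) whose counts do not point at a physical disk. It assumes the default NAME STATE READ WRITE CKSUM column order.

```sh
# cksum_rows (hypothetical helper): read `zpool status` output on stdin and
# print "<vdev> <cksum>" for each vdev row whose CKSUM column is non-zero.
# Interior vdevs are tagged because their counts do not identify a disk.
cksum_rows() {
  awk '
    # The vdev table starts after the NAME/STATE header line.
    $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
    # Stop at the spares section or the errors summary.
    $1 == "spares" || $1 == "errors:" { in_cfg = 0 }
    # Column 5 is CKSUM; allow abbreviated counts such as 1.2K.
    in_cfg && NF >= 5 && $5 ~ /^[0-9][0-9.]*[KMGTP]?$/ && $5 != "0" {
      tag = ($1 ~ /^(spare|mirror|raidz[0-9]*|replacing)-[0-9]+$/) ? " (interior vdev)" : ""
      print $1, $5 tag
    }
  '
}
```

Piping the status shown above through it (`zpool status foo | cksum_rows`) prints only `spare-0 70 (interior vdev)`: no leaf row carries the errors, so the output cannot say whether /tmp/f0 or /tmp/f2 is at fault.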
