10566 Multiple DVA Scrubbing Fix
Review Request #1572 - Created March 19, 2019 and submitted
| Information | |
|---|---|
| Toomas Soome | |
| illumos-gate | |
| 10566 | |
| 2e3b569... | |
| Reviewers | |
| general | |
Currently, there is an issue in the sequential scrub code which
prevents self healing from working in some cases. The scrub code
will split up all DVA copies of a bp and issue each of them
separately. The problem is that, since each of the DVAs is no
longer associated with the others, the self healing code doesn't
have the opportunity to repair problems that show up in one of the
DVAs with the data from the others.This patch fixes this issue by ensuring that all IOs issued by the
sequential scrub code include all DVAs. Initially, only the first
DVA of each is attempted. If an issue arises, the IO is retried
with all available copies, giving the self healing code a chance
to correct the issue.To test this change, this patch also adds the ability for zinject
to specify individual DVAs to inject read errors into. We then
add a new test case that utilizes this functionality to ensure
scrubs and self-healing reads can handle and transparently fix
issues with individual copies of blocks.Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8453Pre-and post fixes: 551905dd4 vdev_mirror: kstat observables for preferred vdev d6c6590c5 vdev_mirror: load balancing fixes 9f500936c FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information. fb40095f5 Disable LBA weighting on files and SSDs 4770aa064 Fix vdev_open_child() race on updating vdev_parent->vdev_nonrot 13d9a004f Fix taskq creation failure in vdev_open_children() 7dcd31883 Cleanup nits from ab7615d92 I assume we will land after https://www.illumos.org/issues/10809, so the avl.h macros presented here will be implemented by 10809
Depends On: |
|
||||
|---|---|---|---|---|---|
Commit: |
|
||||
Diff: |
Revision 2 (+891 -182) |
Change Summary:
rebase on: 10573 define TASKQID_INVALID as (taskq_id)0
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 3 (+891 -182) |
Change Summary:
Fix alignment issue. See also https://www.illumos.org/rb/r/1586/
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 4 (+891 -183) |
I'm confused by this change. Comparing to the relevant ZoL patch (commit ab7615d92c9b), the change here includes a lot more files and changes than are in the ZoL patch, and I am not sure why. Can you clarify this?
In addition, there was a follow-on commit for this work in the following ZoL commit which I think we should include here:
commit 7dcd31883299b3ee3801dd701f7a986a0af9b2f9
Author: Tom Caputi tcaputi@datto.com
Date: Sun Apr 14 14:03:06 2019 -0400Cleanup nits from ab7615d92 This patch simply up cleans up a nit and corrects an error message issue that were introduced in the Multiple DVA scrub patch.
Description: |
|
|---|
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 5 (+898 -187) |
Thanks for adding the list of additional patches included here. Generally this looks good to me, but the following patch
13d9a004f Fix taskq creation failure in vdev_open_children()
includes a new test (zpool_create_024_pos.ksh). Is there a reason this new test was not part of this work?
Change Summary:
add missing test.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 6 (+1056 -187) |
Thanks for adding the test. This all looks good to me now.
Change Summary:
add wait_scrubbed function for test kit.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 7 (+1067 -187) |
Ship It!
Change Summary:
remove zpool_scrub_offline_device from runfiles as we do not really support it yet.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 8 (+1046 -187) |
Ship It!
Change Summary:
Fixes for 2 panic cases from Jerry Jelinek.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 9 (+1059 -187) |
Ship It!
Change Summary:
add wait_scrubbed. Needed for zpool_scrub_multiple_copies
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 10 (+1077 -187) |
Change Summary:
we get reported number without suffix.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 11 (+1077 -187) |
Ship It!
