Bug #12774

closed

Resilver restarts unnecessarily when it encounters errors

Added by Jason King almost 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: zfs - Zettabyte File System
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

Port of two OpenZFS changes:

<pre>
commit 41035a049643ff7083a6cb6cd43b8eb70a7d18a1
Author: John Poduska <jpoduska@datto.com>
Date:   Wed May 13 13:54:27 2020 -0400

    Resilver restarts unnecessarily when it encounters errors

    When a resilver finishes, vdev_dtl_reassess is called to hopefully
    excise DTL_MISSING (amongst other things). If there are errors during
    the resilver, they are tracked in DTL_SCRUB, as spelled out in the
    block comment in vdev.c. DTL_SCRUB is in-core only, so it can only
    be used if the pool was online for the whole resilver. This state is
    tracked with the spa_scrub_started flag, which only gets set when
    the scan is initialized. Unfortunately, this flag gets cleared right
    before vdev_dtl_reassess gets called, so if there are any errors
    during the scan, DTL_MISSING will never get excised and the resilver
    will just continually restart. This fix simply moves clearing that
    flag until after the call to vdev_dtl_reassess.

    In addition, if a pool is imported and already has scn_errors > 0,
    this change will restart the resilver immediately instead of doing
    the rest of the scan and then restarting it from the beginning. On
    the other hand, if scn_errors == 0 at import, then no errors have
    been encountered so far, so the spa_scrub_started flag can be safely
    set.

    A test has been added to verify that resilver does not restart when
    relevant DTL's are available.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
    Signed-off-by: John Poduska <jpoduska@datto.com>
    Closes #10291
</pre>

And a smaller related change:

<pre>
commit 02638a30efb0e9800fdc8834d5b7ab66432db00a
Author: Tim Chase <tim@chase2k.com>
Date:   Tue Mar 13 12:43:14 2018 -0500

    Add zfs_scan_ignore_errors tunable

    When it's set, a DTL range will be cleared even if its scan/scrub had
    errors.  This allows to work around resilver/scrub upon import when the
    pool has errors.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Closes #7293
</pre>
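
As a rough illustration of the condition the two commit messages describe, the sketch below models DTLs as Python sets of txg numbers. It is a simplified model under stated assumptions, not the vdev.c code: `may_excise` and `reassess` are hypothetical helpers, and the scan is assumed to have covered every txg. Only the names DTL_MISSING, DTL_SCRUB, spa_scrub_started, scn_errors, and zfs_scan_ignore_errors come from the text above.

```python
# Simplified model only. may_excise() and reassess() are hypothetical
# helpers; they are not functions from the ZFS source.

def may_excise(spa_scrub_started, scn_errors, zfs_scan_ignore_errors=False):
    """Return True if the scanned range may be cleared from DTL_MISSING."""
    if zfs_scan_ignore_errors:
        # Tunable set: clear the range even if the scan had errors.
        return True
    # Either DTL_SCRUB is trustworthy (pool online for the whole scan),
    # or the scan recorded no errors at all.
    return spa_scrub_started or scn_errors == 0


def reassess(dtl_missing, dtl_scrub, spa_scrub_started, scn_errors,
             zfs_scan_ignore_errors=False):
    """Return the new DTL_MISSING as a set of txg numbers.

    Assumes the scan covered every txg, so when excising is allowed the
    unrepaired txgs in DTL_SCRUB replace DTL_MISSING entirely.
    """
    if may_excise(spa_scrub_started, scn_errors, zfs_scan_ignore_errors):
        return set(dtl_scrub)
    # Excising not allowed: DTL_MISSING is left untouched.
    return set(dtl_missing)
```

In this model, a scan with errors and a cleared flag leaves DTL_MISSING unchanged, which is the state that makes the resilver start over. Setting the tunable forces the range to be cleared regardless.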
Actions #1

Updated by Jason King almost 3 years ago

To test, I ran the ZFS test suite, which includes a new test for this change. All of the expected tests passed, and every failure was a known issue covered by an existing bug.

Actions #2

Updated by Jason King almost 3 years ago

The original description explains the cause in detail; the core behavior changes are:

Prior to this change, errors during a resilver could cause the resilver to restart continually and never finish. This change fixes that.
Additionally, if a pool is imported with scan errors already recorded (scn_errors > 0), the resilver is restarted immediately instead of finishing the rest of the scan and then restarting from the beginning.
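
The two behaviors can be sketched with a toy Python model. This is an illustration under assumptions, not the actual implementation: the `Pool` class and the helpers `reassess`, `run_pass`, `resilver_passes`, and `on_import` are hypothetical, DTLs are modeled as sets of txg numbers, and each pass is assumed to scan everything. The field names mirror those used in the description.

```python
# Toy model only; every class and function name here is hypothetical.

class Pool:
    def __init__(self, dtl_missing):
        self.dtl_missing = set(dtl_missing)
        self.dtl_scrub = set()
        self.scn_errors = 0
        self.spa_scrub_started = False


def reassess(pool):
    # DTL_SCRUB replaces DTL_MISSING only if it can be trusted
    # (flag still set) or the scan saw no errors.
    if pool.spa_scrub_started or pool.scn_errors == 0:
        pool.dtl_missing = set(pool.dtl_scrub)


def run_pass(pool, unrepaired, scn_errors, clear_flag_first):
    """Run one resilver pass; return True if another pass is needed."""
    pool.spa_scrub_started = True        # set when the scan is initialized
    pool.dtl_scrub = set(unrepaired)     # txgs the scan could not repair
    pool.scn_errors = scn_errors
    if clear_flag_first:                 # old order: clear, then reassess
        pool.spa_scrub_started = False
    reassess(pool)
    pool.spa_scrub_started = False       # new order: clear after reassess
    return bool(pool.dtl_missing)


def resilver_passes(missing, unrepaired, scn_errors, clear_flag_first,
                    limit=5):
    """Return the number of passes until done, or None if it never finishes."""
    pool = Pool(missing)
    for n in range(1, limit + 1):
        if not run_pass(pool, unrepaired, scn_errors, clear_flag_first):
            return n
    return None


def on_import(scn_errors):
    """Return (restart_resilver_now, spa_scrub_started) for an imported pool."""
    if scn_errors > 0:
        return (True, False)   # errors already seen: restart immediately
    return (False, True)       # no errors so far: safe to set the flag
```

With one error counted but no unrepaired txgs, `resilver_passes({10, 11, 12}, set(), 1, clear_flag_first=True)` returns `None` (the model keeps restarting), while the same call with `clear_flag_first=False` returns `1`.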

Actions #3

Updated by Electric Monk almost 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

commit 165c5c6fe7d6c7a95878c8a3aae7da65d1da1d90
Author: John Poduska <jpoduska@datto.com>
Date:   2020-05-22T19:16:38.000Z

    12774 Resilver restarts unnecessarily when it encounters errors
    Portions contributed by: Tim Chase <tim@chase2k.com>
    Portions contributed by: Jason King <jason.king@joyent.com>
    Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed by: Paul Zuchowski <pzuchowski@datto.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
