Bug #12774
Resilver restarts unnecessarily when it encounters errors
Status: Closed
% Done: 100%
Difficulty: Medium
Description
Port of two OpenZFS changes:
commit 41035a049643ff7083a6cb6cd43b8eb70a7d18a1
Author: John Poduska <jpoduska@datto.com>
Date:   Wed May 13 13:54:27 2020 -0400

    Resilver restarts unnecessarily when it encounters errors

    When a resilver finishes, vdev_dtl_reassess is called to hopefully
    excise DTL_MISSING (amongst other things). If there are errors
    during the resilver, they are tracked in DTL_SCRUB, as spelled out
    in the block comment in vdev.c. DTL_SCRUB is in-core only, so it
    can only be used if the pool was online for the whole resilver.
    This state is tracked with the spa_scrub_started flag, which only
    gets set when the scan is initialized. Unfortunately, this flag
    gets cleared right before vdev_dtl_reassess gets called, so if
    there are any errors during the scan, DTL_MISSING will never get
    excised and the resilver will just continually restart. This fix
    simply moves clearing that flag until after the call to
    vdev_dtl_reassess.

    In addition, if a pool is imported and already has scn_errors > 0,
    this change will restart the resilver immediately instead of doing
    the rest of the scan and then restarting it from the beginning. On
    the other hand, if scn_errors == 0 at import, then no errors have
    been encountered so far, so the spa_scrub_started flag can be
    safely set.

    A test has been added to verify that resilver does not restart
    when relevant DTL's are available.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
    Signed-off-by: John Poduska <jpoduska@datto.com>
    Closes #10291

And a smaller related change:

commit 02638a30efb0e9800fdc8834d5b7ab66432db00a
Author: Tim Chase <tim@chase2k.com>
Date:   Tue Mar 13 12:43:14 2018 -0500

    Add zfs_scan_ignore_errors tunable

    When it's set, a DTL range will be cleared even if its scan/scrub
    had errors. This allows to work around resilver/scrub upon import
    when the pool has errors.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Closes #7293
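The flag-ordering problem described in the first commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual ZFS code: `struct toy_pool`, `toy_dtl_reassess`, `toy_finish_old`, and `toy_finish_new` are hypothetical names. DTL_MISSING and DTL_SCRUB are reduced to simple range counts, and the excision condition (flag set, or no errors recorded) is an assumption inferred from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; NOT the real spa_t or vdev_t layout. */
struct toy_pool {
	bool scrub_started;	/* stands in for spa_scrub_started */
	int missing;		/* number of ranges still in DTL_MISSING */
	int scrub_errors;	/* ranges recorded in the in-core DTL_SCRUB */
};

/*
 * Toy reassess step (assumed logic): DTL_MISSING may only be trimmed
 * when the in-core error record can be trusted, i.e. the pool was
 * online for the whole scan (flag set) or no errors were recorded.
 * After trimming, only the ranges that hit errors remain missing.
 */
static void
toy_dtl_reassess(struct toy_pool *p)
{
	assert(p->scrub_errors >= 0 && p->scrub_errors <= p->missing);
	if (p->scrub_started || p->scrub_errors == 0)
		p->missing = p->scrub_errors;
}

/* Old ordering: the flag is cleared before the reassess call. */
static int
toy_finish_old(struct toy_pool *p)
{
	p->scrub_started = false;
	toy_dtl_reassess(p);
	return (p->missing);
}

/* Fixed ordering: reassess first, then clear the flag. */
static int
toy_finish_new(struct toy_pool *p)
{
	toy_dtl_reassess(p);
	p->scrub_started = false;
	return (p->missing);
}
```

In this model, a pool with four missing ranges and one scan error keeps all four ranges missing under the old ordering, so a resilver would be scheduled again for the full set. Under the fixed ordering only the one errored range stays missing.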
Updated by Jason King almost 3 years ago
To test, I ran the ZFS test suite, which includes a new test for this change. All of the expected tests passed, and every failure was a known issue tracked by an existing bug.
Updated by Jason King almost 3 years ago
The original description goes into the details as to why, but to highlight the core behaviors:
- Prior to this change, errors during a resilver could prevent it from ever finishing and make it restart continually; this change fixes that.
- Additionally, if a pool that already has scan errors is imported, the resilver is now restarted immediately instead of finishing the rest of the scan and then restarting from the beginning.
Updated by Electric Monk almost 3 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 165c5c6fe7d6c7a95878c8a3aae7da65d1da1d90
commit 165c5c6fe7d6c7a95878c8a3aae7da65d1da1d90
Author: John Poduska <jpoduska@datto.com>
Date:   2020-05-22T19:16:38.000Z

    12774 Resilver restarts unnecessarily when it encounters errors
    Portions contributed by: Tim Chase <tim@chase2k.com>
    Portions contributed by: Jason King <jason.king@joyent.com>
    Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed by: Paul Zuchowski <pzuchowski@datto.com>
    Reviewed by: Toomas Soome <tsoome@me.com>