Resilver restarts unnecessarily when it encounters errors
Port of two OpenZFS changes:
<pre>
commit 41035a049643ff7083a6cb6cd43b8eb70a7d18a1
Author: John Poduska <firstname.lastname@example.org>
Date:   Wed May 13 13:54:27 2020 -0400

    Resilver restarts unnecessarily when it encounters errors

    When a resilver finishes, vdev_dtl_reassess is called to hopefully
    excise DTL_MISSING (amongst other things). If there are errors
    during the resilver, they are tracked in DTL_SCRUB, as spelled out
    in the block comment in vdev.c. DTL_SCRUB is in-core only, so it
    can only be used if the pool was online for the whole resilver.
    This state is tracked with the spa_scrub_started flag, which only
    gets set when the scan is initialized. Unfortunately, this flag
    gets cleared right before vdev_dtl_reassess gets called, so if
    there are any errors during the scan, DTL_MISSING will never get
    excised and the resilver will just continually restart. This fix
    simply moves clearing that flag until after the call to
    vdev_dtl_reassess.

    In addition, if a pool is imported and already has scn_errors > 0,
    this change will restart the resilver immediately instead of doing
    the rest of the scan and then restarting it from the beginning. On
    the other hand, if scn_errors == 0 at import, then no errors have
    been encountered so far, so the spa_scrub_started flag can be
    safely set.

    A test has been added to verify that resilver does not restart
    when relevant DTL's are available.

    Reviewed-by: Brian Behlendorf <email@example.com>
    Reviewed-by: Paul Zuchowski <firstname.lastname@example.org>
    Signed-off-by: John Poduska <email@example.com>
    Closes #10291
</pre>

And a smaller related change:

<pre>
commit 02638a30efb0e9800fdc8834d5b7ab66432db00a
Author: Tim Chase <firstname.lastname@example.org>
Date:   Tue Mar 13 12:43:14 2018 -0500

    Add zfs_scan_ignore_errors tunable

    When it's set, a DTL range will be cleared even if its scan/scrub
    had errors. This allows to work around resilver/scrub upon import
    when the pool has errors.

    Reviewed-by: Brian Behlendorf <email@example.com>
    Signed-off-by: Tim Chase <firstname.lastname@example.org>
    Closes #7293
</pre>
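
The ordering problem described in the first commit can be illustrated with a toy model. This is only a sketch, not the vdev.c or dsl_scan.c code: the `toy_*` names, the bitmask representation of the DTLs, and the exact excise conditions are assumptions made for illustration. Only `spa_scrub_started`, `scn_errors`, `DTL_MISSING`, `DTL_SCRUB` and `zfs_scan_ignore_errors` come from the commit messages above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the pool/vdev state. */
typedef struct toy_pool {
	bool     scrub_started; /* models spa_scrub_started */
	uint32_t dtl_missing;   /* models DTL_MISSING, one bit per range */
	uint32_t dtl_scrub;     /* models DTL_SCRUB (in-core only) */
	uint64_t scan_errors;   /* models scn_errors */
} toy_pool_t;

/* Models the zfs_scan_ignore_errors tunable (assumed semantics). */
static bool toy_ignore_errors = false;

/* Toy reassess step: decide how much of DTL_MISSING can be excised. */
static void
toy_dtl_reassess(toy_pool_t *p)
{
	/* Toy invariant: errors are only recorded for missing ranges. */
	assert((p->dtl_scrub & ~p->dtl_missing) == 0);

	if (toy_ignore_errors || p->scan_errors == 0) {
		/* No errors (or errors ignored): everything is excised. */
		p->dtl_missing = 0;
	} else if (p->scrub_started) {
		/* DTL_SCRUB is trustworthy: keep only ranges with errors. */
		p->dtl_missing &= p->dtl_scrub;
	}
	/* Otherwise nothing is excised and the resilver must start over. */
	p->dtl_scrub = 0;
}

/* Old ordering: the flag is cleared before the reassess step runs. */
static void
toy_scan_done_old(toy_pool_t *p)
{
	p->scrub_started = false;
	toy_dtl_reassess(p);
}

/* Fixed ordering: reassess first, then clear the flag. */
static void
toy_scan_done_fixed(toy_pool_t *p)
{
	toy_dtl_reassess(p);
	p->scrub_started = false;
}
```

In this model, a pool with four missing ranges and one error keeps all four ranges under the old ordering, while the fixed ordering leaves only the range that actually saw the error.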
Updated by Jason King about 2 years ago
The original description goes into the details as to why, but to highlight the core behaviors:
Prior to this change, errors during a resilver could prevent it from ever finishing: the resilver would restart continually. This change fixes that.
Additionally, if a pool is imported with scan errors already recorded (scn_errors > 0), the resilver now restarts immediately instead of finishing the rest of the scan and then restarting from the beginning.
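
The import-time behavior can be sketched as a small decision function. This is an illustrative sketch only: `toy_scan_t`, `toy_import_action_t` and `toy_scan_on_import` are hypothetical names, and the real logic in the scan code is more involved. The two branches follow the original description: errors already recorded at import lead to an immediate restart, otherwise the `spa_scrub_started` flag can be safely set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of examining an in-progress resilver at import. */
typedef enum toy_import_action {
	TOY_RESTART_RESILVER, /* start over right away */
	TOY_CONTINUE_SCAN     /* resume the scan where it left off */
} toy_import_action_t;

/* Hypothetical, simplified view of the scan state. */
typedef struct toy_scan {
	uint64_t errors;        /* models scn_errors as found at import */
	bool     scrub_started; /* models spa_scrub_started */
} toy_scan_t;

static toy_import_action_t
toy_scan_on_import(toy_scan_t *s)
{
	assert(s != NULL);

	if (s->errors > 0) {
		/*
		 * Errors were hit before import, but the in-core record of
		 * where they occurred is gone, so finishing the scan would
		 * only end in a restart anyway. Restart immediately.
		 */
		return (TOY_RESTART_RESILVER);
	}

	/* No errors so far: in-core tracking from here on is complete. */
	s->scrub_started = true;
	return (TOY_CONTINUE_SCAN);
}
```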
Updated by Electric Monk about 2 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit 165c5c6fe7d6c7a95878c8a3aae7da65d1da1d90
Author: John Poduska <email@example.com>
Date: 2020-05-22T19:16:38.000Z

    12774 Resilver restarts unnecessarily when it encounters errors
    Portions contributed by: Tim Chase <firstname.lastname@example.org>
    Portions contributed by: Jason King <email@example.com>
    Reviewed by: Brian Behlendorf <firstname.lastname@example.org>
    Reviewed by: Paul Zuchowski <email@example.com>
    Reviewed by: Toomas Soome <firstname.lastname@example.org>