Bug #11768

disabled resilver_defer feature leads to looping resilvers

Added by Kody Kantor 5 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Category: zfs - Zettabyte File System
% Done: 100%
Difficulty: Bite-size
Description

When a disk is replaced on a pool where the resilver_defer
feature is present but not enabled, the resilver activity restarts during
each spa_sync. This patch checks that the resilver_defer
feature is enabled before requesting a deferred resilver.

This was originally fixed in illumos-joyent as OS-7982 (https://smartos.org/bugview/OS-7982).
The fix has already landed in ZFS on Linux as well. We should pull it into illumos-gate.


Related issues

Related to illumos gate - Bug #10952: defer new resilvers and misc. resilver-related fixes (Closed, 2019-05-14)


History

#1

Updated by Joshua M. Clulow 5 months ago

  • Related to Bug #10952: defer new resilvers and misc. resilver-related fixes added
#2

Updated by Joshua M. Clulow 5 months ago

So this regression was introduced in #10952?

#3

Updated by Joshua M. Clulow 5 months ago

I just hit this on a machine where I turned the rpool into a mirror. It looks like you have reviewers -- are we ready for integration?

#4

Updated by Kody Kantor 4 months ago

Hi Josh. Yes, this should be ready to integrate and was introduced by #10952. Thanks for linking that ticket.

I'm having problems booting new environments on my OI machine to test this. I can certainly spend some more time on that today to get this into the RTI queue.

#5

Updated by Kody Kantor 4 months ago

Testing notes:
- I manually verified this fix on SmartOS a month or so ago (details in https://smartos.org/bugview/OS-7982).

- zfs test report:
kkantor@oi:/var/tmp/test_results/20191017T211527$ grep '\[FAIL\]' log
Test: /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_encrypted (run as root) [00:02] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_missing_001_pos (run as root) [01:04] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_missing_002_pos (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_missing_003_pos (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_rename_001_pos (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_added (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_removed (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_mirror_attached (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_mirror_detached (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_replaced (run as root) [00:03] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_paths_changed (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_encrypted (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_encrypted_load (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_labelclear/zpool_labelclear_exported (run as root) [00:09] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_resilver/zpool_resilver_restart (run as root) [04:28] [FAIL]
Test: /opt/zfs-tests/tests/functional/mmp/mmp_on_zdb (run as root) [00:39] [FAIL]
Test: /opt/zfs-tests/tests/functional/projectquota/projectspace_004_pos (run as root) [00:05] [FAIL]
Test: /opt/zfs-tests/tests/functional/scrub_mirror/setup (run as root) [00:00] [FAIL]
Test: /opt/zfs-tests/tests/functional/write_dirs/setup (run as root) [00:00] [FAIL]

A number of these are known failures. I re-ran all of the 'new' failures and here are the results:

kkantor@oi:/opt/zfs-tests$ ./bin/zfstest -c ./kkantor.run
Test: /opt/zfs-tests/tests/functional/cli_root/zfs_mount/setup (run as root) [00:00] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_encrypted (run as root) [00:03] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zfs_mount/cleanup (run as root) [00:00] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/setup (run as root) [00:12] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_missing_001_pos (run as root) [01:07] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_missing_002_pos (run as root) [02:11] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_missing_003_pos (run as root) [05:21] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_rename_001_pos (run as root) [00:12] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_added (run as root) [00:22] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_removed (run as root) [00:30] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_mirror_attached (run as root) [00:12] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_mirror_detached (run as root) [00:14] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_replaced (run as root) [01:09] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/import_paths_changed (run as root) [00:14] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_encrypted (run as root) [00:03] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_encrypted_load (run as root) [00:05] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_import/cleanup (run as root) [00:04] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_labelclear/zpool_labelclear_exported (run as root) [00:13] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_resilver/setup (run as root) [00:01] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_resilver/zpool_resilver_restart (run as root) [00:27] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zpool_resilver/cleanup (run as root) [00:00] [PASS]

Results Summary
PASS 21

Running Time: 00:12:52
Percent passed: 100.0%
Log directory: /var/tmp/test_results/20191018T160738

---
It appears these were the usual intermittent failures of the ZFS test suite.

#6

Updated by Robert Mustacchi about 1 month ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

git commit 233f6c49954dadfb21fa0809febd15e2160e0ff5

commit 233f6c49954dadfb21fa0809febd15e2160e0ff5
Author: Kody Kantor <kody@kkantor.com>
Date:   Tue Oct 1 18:55:17 2019 +0000

    disabled resilver_defer feature leads to looping resilvers
    Reviewed by: Chris Dunlop <chris@onthe.net.au>
    Reviewed by: George Melikov <mail@gmelikov.ru>
    Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed by: Tom Caputi <tcaputi@datto.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Approved by: Robert Mustacchi <rm@fingolfin.org>
