Bug #3958


multiple scans can lead to partial resilvering

Added by Christopher Siden over 10 years ago. Updated over 10 years ago.

zfs - Zettabyte File System


Analysis by George Wilson:

I've been looking at many zloop/ztest failures that involve checksum errors.
The scenario typically involves a pool that is mirrored, uses hot spares, or has
had devices replaced.

It appears that during the resilver operation we are failing to copy over all
the data from the original device. The device is later detached, leaving a
partially resilvered vdev to service the pool. The results range from
suspended pools to random assertion failures to pools that fail to open, and
the list goes on.

When a device is attached (or replaced), the spa code creates a DTL indicating
that the newly attached device does not contain any data in the specified txg
range. The range must include the currently open txg to ensure that dmu_sync-ed
blocks are also resilvered. After setting the DTL, the spa code calls into the
DSL to notify any running scan that it should restart, passing the txg at which
the restart should occur. The issue is that the DSL is told to restart the scan
only when the currently open txg is synced; this creates a race when multiple
scans are running. Imagine this scenario:
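The attach-and-restart mechanism described above can be modeled in a few lines
of C. This is a minimal sketch, not the spa/DSL code: the types dtl_t and
scan_t and the functions attach_device(), dtl_missing() and scan_txg_synced()
are hypothetical, and a real DTL is a set of txg ranges rather than the single
inclusive range assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified DTL: the device is missing every txg in
 * [min_txg, max_txg]. A real DTL can hold many disjoint ranges. */
typedef struct {
    uint64_t min_txg;
    uint64_t max_txg;
} dtl_t;

/* Hypothetical scan state: restart_txg == 0 means no restart is pending. */
typedef struct {
    uint64_t restart_txg;
} scan_t;

/* True if the DTL says the device lacks blocks born in txg. */
static bool dtl_missing(const dtl_t *dtl, uint64_t txg)
{
    return txg >= dtl->min_txg && txg <= dtl->max_txg;
}

/*
 * Model of attach/replace: mark the new device as missing everything from
 * first_txg up to and including the open txg (so dmu_sync-ed blocks written
 * in the open txg are covered), then ask the scan to restart once that open
 * txg has synced. The restart is deferred, not immediate.
 */
static dtl_t attach_device(scan_t *scan, uint64_t first_txg, uint64_t open_txg)
{
    assert(first_txg <= open_txg);
    dtl_t dtl = { .min_txg = first_txg, .max_txg = open_txg };
    scan->restart_txg = open_txg;
    return dtl;
}

/* Called after each txg syncs; returns true if the pending restart fires. */
static bool scan_txg_synced(scan_t *scan, uint64_t synced_txg)
{
    if (scan->restart_txg != 0 && synced_txg >= scan->restart_txg) {
        scan->restart_txg = 0;
        return true;
    }
    return false;
}
```

In this model nothing happens to the running scan until the open txg syncs,
and that gap is the window in which an earlier scan can still complete.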

A pool configuration that looks like this:

    ADDR     STATE     AUX          DESCRIPTION                                
    080b9800 HEALTHY   -            root
    08c27a00 HEALTHY   -              mirror
    080ba640 HEALTHY   -                /rpool/tmp/ztest.0b
    08c29b40 HEALTHY   -                /rpool/tmp/ztest.0a
    0a1087c0 HEALTHY   -              mirror
    080b9cc0 HEALTHY   -                /rpool/tmp/ztest.1a
    0a109140 HEALTHY   -                /rpool/tmp/ztest.1b

In this case we have 2 devices, ztest.0a and ztest.1b, that both need
resilvering. The ztest.0a device is attached first, at txg 850, and kicks off a
resilver scan. At txg 857 ztest.1b is attached and the resilver is scheduled to
restart at txg 860, but before that happens the first scan completes at txg 858.
This is when things go horribly wrong.

Because the scan completed at txg 858, we look at all the vdevs and their DTLs.
Any DTL indicating that the device is missing txgs in the range 3 to 358 is now
cleared, since we finished the scan. This is correct for ztest.0a, but for the
newly attached ztest.1b it means that most of its blocks have not been
resilvered. Later, when txg 860 syncs, the restarted scan will only try to
resilver blocks from (859, 860].
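The race can be reproduced with a toy model. This sketch assumes each DTL is a
single inclusive range [dtl_min, dtl_max] and that a completed scan excises its
whole txg range from every DTL, as described above; vdev_model_t, dtl_excise()
and scan_done_buggy() are hypothetical names, not ZFS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model with a single-range DTL [dtl_min, dtl_max]. */
typedef struct {
    const char *name;
    uint64_t    dtl_min;    /* first txg the device is missing */
    uint64_t    dtl_max;    /* last txg the device is missing */
    bool        dtl_empty;  /* true once nothing is missing */
} vdev_model_t;

/*
 * Remove the scanned range [lo, hi] from the device's DTL.
 * Simplification: the scan is assumed to start at or before the DTL's first
 * txg, so excision can only empty the DTL or trim its front.
 */
static void dtl_excise(vdev_model_t *vd, uint64_t lo, uint64_t hi)
{
    if (vd->dtl_empty || vd->dtl_min > hi)
        return;                     /* nothing inside [lo, hi] to remove */
    assert(lo <= vd->dtl_min);
    if (vd->dtl_max <= hi)
        vd->dtl_empty = true;       /* whole DTL was covered by the scan */
    else
        vd->dtl_min = hi + 1;       /* only txgs after the scan survive */
}

/*
 * Buggy completion: excise the scan's txg range from EVERY device's DTL,
 * including devices attached after this scan started.
 */
static void scan_done_buggy(vdev_model_t *vds, size_t n,
                            uint64_t scan_min, uint64_t scan_max)
{
    for (size_t i = 0; i < n; i++)
        dtl_excise(&vds[i], scan_min, scan_max);
}
```

A device whose DTL lies entirely inside the finished scan is correctly marked
clean, but a device whose DTL extends past the scan's last txg keeps only that
short tail, so the restarted scan never revisits the older txgs it is missing.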

To resolve this, we only excise txgs on devices that were present for the
entire scan. Any device that was added after the scan started will not have its
DTLs updated when the currently running scan completes.
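A sketch of the fix under the same kind of toy model: each device records the
txg at which it was attached, and a completing scan only excises DTLs on
devices attached no later than the txg at which the scan started. The
attach_txg field, vdev_in_entire_scan() and scan_done_fixed() are hypothetical;
the actual change may decide eligibility differently (for example, by comparing
a device's DTL against the scan's txg range).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: DTL is the single range [dtl_min, dtl_max]. */
typedef struct {
    const char *name;
    uint64_t    attach_txg; /* txg at which the device was attached */
    uint64_t    dtl_min;    /* first txg the device is missing */
    uint64_t    dtl_max;    /* last txg the device is missing */
    bool        dtl_empty;  /* true once nothing is missing */
} vdev_model_t;

/* Remove [lo, hi] from the DTL; assumes lo <= dtl_min (front trim only). */
static void dtl_excise(vdev_model_t *vd, uint64_t lo, uint64_t hi)
{
    if (vd->dtl_empty || vd->dtl_min > hi)
        return;
    assert(lo <= vd->dtl_min);
    if (vd->dtl_max <= hi)
        vd->dtl_empty = true;
    else
        vd->dtl_min = hi + 1;
}

/* A device took part in the entire scan if it was attached no later than
 * the txg at which the scan started. */
static bool vdev_in_entire_scan(const vdev_model_t *vd, uint64_t scan_start_txg)
{
    return vd->attach_txg <= scan_start_txg;
}

/* Fixed completion: only devices present for the whole scan are excised. */
static void scan_done_fixed(vdev_model_t *vds, size_t n,
                            uint64_t scan_start_txg,
                            uint64_t scan_min, uint64_t scan_max)
{
    for (size_t i = 0; i < n; i++) {
        if (!vdev_in_entire_scan(&vds[i], scan_start_txg))
            continue;   /* leave the DTL intact for the restarted scan */
        dtl_excise(&vds[i], scan_min, scan_max);
    }
}
```

Devices skipped here keep their full DTL, so the scan that restarts at the
scheduled txg still sees every txg they are missing.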

Actions #1

Updated by Christopher Siden over 10 years ago

  • Status changed from In Progress to Closed
commit b4952e1
Author: George Wilson <>
Date:   Wed Aug 7 13:16:22 2013

    3956 ::vdev -r should work with pipelines
    3957 ztest should update the cachefile before killing itself
    3958 multiple scans can lead to partial resilvering
    3959 ddt entries are not always resilvered
    3960 dsl_scan can skip over dedup-ed blocks if physical birth != logical birth
    3961 freed gang blocks are not resilvered and can cause pool to suspend
    3962 ztest should print out zfs debug buffer before exiting
    Reviewed by: Matthew Ahrens <>
    Reviewed by: Adam Leventhal <>
    Approved by: Richard Lowe <>
