Bug #3949

ztest fault injection should avoid resilvering devices

Added by Christopher Siden about 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-08-01
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Analysis by George Wilson:

I've seen several instances where the ztest deadman timer kicks in. Looking at
the core file I see that the spa has been suspended and we're in a state like
this:

> ::spa -v
ADDR         STATE NAME                                                        
10c2a000    ACTIVE ztest

    ADDR     STATE     AUX          DESCRIPTION                                
    082fd640 DEGRADED  -            root
    082fb500 DEGRADED  -              replacing
    082fd180 CANT_OPEN CORRUPT_DATA     /tank/spacemap/ztest/ztest.0a
    082fb9c0 HEALTHY   -                /tank/spacemap/ztest/ztest.0b
    17334300 HEALTHY   -              /tank/spacemap/ztest/ztest.1a

Unfortunately, at the same time that we injected an error we also did a spa
reguid. I suspect that the pool eventually suspended because it could not
satisfy a read from top-level vdev 0, and when we tried to resume the pool we
ended up with the state shown above.

To solve the first part of this problem, we should avoid performing certain
fault injection on devices which still have missing DTLs.

In ztest_fault_inject() we periodically change the leaf vdev's state to close,
cant_write, or cant_read. This gets done on the first child of the top-level
vdev. If that top-level vdev is in the middle of a resilver, we can end up
losing the only copy of good data.
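
The check described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from the actual fix: the toy_leaf_t and toy_top_t types, the dtl_missing flag, and both function names are hypothetical stand-ins for the real vdev structures, and the policy shown (skip fault injection whenever any child of the top-level vdev still has missing DTLs) is the most conservative reading of the analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a leaf vdev; not the real vdev_t. */
typedef struct toy_leaf {
	const char *path;
	bool dtl_missing;	/* still has missing DTLs, i.e. needs resilver */
} toy_leaf_t;

/* Hypothetical stand-in for a top-level vdev (mirror or replacing). */
typedef struct toy_top {
	toy_leaf_t *children;
	size_t nchildren;
} toy_top_t;

/* Returns true if any child of the top-level still has missing DTLs. */
static bool
toy_resilver_needed(const toy_top_t *top)
{
	assert(top != NULL);
	for (size_t i = 0; i < top->nchildren; i++) {
		if (top->children[i].dtl_missing)
			return (true);
	}
	return (false);
}

/*
 * Assumed policy: only allow fault injection under this top-level
 * when no child is waiting on a resilver. Otherwise the child we
 * fault might hold the only good copy of the data.
 */
static bool
toy_fault_allowed(const toy_top_t *top)
{
	return (!toy_resilver_needed(top));
}
```

With a two-way replacing vdev whose second child is still resilvering, toy_fault_allowed() returns false, so the test would leave the first child alone until the resilver completes.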

History

#1

Updated by Christopher Siden about 6 years ago

  • Status changed from In Progress to Closed
commit 2c1e2b4
Author: George Wilson <george.wilson@delphix.com>
Date:   Wed Aug 7 11:24:34 2013

    3949 ztest fault injection should avoid resilvering devices
    3950 ztest: deadman fires when we're doing a scan
    3951 ztest hang when running dedup test
    3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
