ztest fault injection should avoid resilvering devices
zfs - Zettabyte File System
Analysis by George Wilson:
I've seen several instances where the ztest deadman timer kicks in. Looking at the core file, I see that the spa has been suspended and we're in a state like this:

    > ::spa -v
    ADDR     STATE     NAME
    10c2a000 ACTIVE    ztest

        ADDR     STATE     AUX          DESCRIPTION
        082fd640 DEGRADED  -            root
        082fb500 DEGRADED  -              replacing
        082fd180 CANT_OPEN CORRUPT_DATA     /tank/spacemap/ztest/ztest.0a
        082fb9c0 HEALTHY   -                /tank/spacemap/ztest/ztest.0b
        17334300 HEALTHY   -              /tank/spacemap/ztest/ztest.1a

Unfortunately, at the same time that we injected an error we also did a spa reguid. I suspect that the pool eventually suspended since it could not satisfy a read from top-level vdev 0, and when we tried to resume the pool we ended up with the state you see above. To solve the first part of this problem, we should avoid performing certain fault injection on devices which still have missing DTLs.
In ztest_fault_inject() we periodically change the leaf vdev's state to close, cant_write, or cant_read. This gets done on the first child of the top-level vdev. If that top-level vdev is in the middle of a resilver, the other children may not yet hold a complete copy, so we can end up losing the only good copy of the data.
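The guard described above could be sketched roughly as follows. This is a simplified, self-contained model and not the actual ztest or vdev code: `toy_vdev_t`, `toy_resilver_needed()`, and `toy_fault_injection_allowed()` are hypothetical names invented for illustration, and a single boolean stands in for a real DTL. The idea is only that fault injection on a top-level vdev is skipped while any leaf beneath it still has missing DTL ranges.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a vdev. This is NOT the real vdev_t;
 * it models only what the check needs.
 */
typedef struct toy_vdev {
	const char *name;		/* label, for readability only */
	bool dtl_missing;		/* leaf still has missing DTL ranges */
	struct toy_vdev **children;	/* NULL for a leaf */
	size_t nchildren;		/* 0 for a leaf */
} toy_vdev_t;

/*
 * Return true if any leaf under vd still has missing data,
 * i.e. the subtree has not finished resilvering.
 */
static bool
toy_resilver_needed(const toy_vdev_t *vd)
{
	if (vd->nchildren == 0)
		return (vd->dtl_missing);

	for (size_t i = 0; i < vd->nchildren; i++) {
		if (toy_resilver_needed(vd->children[i]))
			return (true);
	}
	return (false);
}

/*
 * Assumed policy: only inject a fault into a child of this
 * top-level vdev when nothing beneath it needs resilvering, so
 * that the injected fault cannot remove the last good copy.
 */
static bool
toy_fault_injection_allowed(const toy_vdev_t *top)
{
	return (!toy_resilver_needed(top));
}
```

With a "replacing" vdev whose first child still has a missing DTL, `toy_fault_injection_allowed()` returns false. Once that child's `dtl_missing` flag is cleared, it returns true and injection can proceed.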