Bug #7163

ztest failures due to excess error injection

Added by Matthew Ahrens about 3 years ago. Updated about 3 years ago.

zfs - Zettabyte File System
Start date:
Due date:
% Done:


Estimated time:


Running zloop from zfs-precommit hit this assertion:


0xfffffd7fd7419370: assertion failed for thread 0xfffffd7fe29ed240,
thread-id 577: parent != NULL, file ../../../uts/common/fs/zfs/dbuf.c, line 1827

$c`_lwp_kill+0xa()`_assfail+0x182(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)`assfail+0x19(fffffd7ffb1c29fa, fffffd7ffb1cc028, 723)`dbuf_dirty+0xc69(10e3bc10, 3601700)`dbuf_dirty+0x61e(10c73640, 3601700)`dbuf_dirty+0x61e(10e28280, 3601700)`dmu_buf_will_fill+0x64(10e28280, 3601700)`dmu_write+0x1b6(2c7e640, d, 400000002e000000, 200, 3717b40,
ztest_replay_write+0x568(4950d0, 3717a80, 0)
ztest_write+0x125(4950d0, d, 400000002e000000, 200, 413f000)
ztest_io+0x1bb(4950d0, d, 400000002e000000)
ztest_dmu_write_parallel+0xaa(4950d0, 6)
ztest_execute+0x83(1, 420c98, 6)

This is another manifestation of ECKSUM in ztest:
The lowest level ancestor that’s in memory is the L8 (topmost). The L7 ancestor is blkid 0x10:

::dbufs -O 0x2c7e640 -o d -l 7 |::dbuf

addr object lvl blkid holds os
600be50 d 7 4 1 ztest/ds_6
719d880 d 7 0 4 ztest/ds_6


[L7 UINT64_OTHER] SHA256 LZ4 LE contiguous unique double
size=4000L/1000P birth=61L/61P fill=1

::spa -v

0000000001d24000 ACTIVE ztest

ADDR             STATE     AUX          DESCRIPTION
0000000001d54540 DEGRADED - root
0000000001d53c00 DEGRADED - mirror
0000000001d52980 CANT_OPEN BAD_LABEL /var/tmp/os-ztest/ztest.0a
0000000001d532c0 HEALTHY - /var/tmp/os-ztest/ztest.1a

0000000001d53c00::print vdev_t vdev_stat.vs_checksum_errors

vdev_stat.vs_checksum_errors = 0x4

0000000001d24000::print spa_t spa_errlist_scrub | ::walk avl | ::print spa_error_entry_t se_bookmark

se_bookmark = {
se_bookmark.zb_objset = 0x53
se_bookmark.zb_object = 0xd
se_bookmark.zb_level = 0x7
se_bookmark.zb_blkid = 0x10
Yep, that's the block we need, and it got ECKSUM when we tried to read it.

Let’s take another look at the BP that we couldn’t read, and figure out why all 4 copies of it are unreadable.
[L7 UINT64_OTHER] SHA256 LZ4 LE contiguous unique double
size=4000L/1000P birth=61L/61P fill=1

$ zdb -R -e -p vdev 5470655991381798096 0:46d4000:1000 |grep c0ffee
Found vdev type: mirror
0002d0: 1990c0ffeedecade 0000000000000000 ................

$ zdb -R -e -p vdev 5470655991381798096 0:46d5000:1000 |grep c0ffee
Found vdev type: mirror
0008a0: 0000000000000000 1990c0ffeedecade ................
I verified that zdb is also getting CANT_OPEN on ztest.0a, so this is the raw data from ztest.1a that’s corrupt with c0ffee.
Both DVAs have been corrupted by ztest_fault_inject() writing 1990c0ffeedecade. At the time of injection, maxfaults must have been nonzero, which makes sense since this is a mirror. It doesn’t check if a side of the mirror is CANT_OPEN, because ztest doesn’t try to damage a whole device when it is testing a 2-way mirror.
How did ztest.0a get its labels damaged? “zdb -l” shows that there’s 1 label intact. What happened to the other 3 labels, and if we have one remaining, why is the device CANT_OPEN?
I checked with od, and all 4 labels are corrupt with 1990c0ffeedecade. However, zdb can dump label 1 because it doesn’t verify the checksum in the label.
Why did zdb pour c0ffee over all 4 labels? Let’s look at ztest_fault_inject():
int bshift = SPA_MAXBLOCKSHIFT + 2; /* don't scrog all labels */

offset = ztest_random(fsize / (leaves << bshift)) *
(leaves << bshift) + (leaf << bshift) +
(ztest_random(1ULL << (bshift - 1)) & -8ULL);
The comment implies that this value of bshift will keep us from overwriting all the labels. This was true when the max block size was 128KB, but is no longer true now that the max block size is 16MB. See the comment near the bottom of ztest_fault_inject() for an explanation of what regions are eligible for corruption. Here are examples with 2-way mirroring:
Note: sizeof (vdev_label_t) is 256KB

with leaves=2, MAXBLOCKSIZE=128KB:
chunk size = 2 * 128KB * 4 = 1MB
divide into 4 regions of 256KB each:
0-256K: inject leaf 0 [covers label 0]
256-512K: DMZ - no injection [covers label 1]
512-768K: inject leaf 1
768-1024K: DMZ - no injection

with leaves=2, MAXBLICKSIZE=16MB:
chunk size = 2 * 16MB * 4 = 128MB
divide into 4 regions of 32MB each:
0-32MB: inject leaf 0 [covers labels 0 and 1]
32-64MB: DMZ - no injection
64-96MB: inject leaf 1
96-128MB: DMZ - no injection
This bug was introduced by the fix for “DLPX-40543 ztest fails due to checksum errors”, which changed bshift from using SPA_OLD_MAXBLOCKSHIFT (128KB) to SPA_MAXBLOCKSHIFT (16MB), to fix a different “too much injection” problem.
The fix is to find a different method to ensure that we don’t caffeinate all 4 labels with 0x1990c0ffeedecade.



Updated by Electric Monk about 3 years ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit f34284d835bc555f987c1310df46c034c3101155

commit  f34284d835bc555f987c1310df46c034c3101155
Author: Matthew Ahrens <>
Date:   2016-07-19T23:33:17.000Z

    7163 ztest failures due to excess error injection
    Reviewed by: George Wilson <>
    Reviewed by: Paul Dagnelie <>
    Approved by: Robert Mustacchi <>

Also available in: Atom PDF