Bug #3960

dsl_scan can skip over dedup-ed blocks if physical birth != logical birth

Added by Christopher Siden over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Category:
zfs - Zettabyte File System
Start date:
2013-08-02
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

Analysis by George Wilson:

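The dsl_scan code uses blk_birth directly. That is fine for non-dedup-ed blocks,
but for dedup-ed blocks it should use BP_PHYSICAL_BIRTH() instead. A dedup-ed
block pointer's logical birth (when the reference was created) can be later than
its physical birth (when the data was actually written).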

Here's a typical error:

Executing zdb -bccsv -U /rpool/tmp/zpool.cache ztest

Traversing all blocks to verify checksums and verify nothing leaked ...

zdb_blkptr_cb: Got error 50 reading <48, 55, 0, 4000000000b00>
DVA[0]=<0:569b800:200> [L0 other uint64[]] sha256 uncompressed LE contiguous
dedup single size=200L/200P birth=598L/590P fill=1
cksum=471be6558b665e4f:6dd49f1184814d14:91b0315d466beea7:68c153cc5500c836 --
skipping

Error counts:

    errno  count
       50  1

Notice that the block has a logical birth of 598 but a physical birth of 590.
The pool configuration looks like this:

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
ztest                     21.8M  215M 27.9K     0 40.6M     0     0     0     2
  /rpool/tmp/ztest.0b     21.8M  215M 27.9K     0 40.6M     0     0     0     2

Looking back through the logs, we find that a scan (func=2, a resilver) was
started at txg 596:

txg 596 scan setup func=2 mintxg=3 maxtxg=596
doing scan sync txg 596; ddt bm=0/0/0/0
dsl_scan_ddt ddb_class 0

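Because dsl_scan compares the block's logical birth (598) rather than its
physical birth (590) against this scan's maxtxg of 596, it treats the block as
newer than the scan and skips it, even though its data was written in txg 590.
That is how the block referenced in the checksum error was missed.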

History

#1

Updated by Christopher Siden over 6 years ago

  • Status changed from In Progress to Closed
commit b4952e1
Author: George Wilson <george.wilson@delphix.com>
Date:   Wed Aug 7 13:16:22 2013

    3956 ::vdev -r should work with pipelines
    3957 ztest should update the cachefile before killing itself
    3958 multiple scans can lead to partial resilvering
    3959 ddt entries are not always resilvered
    3960 dsl_scan can skip over dedup-ed blocks if physical birth != logical birth
    3961 freed gang blocks are not resilvered and can cause pool to suspend
    3962 ztest should print out zfs debug buffer before exiting
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
