Bug #3960

closed

dsl_scan can skip over dedup-ed blocks if physical birth != logical birth

Added by Christopher Siden almost 10 years ago. Updated almost 10 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-08-02
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:
External Bug:

Description

Analysis by George Wilson:

The dsl_scan code uses blk_birth directly, which is fine for non-dedup-ed
blocks, but for dedup-ed blocks it should be using BP_PHYSICAL_BIRTH() instead.
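
To illustrate the distinction, here is a minimal sketch of how a physical-birth accessor can differ from the raw logical birth. The struct and macro names (sketch_blkptr_t, SKETCH_PHYSICAL_BIRTH) are hypothetical stand-ins, not the real blkptr_t or BP_PHYSICAL_BIRTH() definitions. The sketch assumes that a physical birth of 0 means "same as the logical birth":

```c
#include <stdint.h>

/*
 * Simplified, hypothetical stand-in for a block pointer. Only the two
 * birth fields discussed in this bug are modelled.
 */
typedef struct sketch_blkptr {
	uint64_t blk_birth;      /* logical birth txg */
	uint64_t blk_phys_birth; /* physical birth txg; 0 is assumed to mean
	                            "same as logical birth" */
} sketch_blkptr_t;

/*
 * Return the physical birth txg, falling back to the logical birth when
 * no separate physical birth is recorded.
 */
#define	SKETCH_PHYSICAL_BIRTH(bp) \
	((bp)->blk_phys_birth ? (bp)->blk_phys_birth : (bp)->blk_birth)
```

Under these assumptions, a non-dedup-ed block yields the same value either way, while a dedup-ed block that references older on-disk data yields the older txg.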

Here's a typical error:

Executing zdb -bccsv -U /rpool/tmp/zpool.cache ztest

Traversing all blocks to verify checksums and verify nothing leaked ...

zdb_blkptr_cb: Got error 50 reading <48, 55, 0, 4000000000b00>
DVA[0]=<0:569b800:200> [L0 other uint64[]] sha256 uncompressed LE contiguous
dedup single size=200L/200P birth=598L/590P fill=1
cksum=471be6558b665e4f:6dd49f1184814d14:91b0315d466beea7:68c153cc5500c836 --
skipping

Error counts:

    errno  count
       50  1

Notice that the block has a logical birth of 598 but a physical birth of 590.
The pool configuration looks like this:

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
ztest                     21.8M  215M 27.9K     0 40.6M     0     0     0     2
  /rpool/tmp/ztest.0b     21.8M  215M 27.9K     0 40.6M     0     0     0     2

Looking back through the logs we find that a scan was invoked at txg 596:

txg 596 scan setup func=2 mintxg=3 maxtxg=596
doing scan sync txg 596; ddt bm=0/0/0/0
dsl_scan_ddt ddb_class 0

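Because the scan compares against the logical birth, it would end up skipping
blocks whose logical birth is newer than txg 596, and thus skip the block
referenced in the checksum error, even though its physical birth (590) falls
inside the scan range.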
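
The effect can be sketched with a simple range check. The helper name sketch_in_scan_window is hypothetical, and the exclusive bounds are an assumption rather than a copy of the dsl_scan logic. The numbers in the note below come from the log and zdb output above (mintxg=3, maxtxg=596, birth=598L/590P):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether a block born in txg `birth` lies
 * inside the scan window. Treating both bounds as exclusive is an
 * assumption made for this sketch.
 */
static bool
sketch_in_scan_window(uint64_t birth, uint64_t min_txg, uint64_t max_txg)
{
	return (birth > min_txg && birth < max_txg);
}
```

With min_txg=3 and max_txg=596, passing the logical birth (598) returns false, so the block is skipped; passing the physical birth (590) returns true, so the block would be scanned.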
