Bug #11051

zfs miscounts BP_IS_EMBEDDED blocks during scan.

Added by Bill Sommerfeld 4 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: zfs - Zettabyte File System
Start date: 2019-05-19
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

I've got a rather long-lived pool on an illumos system which I just upgraded to include the new sequential-scan fixes to scrub/resilver.

The pool is currently using 880G out of the 1.81T capacity (on 2TB disks).

A scrub in this pool starts normally and runs to completion (in about half the time it used to take) but a few minutes into the scan, the "issued" count suddenly jumps to about 7TB, the "done" percentage jumps to around 800%, and it says "no estimated completion time".

Investigation with a modified zdb showed that this involves the embedded_data pool feature; the miscount was due to looking at DVA_GET_ASIZE() on the first DVA in a block pointer where BP_IS_EMBEDDED() was true. For lz4-compressed embedded data, DVA_GET_ASIZE() is always zero due to a coincidence of data layout, but for other compression algorithms, DVA_GET_ASIZE() on an embedded BP may return a nonzero value.
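To illustrate why the value read this way is arbitrary, here is a minimal sketch of decoding an "allocated size" from bytes that are really file payload. It is not code from the ZFS source. The names (sketch_dva_asize, sketch_asize_of_payload) are hypothetical, and the constants (a 24-bit ASIZE field in the low bits of DVA word 0, counted in 512-byte sectors, with payload bytes packed little-endian into that word) are assumptions about the on-disk layout that should be checked against spa.h before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants modelled on the ZFS on-disk format (not copied from it). */
#define	SKETCH_ASIZE_BITS	24	/* width of ASIZE field in DVA word 0 */
#define	SKETCH_SECTOR_SHIFT	9	/* ASIZE is counted in 512-byte sectors */

/* Hypothetical stand-in for DVA_GET_ASIZE(): decode ASIZE from DVA word 0. */
static uint64_t
sketch_dva_asize(uint64_t dva_word0)
{
	uint64_t mask = (UINT64_C(1) << SKETCH_ASIZE_BITS) - 1;

	return ((dva_word0 & mask) << SKETCH_SECTOR_SHIFT);
}

/*
 * Treat the first 8 payload bytes of an embedded BP as if they were DVA
 * word 0 (assumed little-endian packing) and decode ASIZE from them.
 */
static uint64_t
sketch_asize_of_payload(const uint8_t *payload, size_t len)
{
	uint64_t word0 = 0;

	assert(payload != NULL);
	for (size_t i = 0; i < 8 && i < len; i++)
		word0 |= (uint64_t)payload[i] << (8 * i);
	return (sketch_dva_asize(word0));
}
```

Under these assumptions, a payload whose first three bytes are zero decodes to an ASIZE of zero, while a payload starting with ff ff ff decodes to almost 8 GiB for a single tiny file. A thousand or so such blocks would be enough to push "issued" into the terabyte range.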

To reproduce:

Create a bunch of very short random files in a filesystem using a compression algorithm other than lz4 (I reproduced the problem with both compress=zle and compress=lzjb; I haven't tried others). Step by step:

0) use a pool with embedded_data enabled.

1) use a compression algorithm other than lz4:

zfs set compress=lzjb path/to/filesys

2) create a bunch of small files with random content:

cd /path/to/filesys

for f in $(seq 1024)
do
    dd if=/dev/urandom of="$f" bs=16 count=1
done

3) scrub, and watch the "issued" value go nuts when the scan reaches the embedded BPs containing the above files.

Bug appears to be in dsl_scan.c's count_block() function, which doesn't check BP_IS_EMBEDDED() before looking at the first DVA in the block pointer for sequential scans.
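The kind of check that seems to be missing can be sketched with a heavily simplified model. This is not the code in dsl_scan.c and not necessarily the fix that was committed. The type sketch_bp_t and both functions are hypothetical; the two struct fields merely stand in for what BP_IS_EMBEDDED() and DVA_GET_ASIZE() on the first DVA would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified block pointer. */
typedef struct sketch_bp {
	bool		embedded;	/* stands in for BP_IS_EMBEDDED(bp) */
	uint64_t	dva0_asize;	/* stands in for ASIZE of first DVA */
} sketch_bp_t;

/* Behaviour described in the report: every BP's first DVA is counted. */
static uint64_t
sketch_issued_unchecked(const sketch_bp_t *bps, size_t n)
{
	uint64_t issued = 0;

	assert(bps != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		issued += bps[i].dva0_asize;
	return (issued);
}

/* Same accounting, but embedded BPs are skipped: they have no real DVA. */
static uint64_t
sketch_issued_checked(const sketch_bp_t *bps, size_t n)
{
	uint64_t issued = 0;

	assert(bps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (bps[i].embedded)
			continue;
		issued += bps[i].dva0_asize;
	}
	return (issued);
}
```

With two ordinary blocks of 4096 and 512 bytes and one embedded block whose payload happens to decode to a huge size, the unchecked total is dominated by the bogus value, while the checked total is just 4608 bytes.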

History

#1

Updated by Jerry Jelinek 4 months ago

  • Assignee set to Jerry Jelinek

#3

Updated by Jerry Jelinek 4 months ago

Before the fix
-----------------
  pool: testpool
 state: ONLINE
  scan: scrub in progress since Mon May 20 17:39:46 2019
        599K scanned at 599K/s, 32.0T issued at 2.12G/s, 2.04M total
        0 repaired, 1645672391.45% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          c1t2d0    ONLINE       0     0     0

After the fix
---------------
  pool: testpool
 state: ONLINE
  scan: scrub in progress since Mon May 20 17:52:33 2019
        586K scanned at 586K/s, 1.08M issued at 1.08M/s, 2.02M total
        0 repaired, 53.54% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          c1t2d0    ONLINE       0     0     0

#4

Updated by Electric Monk 4 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit ee2f9ca4ea24f72b05598c92aad7f42fb77b1345

commit  ee2f9ca4ea24f72b05598c92aad7f42fb77b1345
Author: Bill Sommerfeld <sommerfeld@alum.mit.edu>
Date:   2019-05-21T13:14:02.000Z

    11051 zfs miscounts BP_IS_EMBEDDED blocks during scan.
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Gergo Doma <domag02@gmail.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
