Bug #9318

vol_volsize_to_reservation does not account for raidz skip blocks

Added by Mike Gerdts over 1 year ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee:
Category: zfs - Zettabyte File System
Start date: 2018-03-19
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

While discussing the fix for Issue #9286, Richard Elling pointed out that the calculated reservation is sometimes too small:

Sure, raidz skip blocks are not accounted for. In part this is logically due to skip blocks being assigned at the SPA layer while reservations are at the DSL layer. The pathological example is raidz2 on 4kn disks with volblocksize=8k (default). The predicted reservation is 8k per block (logical) plus 8k parity = 16k, but the actual allocated space is 24k. The DSL "free" space assumes 16k so it overestimates the usable space. Thus you can run out of allocated space in the pool before hitting refreservation -- a bad thing. One way to inoculate is to increase refreservation to a value greater than volsize.

https://openzfs.topicbox.com/groups/developer/Te3d593ba00521b6d-M0dfed5d7451e4ea21218fff2
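
The 16k versus 24k figures in the quote can be reproduced with a small sketch of the usual raidz allocation arithmetic: count data sectors, add parity sectors per row, then round the total up to a multiple of (nparity + 1). The function name `raidz_asize`, the `pad` switch, and the 6-disk pool width are assumptions made for illustration; this is not the code from the fix.

```python
def raidz_asize(psize, ashift, ndisks, nparity, pad=True):
    """Bytes allocated on a raidz vdev for one block of psize bytes.

    Illustrative sketch only; names and the pad switch are hypothetical.
    """
    data = ((psize - 1) >> ashift) + 1        # data sectors
    rows = -(-data // (ndisks - nparity))     # ceil(data / data columns)
    total = data + nparity * rows             # add parity sectors
    if pad:
        # Round up to a multiple of (nparity + 1); the padding is skip sectors.
        total = -(-total // (nparity + 1)) * (nparity + 1)
    return total << ashift


# Assumed layout: 6-disk raidz2, 4 KiB sectors (ashift=12), volblocksize=8k.
predicted = raidz_asize(8192, 12, 6, 2, pad=False)
actual = raidz_asize(8192, 12, 6, 2)
print(predicted // 1024, actual // 1024)  # 16 24
```

With these assumptions, 2 data sectors plus 2 parity sectors give 4 sectors, which is padded to 6 sectors (24k) once skip sectors are included.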


Files

test-results-512-3disks.txt (7.77 KB) Mike Gerdts, 2019-06-21 09:28 PM
test-results-512-20disks.txt (37 KB) Mike Gerdts, 2019-06-21 09:28 PM
test-results-4096-3disks.txt (7.78 KB) Mike Gerdts, 2019-06-21 09:28 PM
test-results-4096-20disks.txt (37 KB) Mike Gerdts, 2019-06-21 09:28 PM
test-results.full.diff (4.22 KB) Mike Gerdts, 2019-06-21 09:28 PM
test-results-baseline-full.txt (75.5 KB) Mike Gerdts, 2019-06-21 09:28 PM
test-results-fix-full.txt (75.1 KB) Mike Gerdts, 2019-06-21 09:28 PM
new_test_failures.md (8.7 KB) Mike Gerdts, 2019-06-21 09:36 PM
omnios.run.patch (2.42 KB) Mike Gerdts, 2019-06-21 09:38 PM

History

#1

Updated by Mike Gerdts about 1 month ago

It's not just the skip blocks: small block sizes also lead to proportionally more parity blocks than are found in a 128k block. The referenced property is scaled such that logicalreferenced == referenced (ignoring metadata) for typical (128k) blocks. When more sectors (generally volblocksize or recordsize < 128k) or fewer sectors (generally recordsize > 128k) are required to store the requested block size, referenced becomes relatively larger or smaller than logicalreferenced.
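
A rough sketch of that scaling follows, assuming the allocated size of each block is multiplied by a per-vdev ratio derived from the cost of a 128k block. The helper names, the 512-byte accounting unit, and the 6-disk raidz2 layout with 4 KiB sectors are illustrative assumptions, not code from the fix.

```python
SECTOR_SHIFT = 9  # assumed accounting unit of 512 bytes


def raidz_asize(psize, ashift, ndisks, nparity):
    """Allocated bytes for one block, including parity and skip sectors."""
    data = ((psize - 1) >> ashift) + 1
    rows = -(-data // (ndisks - nparity))              # ceil division
    total = data + nparity * rows
    total = -(-total // (nparity + 1)) * (nparity + 1)  # pad with skip sectors
    return total << ashift


def scaled_size(psize, ashift, ndisks, nparity):
    """Space charged to a block after scaling against a 128k block."""
    asize_128k = raidz_asize(1 << 17, ashift, ndisks, nparity)
    ratio = (1 << 17) // (asize_128k >> SECTOR_SHIFT)
    return (raidz_asize(psize, ashift, ndisks, nparity) >> SECTOR_SHIFT) * ratio


# Hypothetical pool: 6-disk raidz2 with 4 KiB sectors (ashift=12).
for psize in (8192, 1 << 17):
    print(psize, scaled_size(psize, 12, 6, 2))
```

Under these assumptions a 128k block is charged roughly its logical size, while an 8k block is charged about twice its logical size, which is the kind of gap a reservation sized from volsize alone would miss.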

#2

Updated by Dan McDonald about 1 month ago

  • Category set to zfs - Zettabyte File System
  • Assignee set to Mike Gerdts

#3

Updated by Mike Gerdts about 1 month ago

  • Status changed from New to In Progress
  • Tags deleted (needs-triage)

#4

Updated by Mike Gerdts about 1 month ago

Testing...

While running baseline tests, I repeatedly hit a zfs-on-zfs deadlock in some tests. I disabled those tests, as they were unrelated to the area where I was making changes. See omnios.run.patch.

Baseline tests:

test-results-baseline-full.txt is a summary of the results of a run on OmniOS bloody with 20 512n disks (iSCSI LUNs from another VM).

Tests with fixes:

test-results-fix-full.txt is a summary of the results of a run with the fix, incorporating all but the last round of code review feedback. I accidentally ran this with 4k sectors, which led to some new ENOSPC issues in tests that were not related to reservations or raidz. There were also some problems with tests that expected SMI labels instead of EFI labels, another area that is unrelated to my change. See new_test_failures.md for details.

The last round of code review feedback involved some changes to the tests and a comment change in libzfs_dataset.c. After the last round of code review, I ran the new tests with 3 disks, 20 disks, 512 byte sectors, and 4k sectors. See test-results-{512,4096}-{3,20}disks.txt. All new tests passed.

#5

Updated by Electric Monk about 1 month ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit b73ccab03ec36581b1ae5945ef1fee1d06c79ccf

commit  b73ccab03ec36581b1ae5945ef1fee1d06c79ccf
Author: Mike Gerdts <mike.gerdts@joyent.com>
Date:   2019-06-21T22:03:31.000Z

    9318 vol_volsize_to_reservation does not account for raidz skip blocks
    Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
    Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Matt Ahrens <matt@delphix.com>
    Reviewed by: Kody Kantor <kody.kantor@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
