Feature #3970

add tunable for maximum number of blocks freed in one txg

Added by Christopher Siden almost 7 years ago. Updated almost 7 years ago.

Status: In Progress
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-08-02
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

The original problem was discussed on the illumos-zfs mailing list under the subject "zfs destroy memory usage / swapping to zvol".

By limiting the maximum number of blocks freed during a single transaction
group (txg), we can limit how much of the dedup table (DDT) has to be brought
into memory. The e-mail thread referenced below suggests that destroying large
numbers of blocks in a single txg caused ZFS to run out of memory because the
entire DDT was brought into memory.
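
As a rough illustration of the idea, the following standalone C sketch models
a per-txg cap on freed blocks. It is not the kernel code: free_max_blocks,
free_should_pause() and txgs_needed() are hypothetical names used only here,
and the model ignores the time-based pause conditions that the real scan code
also applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed tunable; UINT64_MAX means no limit. */
static uint64_t free_max_blocks = UINT64_MAX;

/* Simplified model of the check: pause once the per-txg cap is reached. */
static bool
free_should_pause(uint64_t visited_this_txg)
{
	return (visited_this_txg >= free_max_blocks);
}

/*
 * Count how many txgs are needed to free total_blocks when each txg
 * stops freeing as soon as free_should_pause() returns true.
 */
static uint64_t
txgs_needed(uint64_t total_blocks)
{
	uint64_t txgs = 0;

	/* A cap of 0 would never make progress in this model. */
	assert(free_max_blocks > 0);

	while (total_blocks > 0) {
		uint64_t visited = 0;	/* reset at the start of every txg */

		/* The cap is checked before each block is freed. */
		while (total_blocks > 0 && !free_should_pause(visited)) {
			visited++;
			total_blocks--;
		}
		txgs++;
	}
	return (txgs);
}
```

With the cap set to 100,000, freeing one million blocks is spread over ten
txgs in this model, so only the DDT entries for at most 100,000 blocks need
to be touched in any one txg.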

-------------------------

Originally proposed on the illumos-zfs mailing list, e-mail subject = "zfs
destroy memory usage / swapping to zvol", author = mahrens:

Ideally, we would add a tunable that is "maximum number of blocks to free in
one txg".  Something like this (untested) patch should work; set it to around
100,000 for your workload:

diff --git a/usr/src/uts/common/fs/zfs/dsl_scan.c b/usr/src/uts/common/fs/zfs/dsl_scan.c
index 87c4925..9ce9806 100644
--- a/usr/src/uts/common/fs/zfs/dsl_scan.c
+++ b/usr/src/uts/common/fs/zfs/dsl_scan.c
@@ -70,6 +70,7 @@ boolean_t zfs_no_scrub_io = B_FALSE; /* set to disable scrub i/o */
 boolean_t zfs_no_scrub_prefetch = B_FALSE; /* set to disable srub prefetching */
 enum ddt_class zfs_scrub_ddt_class_max = DDT_CLASS_DUPLICATE;
 int dsl_scan_delay_completion = B_FALSE; /* set to delay scan completion */
+uint64_t zfs_free_max_blocks = UINT64_MAX; /* set to limit frees per txg */

 #define        DSL_SCAN_IS_SCRUB_RESILVER(scn) \
        ((scn)->scn_phys.scn_func == POOL_SCAN_SCRUB || \
@@ -1319,6 +1320,9 @@ dsl_scan_free_should_pause(dsl_scan_t *scn)
        if (zfs_recover)
                return (B_FALSE);

+       if (scn->scn_visited_this_txg >= zfs_free_max_blocks)
+               return (B_TRUE);
+
        elapsed_nanosecs = gethrtime() - scn->scn_sync_start_time;
        return (elapsed_nanosecs / NANOSEC > zfs_txg_timeout ||
            (NSEC2MSEC(elapsed_nanosecs) > zfs_free_min_time_ms &&
