add tunable for maximum number of blocks freed in one txg
zfs - Zettabyte File System
Original problem discussed on email@example.com mailing list with subject "zfs destroy memory usage / swapping to zvol".
By limiting the maximum number of blocks freed during a single transaction group (txg), we can limit how much of the dedup table (DDT) has to be brought into memory. The e-mail thread referenced below suggests that destroying large numbers of blocks in a single txg caused ZFS to run out of memory because the entire DDT was brought into memory.

-------------------------

Originally proposed on the illumos-zfs mailing list, e-mail subject = "zfs destroy memory usage / swapping to zvol", author = mahrens:

Ideally, we would add a tunable that is "maximum number of blocks to free in one txg". Something like this (untested) patch should work; set it to around 100,000 for your workload:

diff --git a/usr/src/uts/common/fs/zfs/dsl_scan.c b/usr/src/uts/common/fs/zfs/dsl_scan.c
index 87c4925..9ce9806 100644
--- a/usr/src/uts/common/fs/zfs/dsl_scan.c
+++ b/usr/src/uts/common/fs/zfs/dsl_scan.c
@@ -70,6 +70,7 @@ boolean_t zfs_no_scrub_io = B_FALSE; /* set to disable scrub i/o */
 boolean_t zfs_no_scrub_prefetch = B_FALSE; /* set to disable srub prefetching */
 enum ddt_class zfs_scrub_ddt_class_max = DDT_CLASS_DUPLICATE;
 int dsl_scan_delay_completion = B_FALSE; /* set to delay scan completion */
+uint64_t zfs_free_max_blocks = UINT64_MAX; /* set to limit frees per txg */
 
 #define DSL_SCAN_IS_SCRUB_RESILVER(scn) \
 	((scn)->scn_phys.scn_func == POOL_SCAN_SCRUB || \
@@ -1319,6 +1320,9 @@ dsl_scan_free_should_pause(dsl_scan_t *scn)
 	if (zfs_recover)
 		return (B_FALSE);
 
+	if (scn->scn_visited_this_txg >= zfs_free_max_blocks)
+		return (B_TRUE);
+
 	elapsed_nanosecs = gethrtime() - scn->scn_sync_start_time;
 	return (elapsed_nanosecs / NANOSEC > zfs_txg_timeout ||
 	    (NSEC2MSEC(elapsed_nanosecs) > zfs_free_min_time_ms &&
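
To illustrate the effect of the proposed cap, here is a minimal user-space sketch in C. It is not kernel code. `free_max_blocks`, `sim_scan_t`, `sim_free_should_pause` and `sim_txgs_needed` are hypothetical stand-ins invented for this example. It models only the new block-count check and assumes the pause test runs before each block is freed. The time-based conditions in `dsl_scan_free_should_pause` are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the zfs_free_max_blocks tunable in the patch.
 * UINT64_MAX means "no limit", matching the patch's default.
 */
static uint64_t free_max_blocks = UINT64_MAX;

/* Hypothetical, heavily simplified stand-in for the per-txg scan state. */
typedef struct sim_scan {
	uint64_t visited_this_txg;	/* blocks freed so far in this txg */
} sim_scan_t;

/* Models only the block-count check that the patch adds. */
static bool
sim_free_should_pause(const sim_scan_t *scn)
{
	return (scn->visited_this_txg >= free_max_blocks);
}

/*
 * Count how many simulated txgs are needed to free total_blocks when
 * the pause check is consulted before every block free.
 */
static uint64_t
sim_txgs_needed(uint64_t total_blocks)
{
	sim_scan_t scn = { 0 };
	uint64_t remaining = total_blocks;
	uint64_t txgs = 0;

	assert(free_max_blocks > 0);	/* a cap of 0 would never make progress */

	while (remaining > 0) {
		scn.visited_this_txg = 0;	/* counter restarts each txg */
		txgs++;
		while (remaining > 0 && !sim_free_should_pause(&scn)) {
			remaining--;
			scn.visited_this_txg++;
		}
	}
	return (txgs);
}
```

In this model, with `free_max_blocks` set to 100,000, freeing 250,000 blocks is spread over three txgs instead of one, so each txg touches at most 100,000 blocks' worth of DDT entries.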