Bug #9617

too-frequent TXG sync causes excessive write inflation

Added by Brad Lewis about 1 year ago. Updated 12 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Start date: 2018-06-18
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

ZFS starts syncing a TXG once there is 64MB of dirty data. This can result in very short TXG syncs, which are inefficient because the ratio of metadata to user data is poor. This was not considered a real performance problem, because it was thought to happen only under very light workloads: as the (write) workload increases, the TXG sync time increases, which improves efficiency. However, when the workload is almost entirely sync writes, especially those that use dmu_sync() to write the user data to its final resting place from open context, we can have very short TXGs even under moderate workloads, because spa_sync() isn't writing the actual user data.

The problem is not the short TXGs per se, but rather the high frequency of pushing out TXGs, caused by the low 64MB trigger for starting a sync. Conceptually, for a given workload, each TXG has a fixed amount of overhead (in terms of MB, IOPS, or time taken to write it) that is only loosely coupled to the amount of dirty data it carries (and therefore to the frequency of TXG sync). Decreasing the TXG sync frequency therefore decreases the overhead per unit time.
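
The reasoning above can be illustrated with a toy model. This is only a sketch: it assumes the per-TXG overhead is strictly fixed, that dirty data accumulates at a steady rate, and that the dirty-data threshold is the only thing that triggers a sync. The function name and all numbers are hypothetical, not measurements from ZFS.

```python
def overhead_per_second(dirty_rate_mb_s, sync_threshold_mb, overhead_per_txg_mb):
    """Toy model of overhead written per second.

    Assumes a TXG sync starts each time `sync_threshold_mb` of dirty data
    has accumulated, and that every TXG costs a fixed `overhead_per_txg_mb`
    regardless of how much dirty data it carries.
    """
    if sync_threshold_mb <= 0:
        raise ValueError("sync_threshold_mb must be positive")
    syncs_per_second = dirty_rate_mb_s / sync_threshold_mb
    return syncs_per_second * overhead_per_txg_mb


# Hypothetical workload: 32 MB/s of dirty data, 16 MB of overhead per TXG.
low_threshold = overhead_per_second(32, 64, 16)    # sync every 64 MB
high_threshold = overhead_per_second(32, 820, 16)  # sync every 820 MB

print(f"64 MB trigger:  {low_threshold:.2f} MB/s of overhead")
print(f"820 MB trigger: {high_threshold:.2f} MB/s of overhead")
```

In this model the overhead per unit time scales inversely with the threshold, so raising the trigger from 64 MB to 820 MB cuts it by a factor of 820/64. Real overhead is only loosely fixed per TXG, so the actual improvement will differ.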

The solution is to increase the amount of dirty data allowed before pushing out a TXG, from 64MB to 20% of zfs_dirty_data_max (820MB if total RAM >= 40GB). When we did this on the customer system in ESCL-658, overall write inflation by bandwidth decreased from ~5.5x to 3x.
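
As a rough check on the 820MB figure, the sketch below computes 20% of zfs_dirty_data_max for a given amount of RAM. It assumes zfs_dirty_data_max defaults to 10% of RAM, capped at 4 GiB; those two defaults are inferred from the "820MB if total RAM >= 40GB" remark rather than stated in this report, and the function and parameter names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_data_sync_threshold(total_ram_bytes,
                              max_percent=10,         # assumed default: 10% of RAM
                              max_cap_bytes=4 * GIB,  # assumed default cap: 4 GiB
                              sync_percent=20):       # 20%, per the description
    """Return the dirty-data level (bytes) at which a TXG sync would start."""
    dirty_data_max = min(total_ram_bytes * max_percent // 100, max_cap_bytes)
    return dirty_data_max * sync_percent // 100


threshold = dirty_data_sync_threshold(40 * GIB)
print(f"{threshold / MIB:.1f} MiB")
```

Under these assumptions the threshold stops growing once RAM reaches 40 GiB, at a little over 819 MiB, which matches the rounded 820MB quoted above.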

History

#1

Updated by Matthew Ahrens 12 months ago

This will cause more batching of I/O, so it shouldn't decrease performance in any workload. Besides the customer system, we've also run the ZFS performance test suite and there was no change.

#2

Updated by Electric Monk 12 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

commit  7928f4baf4ab3230557eb6289be68aa7a3003f38
Author: Matthew Ahrens <mahrens@delphix.com>
Date:   2018-09-18T20:53:10.000Z

    9617 too-frequent TXG sync causes excessive write inflation
    Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
    Reviewed by: Brad Lewis <brad.lewis@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
