spa_sync() spends half its time in dmu_objset_do_userquota_updates
Using a benchmark which creates 2 million files in one TXG, I observe that the thread running `spa_sync()` is on CPU almost the entire time we are syncing, and therefore can be a performance bottleneck. About 50% of the time in `spa_sync()` is in `dmu_objset_do_userquota_updates()`.
The problem is that `dmu_objset_do_userquota_updates()` calls `zap_increment_int(DMU_USERUSED_OBJECT)` once for every file that was modified (or created). In this benchmark, all the files are owned by the same user/group, so all 2 million calls to `zap_increment_int()` modify the same entry in the ZAP. The same issue exists for `DMU_GROUPUSED_OBJECT`.
A simple solution to reduce the number of calls to `zap_increment_int()` would be to cache the space delta for the “current” user and group, and only call `zap_increment_int()` when the “current” user/group changes. In my benchmark, all files are owned by the same user/group, so this reduces the number of calls from 2,000,000 to 1.
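The "current owner" cache described above can be sketched as follows. This is a minimal illustration, not the actual patch: `mock_zap_increment()` is a hypothetical stand-in that only counts calls (it is not the real `zap_increment_int()` signature), and `delta_cache_t`, `delta_cache_add()`, `delta_cache_flush()` and `sync_files()` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zap_increment_int(); it only records calls. */
static uint64_t mock_zap_calls;
static int64_t mock_zap_total;

static void
mock_zap_increment(uint64_t id, int64_t delta)
{
	(void) id;
	mock_zap_calls++;
	mock_zap_total += delta;
}

/* Accumulated space delta for the most recently seen owner (assumed layout). */
typedef struct delta_cache {
	uint64_t dc_id;
	int64_t dc_delta;
	int dc_valid;
} delta_cache_t;

/* Write out the pending delta, if any, and reset the cache. */
static void
delta_cache_flush(delta_cache_t *dc)
{
	assert(dc != NULL);
	if (dc->dc_valid && dc->dc_delta != 0)
		mock_zap_increment(dc->dc_id, dc->dc_delta);
	dc->dc_valid = 0;
	dc->dc_delta = 0;
}

/* Accumulate in memory; touch the "ZAP" only when the owner changes. */
static void
delta_cache_add(delta_cache_t *dc, uint64_t id, int64_t delta)
{
	assert(dc != NULL);
	if (dc->dc_valid && dc->dc_id != id)
		delta_cache_flush(dc);
	dc->dc_id = id;
	dc->dc_valid = 1;
	dc->dc_delta += delta;
}

/* Process n dirty files and return how many "ZAP" updates were issued. */
static uint64_t
sync_files(const uint64_t *owners, size_t n, int64_t per_file)
{
	delta_cache_t dc = { 0, 0, 0 };

	mock_zap_calls = 0;
	mock_zap_total = 0;
	for (size_t i = 0; i < n; i++)
		delta_cache_add(&dc, owners[i], per_file);
	delta_cache_flush(&dc);
	return (mock_zap_calls);
}
```

With a single owner, `sync_files()` issues one update no matter how many files are dirty. If the owners alternate, every file triggers a flush and nothing is saved.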
However, for other workloads the simple solution provides no help, for example when the dirty files alternate between owners. In this case a slightly more involved solution is required. We should keep an in-memory map from user to space delta while we are syncing, and when we finish, iterate over the in-memory map and modify the ZAP once per entry. This reduces the number of calls to `zap_increment_int()` from “number of objects modified” to “number of owners/groups of modified files”.
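A rough sketch of the map-based approach, under the same caveats: `mock_zap_increment()` is a hypothetical stand-in that only counts calls, and the fixed-size, linearly probed table (`delta_map_t`, `MAP_SLOTS`) is an assumption chosen to keep the example short. A real implementation would more likely use a growable structure such as a tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	MAP_SLOTS	64	/* hypothetical fixed capacity */

/* Hypothetical stand-in for zap_increment_int(); it only counts calls. */
static uint64_t mock_zap_calls;

static void
mock_zap_increment(uint64_t id, int64_t delta)
{
	(void) id;
	(void) delta;
	mock_zap_calls++;
}

typedef struct delta_entry {
	uint64_t de_id;		/* owner (user or group) id */
	int64_t de_delta;	/* accumulated space delta */
	int de_used;
} delta_entry_t;

typedef struct delta_map {
	delta_entry_t dm_slots[MAP_SLOTS];
} delta_map_t;

/* Add delta to the entry for id, creating it on first use. */
static void
delta_map_add(delta_map_t *dm, uint64_t id, int64_t delta)
{
	size_t start = (size_t)(id % MAP_SLOTS);

	for (size_t probes = 0; probes < MAP_SLOTS; probes++) {
		delta_entry_t *de = &dm->dm_slots[(start + probes) % MAP_SLOTS];

		if (!de->de_used) {
			de->de_used = 1;
			de->de_id = id;
			de->de_delta = delta;
			return;
		}
		if (de->de_id == id) {
			de->de_delta += delta;
			return;
		}
	}
	assert(0 && "delta map full");
}

/* Issue one update per distinct owner, then empty the map. */
static void
delta_map_flush(delta_map_t *dm)
{
	for (size_t i = 0; i < MAP_SLOTS; i++) {
		delta_entry_t *de = &dm->dm_slots[i];

		if (de->de_used && de->de_delta != 0)
			mock_zap_increment(de->de_id, de->de_delta);
	}
	memset(dm, 0, sizeof (*dm));
}

/* Process n dirty files and return how many "ZAP" updates were issued. */
static uint64_t
sync_files_mapped(const uint64_t *owners, size_t n, int64_t per_file)
{
	delta_map_t dm;

	memset(&dm, 0, sizeof (dm));
	mock_zap_calls = 0;
	for (size_t i = 0; i < n; i++)
		delta_map_add(&dm, owners[i], per_file);
	delta_map_flush(&dm);
	return (mock_zap_calls);
}
```

Here alternating owners no longer matter: the number of updates depends only on how many distinct owners appear among the dirty files.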
A prototype of the simple solution reduced the time spent in `spa_sync()` by ~33%, from 11 seconds to 7 seconds.
The equivalent ZFSonLinux bug is https://github.com/zfsonlinux/zfs/issues/4642
Updated by Electric Monk over 4 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit 8b70546b0a686e4eee3d28878728f3b3ce980b06
Author: Matthew Ahrens <email@example.com>
Date: 2016-09-28T18:03:33.000Z

    6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates
    Reviewed by: George Wilson <firstname.lastname@example.org>
    Reviewed by: Steve Gonczi <email@example.com>
    Reviewed by: Ned Bass <firstname.lastname@example.org>
    Reviewed by: Jinshan Xiong <email@example.com>
    Approved by: Richard Lowe <firstname.lastname@example.org>