spa_sync() spends 10-20% of its time in spa_free_sync_cb()
This bug is based on performance problems observed on a heavily loaded server we are running at Delphix. Matt Ahrens's original bug report stated:
When we free a block, we first put the block on a linked list for processing at a specific point in spa_sync(). We then call spa_free_sync_cb() on each block to process the free. This must add the freed block to the space map's AVL tree. We have seen these trees have ~30,000 entries, taking >100 microseconds to add one entry. The end result is that the spa_sync() thread spends 10-20% of its time in spa_free_sync_cb(), during which we are not doing any writes. If we can reduce this time we can improve our write bandwidth by 10-20%.
We can make a big improvement by processing the frees in parallel. Each free is to a specific metaslab, and each metaslab has its own locking, so we can process frees to different metaslabs in parallel. We can use the zio pipeline to accomplish this.
George Wilson followed up with:
When looking into the data it was clear that frees were being handled in the context of the sync thread. What's interesting is that the code has been written with the assumption that the free I/Os could happen in parallel. Historically we used to only have a single FREE taskq thread and I'm assuming that we never issued them asynchronously because there was never any I/O to perform (only memory operations).
The fix for this is simply to issue all free I/Os asynchronously by adding the ASYNC stage into the free pipeline. The FREE taskqs have 100 threads to perform the work so we should see a significant amount of parallel activity.
Updated by Christopher Siden over 8 years ago
- Status changed from In Progress to Closed
commit 01f55e48fb4d524eaf70687728aa51b7762e2e97 Author: George Wilson <firstname.lastname@example.org> Date: Wed Nov 7 14:05:17 2012 -0800 3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb() 3330 space_seg_t should have its own kmem_cache 3331 deferred frees should happen after sync_pass 1 3335 make SYNC_PASS_* constants tunable Reviewed by: Adam Leventhal <email@example.com> Reviewed by: Matt Ahrens <firstname.lastname@example.org> Reviewed by: Christopher Siden <email@example.com> Reviewed by: Eric Schrock <firstname.lastname@example.org> Reviewed by: Richard Lowe <email@example.com> Reviewed by: Dan McDonald <firstname.lastname@example.org> Approved by: Eric Schrock <email@example.com>