condensing one space map burns 3 seconds of CPU in spa_sync() thread
zfs - Zettabyte File System
This is a performance problem we saw on heavily fragmented pools. Matt Ahrens's description of the problem after investigating it on a machine called 'dcenter':
Condensing a single space map (in metaslab_sync()) (with ~30,000 entries, which we saw typically on dcenter) takes around 3 seconds. This is all CPU time, no i/o. This happens from the spa_sync() thread, so during these 3 seconds, we are not doing any writes, thus decreasing our write bandwidth. On dcenter (which has sync=disabled), we saw a condense happening about 3 times per minute, so this is a considerable performance impact.

The time is due to creating the "perfect" AVL tree for the space map, which reflects all current & future allocations. Most of the time is from "space_map_walk(sm, space_map_remove, allocmap)", because the sm is large (~30,000 entries). The other space_maps (e.g. freed_map, defermap) that are also removed from the allocmap are pretty small.

We can do this much more quickly by not insisting on a "perfect" AVL tree. The on-disk space map is essentially an array of "allocate" and "free" records. When it is read in (loaded), we must regenerate the AVL tree. So we can take advantage of this by simply writing the records from each of the AVL trees that we already have on hand. In pseudocode:

- first record indicates that entire metaslab is allocated
- then write each record in the in-core map (sm) as being freed
- then write each record in the freed_map to indicate what's been freed this txg
- then write each record in the defermap
- then write allocations from the future

This space map will be slightly less compact than the "perfect" one we are writing now, because we have not consolidated all of these AVL trees into one. But the vast majority of the entries are in "sm", so this effect is minimal.
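
The pseudocode above can be illustrated with a small toy model. This is only a sketch, not ZFS code: the function names (load, condense_cheap, condense_perfect), the tuple layout for records, and the block-granular sets are all hypothetical and chosen for readability. It also assumes one reading of the pseudocode, namely that the freed, deferred, and future-allocation segments are all written as "free" records on top of the initial "everything allocated" record. The point it tries to show is that replaying either log yields the same free space, while the cheap variant avoids merging the maps at the cost of a few extra records.

```python
# Toy model only: segments are (start, size) tuples and none of these
# names are real ZFS functions.

def load(records, metaslab_size):
    """Replay a log of (op, start, size) records into a set of free blocks.

    Assumption of this model: an empty log means the metaslab is all free.
    """
    free = set(range(metaslab_size))
    for op, start, size in records:
        blocks = range(start, start + size)
        if op == "ALLOC":
            free.difference_update(blocks)
        else:  # "FREE"
            free.update(blocks)
    return free


def condense_cheap(metaslab_size, sm, freed_map, defermap, future_allocs):
    """Write one ALLOC record for the whole metaslab, then dump each map as FREE.

    No merging between the maps is attempted, so the cost is one pass
    over the segments that are already on hand.
    """
    records = [("ALLOC", 0, metaslab_size)]
    for segments in (sm, freed_map, defermap, future_allocs):
        records.extend(("FREE", start, size) for start, size in segments)
    return records


def condense_perfect(metaslab_size, sm, freed_map, defermap, future_allocs):
    """Merge every map first, then write only coalesced ALLOC records."""
    free = set()
    for segments in (sm, freed_map, defermap, future_allocs):
        for start, size in segments:
            free.update(range(start, start + size))
    records, run_start = [], None
    for block in range(metaslab_size + 1):
        allocated = block < metaslab_size and block not in free
        if allocated and run_start is None:
            run_start = block
        elif not allocated and run_start is not None:
            records.append(("ALLOC", run_start, block - run_start))
            run_start = None
    return records


# Hypothetical example data for a 100-block metaslab.
size = 100
maps = dict(sm=[(0, 10), (40, 5)], freed_map=[(10, 5)],
            defermap=[(60, 10)], future_allocs=[(45, 5)])
cheap = condense_cheap(size, **maps)
perfect = condense_perfect(size, **maps)
print(len(cheap), len(perfect), load(cheap, size) == load(perfect, size))
```

In this model the cheap log has one record per in-core segment plus the leading ALLOC record, so when nearly all segments live in sm its length stays close to the size of sm, which matches the "slightly less compact" trade-off described above.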
Updated by Christopher Siden over 6 years ago
- Status changed from In Progress to Closed
commit 16a4a80
Author: George Wilson <email@example.com>
Date: Tue Feb 19 14:15:41 2013

3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread
3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing)
3566 recursive mutex panic in specfs
Reviewed by: Adam Leventhal <firstname.lastname@example.org>
Reviewed by: Dan Kimmel <email@example.com>
Reviewed by: Matthew Ahrens <firstname.lastname@example.org>
Approved by: Richard Lowe <email@example.com>