Bug #3552

condensing one space map burns 3 seconds of CPU in spa_sync() thread

Added by Christopher Siden over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-02-12
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

This is a performance problem we saw on heavily fragmented pools. Matt Ahrens described it as follows after investigating on a machine called 'dcenter':

Condensing a single space map in metaslab_sync() (with ~30,000 entries, which
is typical of what we saw on dcenter) takes around 3 seconds.  This is all CPU
time, no I/O.  It happens in the spa_sync() thread, so during these 3
seconds we are not doing any writes, which decreases our write bandwidth.  On
dcenter (which has sync=disabled), we saw a condense happening about 3 times
per minute, so this is a considerable performance impact.

The time is due to creating the "perfect" AVL tree for the space map, which
reflects all current & future allocations.  Most of the time is from
"space_map_walk(sm, space_map_remove, allocmap)", because the sm is large
(~30,000 entries).  The other space_maps (e.g. freed_map, defermap) that are
also removed from the allocmap are pretty small.
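
To make the cost described above concrete, here is a toy model of building the "perfect" map: start with one segment covering the whole metaslab, then remove every segment of every other map from it, one at a time. This is only a sketch. A sorted Python list stands in for the AVL tree, and build_perfect_map() and remove_segment() are hypothetical names, not the kernel functions. The point is just that the number of remove operations tracks the number of entries in sm.

```python
import bisect

def remove_segment(allocated, start, size):
    """Remove [start, start+size) from a sorted list of disjoint (start, end) segments."""
    end = start + size
    # Find the segment with the largest start that is <= the range being removed.
    i = bisect.bisect_right(allocated, (start, float("inf"))) - 1
    seg_start, seg_end = allocated[i]
    assert seg_start <= start and end <= seg_end, "range must be fully allocated"
    pieces = []
    if seg_start < start:
        pieces.append((seg_start, start))   # part left of the hole
    if end < seg_end:
        pieces.append((end, seg_end))       # part right of the hole
    allocated[i:i + 1] = pieces

def build_perfect_map(ms_start, ms_size, maps_to_remove):
    """Toy version of the old approach: one remove per entry of every map."""
    allocated = [(ms_start, ms_start + ms_size)]
    removals = 0
    for segments in maps_to_remove:
        for start, size in segments:
            remove_segment(allocated, start, size)
            removals += 1
    return allocated, removals

# Made-up workload: a large, fragmented in-core map and a tiny freed map.
sm = [(i * 4, 2) for i in range(3000)]
freed = [(2, 1)]
allocated, removals = build_perfect_map(0, 12000, [sm, freed])
print(removals, "remove operations,", len(allocated), "allocated segments")
```

Nearly all of the remove operations come from sm; the small maps barely register.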

We can do this much more quickly by not insisting on a "perfect" AVL tree.  The
on-disk space map is essentially an array of "allocate" and "free" records. 
When it is read in (loaded), we must regenerate the AVL tree.  So we can take
advantage of this by simply writing the records from each of the AVL trees that
we already have on hand.  In pseudocode:

 - first record indicates that the entire metaslab is allocated
 - then write each record in the in-core map (sm) as being freed
 - then write each record in the freed_map to indicate what's been freed this txg
 - then write each record in the defermap
 - then write allocations from the future

This space map will be slightly less compact than the "perfect" one we are
writing now, because we have not consolidated all of these AVL trees into one. 
But the vast majority of the entries are in "sm", so this effect is minimal.
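
The proposed write order and the compactness trade-off can be sketched with a toy model. Everything here is illustrative rather than the actual ZFS code: records are plain (kind, start, size) tuples, replay() uses a set of block offsets in place of an AVL tree and assumes a load starts from an all-free metaslab, and the function names are hypothetical. One further assumption: "allocations from the future" are treated as ranges that are not yet allocated on disk as of this txg, so the sketch writes them as free records. The description above does not spell out that record type.

```python
ALLOC, FREE = "A", "F"

def condense_records(ms_start, ms_size, sm, freed_map, defermap, future_allocs):
    """Emit records in the listed order, without merging the maps first."""
    records = [(ALLOC, ms_start, ms_size)]          # entire metaslab allocated
    for segments in (sm, freed_map, defermap, future_allocs):
        for start, size in sorted(segments):
            records.append((FREE, start, size))     # assumption: all written as frees
    return records

def coalesce(segments):
    """Merge adjacent (start, size) segments, as a single consolidated map would."""
    merged = []
    for start, size in sorted(segments):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_size = merged[-1]
            merged[-1] = (prev_start, prev_size + size)
        else:
            merged.append((start, size))
    return merged

def replay(ms_start, ms_size, records):
    """Rebuild the free set by replaying records over an initially free metaslab."""
    free = set(range(ms_start, ms_start + ms_size))
    for kind, start, size in records:
        blocks = set(range(start, start + size))
        if kind == ALLOC:
            assert blocks <= free, "allocating a range that is not free"
            free -= blocks
        else:
            assert not (blocks & free), "freeing a range that is already free"
            free |= blocks
    return free

# Made-up example maps for a 100-block metaslab.
sm = [(0, 10), (20, 5), (40, 10)]
freed_map = [(10, 5)]
defermap = [(25, 5)]
future_allocs = [(60, 4)]

quick = condense_records(0, 100, sm, freed_map, defermap, future_allocs)
perfect = [(ALLOC, 0, 100)] + [
    (FREE, s, n) for s, n in coalesce(sm + freed_map + defermap + future_allocs)
]
print(len(quick), "records without merging,", len(perfect), "records when merged")
print(replay(0, 100, quick) == replay(0, 100, perfect))
```

Both record lists replay to the same free set. The unmerged one is longer only where segments from different maps happen to be adjacent, which matches the claim that the extra size is small when most entries live in sm.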

History

#1

Updated by Christopher Siden over 6 years ago

  • Status changed from In Progress to Closed
commit 16a4a80
Author: George Wilson <george.wilson@delphix.com>
Date:   Tue Feb 19 14:15:41 2013

    3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread
    3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing)
    3566 recursive mutex panic in specfs
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
