Bug #4881


zfs send performance degradation when embedded block pointers are encountered

Added by Christopher Siden over 9 years ago. Updated over 9 years ago.

zfs - Zettabyte File System

Matt Ahrens:

As George noted below, the problem is that the traverse_prefetch_thread is able
to get way ahead of the dmu_send() thread (we saw it get 2 million blocks
ahead).  When this happens, the data that was prefetched can be evicted from
the cache (ARC) before the dmu_send() thread accesses it.  This decreases
performance because we lose the benefit of prefetching (i.e. having multiple
i/os outstanding concurrently) and we end up reading every block from disk
twice (once by the prefetch and once when we actually need it).

The prefetch thread is supposed to be at most zfs_pd_blks_max (default 100)
blocks ahead of the dmu_send thread.  This relies on the prefetch and dmu_send
threads coordinating how many blocks have been "produced (prefetched) but not
yet consumed (accessed)" in prefetch_data_t:pd_blks_fetched.
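
The handshake described above can be sketched as a bounded counter.  This is a minimal single-threaded model, not the ZFS implementation: pd_sketch_t, pd_try_produce() and pd_try_consume() are hypothetical names, and a "false" return stands in for the point where the real thread would block until the other side makes progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for prefetch_data_t; field names echo the report. */
typedef struct {
	int blks_max;		/* plays the role of zfs_pd_blks_max */
	int blks_fetched;	/* produced (prefetched), not yet consumed */
} pd_sketch_t;

/* Producer side: returns false where the prefetch thread would wait. */
bool
pd_try_produce(pd_sketch_t *pd)
{
	assert(pd->blks_fetched >= 0);
	if (pd->blks_fetched >= pd->blks_max)
		return (false);		/* already blks_max blocks ahead */
	pd->blks_fetched++;
	return (true);
}

/* Consumer side: returns false where the main thread would wait. */
bool
pd_try_consume(pd_sketch_t *pd)
{
	assert(pd->blks_fetched >= 0);
	if (pd->blks_fetched == 0)
		return (false);		/* nothing has been prefetched yet */
	pd->blks_fetched--;
	return (true);
}
```

As long as both sides agree on which blocks are counted, the producer can never be more than blks_max counted blocks ahead of the consumer.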

When the prefetch thread visits embedded block pointers (blocks whose data
compresses down to <100 bytes and thus fits inside the block pointer itself),
it does not increment pd_blks_fetched (i.e. it does not "produce" a prefetched
block), because no i/o is needed to "read" the block.  (See
traverse_prefetcher().)  However, when the dmu_send() thread visits an embedded
block pointer, it decrements pd_blks_fetched, thinking that it has "consumed" a
prefetched block.  (See traverse_visitbp().)  These extra decrements cause the
dmu_send() thread to get one block further behind for every embedded block that
is visited (waiting if necessary so that the count doesn't go negative).

The fix is to move the "do we count this block as a prefetch" logic into a
common function, so that the logic is exactly the same in the producer
(traverse_prefetcher()) and consumer (traverse_visitbp()).
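
The effect of the mismatch, and of sharing one predicate between both sides, can be illustrated with a small single-threaded model.  This is a sketch under stated assumptions rather than the actual change: blk_kind_t, counts_as_prefetch() and max_outstanding() are hypothetical names, the two threads are modeled by alternating "producer runs until it must wait" with "consumer visits one block", and real ZFS inspects the block pointer rather than an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block kinds; real ZFS examines the blkptr_t instead. */
typedef enum { BLK_NORMAL, BLK_EMBEDDED } blk_kind_t;

/* Shared predicate: the single place that decides whether a block counts. */
bool
counts_as_prefetch(blk_kind_t kind)
{
	return (kind != BLK_EMBEDDED);	/* embedded data needs no i/o */
}

/*
 * Model both threads walking the same nblks blocks.  Returns the largest
 * number of i/o-needing blocks that were prefetched but not yet visited
 * by the consumer.  With buggy_consumer set, the consumer decrements the
 * counter for every block while the producer increments it only for
 * blocks that count, which is the mismatch described in the report.
 */
int
max_outstanding(const blk_kind_t *blks, int nblks, int blks_max,
    bool buggy_consumer)
{
	int prod = 0, cons = 0;	/* next block each thread will visit */
	int fetched = 0;	/* the shared counter */
	int outstanding = 0;	/* what the counter is meant to track */
	int worst = 0;

	assert(blks_max > 0);
	while (cons < nblks) {
		/* Producer runs until the counter says it must wait. */
		while (prod < nblks) {
			if (counts_as_prefetch(blks[prod])) {
				if (fetched >= blks_max)
					break;
				fetched++;
				outstanding++;
			}
			prod++;
		}
		if (outstanding > worst)
			worst = outstanding;

		/* Consumer visits one block. */
		bool dec = buggy_consumer || counts_as_prefetch(blks[cons]);
		if (dec && fetched > 0)
			fetched--;
		if (counts_as_prefetch(blks[cons]))
			outstanding--;
		cons++;
	}
	return (worst);
}
```

With buggy_consumer false, the counter and the true number of outstanding blocks stay equal, so the result never exceeds blks_max.  With it set, every embedded block the consumer visits frees a slot that was never filled, and the producer uses that slot to run further ahead.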

When we're in this bad state, we will see zio_read() being called by the
dmu_send() thread.  Normally this should not happen, because everything it
needs will have already been prefetched.  (Note that if the disks are slower
than the network, the dmu_send() thread can still need to wait for i/o to
complete, but it will do so by waiting for the already-issued prefetch i/o, via an
interlock in the ARC.)  Therefore we can detect that we've gotten into this
situation by seeing if zio_read() is called from dmu_send():

  dpp -n 'zio_read:entry/callers["dmu_send"]/{@=count()}' -c 'zfs send ...'

We can also observe how far the prefetch thread is ahead of the main thread
(results only make sense when doing a non-incremental send of a large,
non-sparse file).  The main and prefetch threads' blkids should not be further
apart than the number of blocks "fetched":

dpp -q -n 'traverse_visitbp:entry{
  if (args[0]->td_pfd) {
    printf("main thread: blkid=%u fetched=%u\n",
      args[3]->zb_blkid, args[0]->td_pfd->pd_blks_fetched);
  } else {
    printf("prefetch thread: blkid=%u\n", args[3]->zb_blkid);
  }
}'

Note that the bug will be triggered (i.e. the prefetch thread will get further
ahead than designed) whenever sending a dataset with embedded block pointers,
i.e. blocks that compress very well (controlled by the feature@embedded_data
pool property).  However, performance will still be fine unless there are
enough embedded block pointers for the prefetch thread to get so far ahead that
the prefetched data no longer fits in the ARC.  Thus the problem would not be
noticed with small-scale tests.

Related issues

Follows illumos gate - Feature #4757: ZFS embedded-data block pointers ("zero block compression") - Closed - Christopher Siden - 2014-04-14

#1

Updated by Electric Monk over 9 years ago

  • Status changed from In Progress to Closed

commit  06315b795c0d54f0228e0b8af497a28752dd92da
Author: Matthew Ahrens
Date:   2014-06-05T21:34:21.000Z

    4881 zfs send performance degradation when embedded block pointers are encountered
    Reviewed by: George Wilson
    Reviewed by: Christopher Siden
    Approved by: Dan McDonald

