Bug #3693


restore_object uses at least two transactions to restore an object

Added by Andriy Gapon over 9 years ago. Updated almost 8 years ago.

zfs - Zettabyte File System
Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:


restore_object uses at least two transactions to restore an object:
  • one transaction is used for dmu_object_claim
  • another transaction is used to set compression, checksum and most importantly bonus data
  • furthermore dmu_object_reclaim internally uses multiple transactions
  • dmu_free_long_range frees chunks in separate transactions
  • dnode_reallocate is executed in a distinct transaction

The fact the dnode_allocate/ dnode_reallocate are executed in one transaction and bonus (re-)population is executed in a different transaction may lead to violation of ZFS consistency assertions if the transactions are assigned to different transaction groups.
Also, if the first transaction group is successfully written to a permanent storage, but the second transaction is lost, then an invalid dnode may be created on the stable storage.


zfs_restore_object.patch (4.08 KB) zfs_restore_object.patch Andriy Gapon, 2013-04-04 02:45 PM
Actions #1

Updated by Andriy Gapon over 9 years ago

Attaching a patch that seems (after extensive testing) to fix the problem.
Fabian Keil <> has originally made a good report about the problem and then he has helped to debug it and tested patches.

Actions #2

Updated by Matthew Ahrens almost 8 years ago

  • Assignee set to Matthew Ahrens

The attached patch is not quite right.

The issue with the patch is that In dmu_object_reclaim(), you've
changed the dmu_free_long_range() to a dnode_free_range(). This will cause a
problem is the object is very large. I think we'll need to do the
dmu_free_long_range() from restore_object(), before creating the transaction.

I've just looked again at the dmu_object_reclaim code and it seems that
splitting off of the dmu_free_long_range code requires either inlining a good
chunk of dmu_object_reclaim into restore_object or creating a new function that
would duplicate a bit of code from dmu_object_reclaim. I mean the calculations
and checks that precede dmu_free_long_range call.

I don't think you'd have to replicate all 30 lines of code from
dmu_object_reclaim(). You would just need to check if the reclaim() is really
needed, based if the types, block sizes, bonus type and bonus lengths all
match. And we can get this from dmu_object_info(). Essentially,
restore_object() will figure out if this is indeed a different generation of
this object #, and if so call dmu_free_long_range() and then

In other words, you need to handle this case in restore_object:
if (dn->dn_type ot && dn->dn_datablksz blocksize &&
dn->dn_bonustype bonustype && dn->dn_bonuslen bonuslen) {
/* nothing is changing, this is a noop */
dnode_rele(dn, FTAG);
return (0);

But you don't need to to the other stuff, like the nblkptr ||datablksz check,
which is just an unnecessary optimization. The end result might actually be
less code as dmu_object_reclaim() won't need to compute nblkptr and do the
free_long_range() anymore.

Actions #3

Updated by Electric Monk almost 8 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit e77d42eaa49fe55bfae1e0e0065c6e99affc001b

commit  e77d42eaa49fe55bfae1e0e0065c6e99affc001b
Author: Matthew Ahrens <>
Date:   2014-10-08T15:32:30.000Z

    3693 restore_object uses at least two transactions to restore an object
    Reviewed by: Christopher Siden <>
    Reviewed by: George Wilson <>
    Reviewed by: Andriy Gapon <>
    Approved by: Robert Mustacchi <>


Also available in: Atom PDF