Bug #3693
closedrestore_object uses at least two transactions to restore an object
100%
Description
- one transaction is used for dmu_object_claim
- another transaction is used to set compression, checksum and most importantly bonus data
- furthermore dmu_object_reclaim internally uses multiple transactions
- dmu_free_long_range frees chunks in separate transactions
- dnode_reallocate is executed in a distinct transaction
The fact the dnode_allocate/ dnode_reallocate are executed in one transaction and bonus (re-)population is executed in a different transaction may lead to violation of ZFS consistency assertions if the transactions are assigned to different transaction groups.
Also, if the first transaction group is successfully written to a permanent storage, but the second transaction is lost, then an invalid dnode may be created on the stable storage.
Files
Updated by Andriy Gapon over 9 years ago
- File zfs_restore_object.patch zfs_restore_object.patch added
Attaching a patch that seems (after extensive testing) to fix the problem.
Fabian Keil <fk@fabiankeil.de> has originally made a good report about the problem and then he has helped to debug it and tested patches.
Updated by Matthew Ahrens almost 8 years ago
- Assignee set to Matthew Ahrens
The attached patch is not quite right.
The issue with the patch is that In dmu_object_reclaim(), you've
changed the dmu_free_long_range() to a dnode_free_range(). This will cause a
problem is the object is very large. I think we'll need to do the
dmu_free_long_range() from restore_object(), before creating the transaction.
"
I've just looked again at the dmu_object_reclaim code and it seems that
splitting off of the dmu_free_long_range code requires either inlining a good
chunk of dmu_object_reclaim into restore_object or creating a new function that
would duplicate a bit of code from dmu_object_reclaim. I mean the calculations
and checks that precede dmu_free_long_range call.
"
I don't think you'd have to replicate all 30 lines of code from
dmu_object_reclaim(). You would just need to check if the reclaim() is really
needed, based if the types, block sizes, bonus type and bonus lengths all
match. And we can get this from dmu_object_info(). Essentially,
restore_object() will figure out if this is indeed a different generation of
this object #, and if so call dmu_free_long_range() and then
dmu_object_reclaim().
In other words, you need to handle this case in restore_object:
if (dn->dn_type ot && dn->dn_datablksz blocksize &&
dn->dn_bonustype bonustype && dn->dn_bonuslen bonuslen) {
/* nothing is changing, this is a noop */
dnode_rele(dn, FTAG);
return (0);
}
But you don't need to to the other stuff, like the nblkptr ||datablksz check,
which is just an unnecessary optimization. The end result might actually be
less code as dmu_object_reclaim() won't need to compute nblkptr and do the
free_long_range() anymore.
Updated by Electric Monk almost 8 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit e77d42eaa49fe55bfae1e0e0065c6e99affc001b
commit e77d42eaa49fe55bfae1e0e0065c6e99affc001b Author: Matthew Ahrens <mahrens@delphix.com> Date: 2014-10-08T15:32:30.000Z 3693 restore_object uses at least two transactions to restore an object Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andriy Gapon <andriy.gapon@hybridcluster.com> Approved by: Robert Mustacchi <rm@joyent.com>