Bug #5422


preserve AVL invariants in dn_dbufs

Added by Alex Reece about 9 years ago. Updated about 9 years ago.

zfs - Zettabyte File System
Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:
External Bug:


Each dnode has an AVL tree of dbufs. The comparator is dbuf_compare [1] and it compares:
  • level
  • blkid
  • state
  • address

The comparison happens in this order.

Now, suppose that we have a dnode that has a dbuf. Let's say that it is for level=0, blkid=0. It just so happens that its state is DB_EVICTING. For completeness sake let's say that it's address is 0x1234.

Now, someone calls dbuf_hold_impl [2] trying to get level=0, blkid=0. It calls dbuf_find which returns NULL since all all the dbufs for that dnode are DB_EVICTING so we take the branch on line 1917. We end up calling dbuf_create [3] which locks the dnode sets up a new dbuf with level=0, blkid=0, and state=DB_EVICTING [4]. It then inserts the dbuf into the hash table as well as the dnode's AVL tree. Then, it sets the state to DB_UNCACHED [5].

This is the bug.

Let's say that the new dbuf has address of 0x5678. When dbuf_create calls avl_add, it's adding a node <0,0,DB_EVICTING,0x5678> which compares as higher than the one existing node <0,0,DB_EVICTING,0x1234>. This is fine. Then, when dbuf_create changes the state, it changes the key of the new dbuf to <0,0,DB_UNCACHED,0x5678> without ensuring that the AVL tree is still ordered properly. It turns out it isn't given the definition of dbuf_states_t [6]: DB_UNCACHED is 0, DB_EVICTING is 5. In other words, changing the state changes the key from:

<0,0,5,0x5678> which compares as greater than <0,0,5,0x1234>

<0,0,0,0x5678> which compares as smaller than <0,0,5,0x1234>

After the avl_add, the nodes are ordered as [<0,0,5,0x1234>, <0,0,5,0x5678>]. After the state change, the nodes should be ordered as [<0,0,0,0x5678>, <0,0,5,0x1234>], but they are not.

IOW, dbuf_create breaks the AVL tree's invariants necessary for it to remain balanced and functional.

We can change the comparison to be based on the tuple <level, blockid, is_search, address>. level, blockid, and address are the same, and is_search is a boolean indicating that this is a special dbuf only used for searching (e.g. for an avl_find). We need the is_search field to ensure that when we do an avl_find in dbuf_destroy, we will get all the dbufs of the specified level and blockid -- without it, we would have to rely on a known relationship between the address of our search dbuf (probably on the stack) and the other dbufs in the tree (probably in the heap). Fortunately, we can continue to use the existing DB_SEARCH state to determine is_search rather than using additional space in the dbuf.


Actions #1

Updated by Electric Monk about 9 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit a846f19d279fdfb0e0d63f78ccaf0205a88274d2

commit  a846f19d279fdfb0e0d63f78ccaf0205a88274d2
Author: Alex Reece <>
Date:   2014-12-18T04:57:34.000Z

    5422 preserve AVL invariants in dn_dbufs
    Reviewed by: Matthew Ahrens <>
    Reviewed by: Paul Dagnelie <>
    Reviewed by: Josef 'Jeff' Sipek <>
    Reviewed by: Albert Lee <>
    Approved by: Dan McDonald <>


Also available in: Atom PDF