Bug #3695

missing rele in dmu_send_impl

Added by Robert Mustacchi almost 8 years ago. Updated almost 8 years ago.

zfs - Zettabyte File System


We have a panic with the following status:

> ::status
debugging crash dump vmcore.0 (64-bit) from headnode
operating system: 5.11 joyent_20130325T170959Z (i86pc)
image uuid: (not set)
panic message: thread ffffff0a0cee6440 terminating with rrw lock ffffff090a09c2b8 held
dump content: kernel pages only
> $C
ffffff003fc08c50 vpanic()
ffffff003fc08c70 0xfffffffff7d854ce()
ffffff003fc08ca0 tsd_exit+0x79()
ffffff003fc08cc0 thread_exit+0x24()
ffffff003fc08d50 proc_exit+0xa62(2, d)
ffffff003fc08d70 exit+0x15(2, d)
ffffff003fc08e00 psig+0x33b()
ffffff003fc08ec0 post_syscall+0x82d(20, 0)
ffffff003fc08f00 syscall_exit+0x68()
ffffff003fc08f10 0xfffffffffb800ed9()

So we have a thread that is exiting on a SIGPIPE (exit(2, d) means it was killed by signal 0xd) while it still has ZFS thread-specific data (TSD). Unfortunately, at this point it's hard to know which process it was, but based on our expectations we think it was a zfs send.

> rrw_tsd_key/D
rrw_tsd_key:    7               
> ffffff0a0cee6440::tsd -k 7 | ::print rrw_node_t
    rn_next = 0
    rn_rrl = 0xffffff090a09c2b8
    rn_tag = __func__.34482
> __func__.34482/s
__func__.34482: dmu_send_obj
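To show why a leftover TSD node is fatal, here is a hypothetical userland analog. As we understand the rrwlock code, each read hold is recorded in a per-thread node (the rn_rrl and rn_tag fields printed above) stored under rrw_tsd_key. The TSD destructor that runs at thread exit produces the "terminating with rrw lock held" panic if a node is still present. The sketch uses POSIX thread-specific data. hold_node_t, hold_enter(), hold_exit(), hold_tsd_destroy() and the other helpers are illustrative names, not the kernel interfaces, and the destructor records the leak instead of panicking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analog of rrw_node_t: one node per outstanding read hold. */
typedef struct hold_node {
	struct hold_node *hn_next;	/* like rn_next */
	const void *hn_lock;		/* like rn_rrl */
	const char *hn_tag;		/* like rn_tag: who took the hold */
} hold_node_t;

static pthread_key_t hold_key;		/* plays the role of rrw_tsd_key */
static const void *leaked_lock;		/* recorded by the destructor */
static const char *leaked_tag;

/*
 * POSIX runs this at thread exit only if the thread's value for hold_key
 * is non-NULL, i.e. a hold is still outstanding. The kernel panics here;
 * this model records the most recent node and frees the list.
 */
static void hold_tsd_destroy(void *arg)
{
	hold_node_t *hn = arg;

	leaked_lock = hn->hn_lock;
	leaked_tag = hn->hn_tag;
	while (hn != NULL) {
		hold_node_t *next = hn->hn_next;
		free(hn);
		hn = next;
	}
}

static int hold_init(void)
{
	return pthread_key_create(&hold_key, hold_tsd_destroy);
}

/* Record a hold on `lock` taken by `tag` in this thread's TSD list. */
static void hold_enter(const void *lock, const char *tag)
{
	hold_node_t *hn = malloc(sizeof (*hn));

	assert(hn != NULL);
	hn->hn_next = pthread_getspecific(hold_key);
	hn->hn_lock = lock;
	hn->hn_tag = tag;
	pthread_setspecific(hold_key, hn);
}

/* Drop this thread's hold on `lock` by unlinking its node. */
static void hold_exit(const void *lock)
{
	hold_node_t *prev = NULL;
	hold_node_t *hn = pthread_getspecific(hold_key);

	while (hn != NULL && hn->hn_lock != lock) {
		prev = hn;
		hn = hn->hn_next;
	}
	if (hn == NULL)
		return;			/* this thread holds no such lock */
	if (prev == NULL)
		pthread_setspecific(hold_key, hn->hn_next);
	else
		prev->hn_next = hn->hn_next;
	free(hn);
}

static int pool_lock;			/* its address is the lock identity */

static void *balanced_thread(void *arg)
{
	(void)arg;
	hold_enter(&pool_lock, __func__);
	hold_exit(&pool_lock);
	return NULL;
}

static void *leaky_thread(void *arg)
{
	(void)arg;
	hold_enter(&pool_lock, __func__);
	return NULL;			/* exits with the hold outstanding */
}

/* Run fn in a new thread; pthread_join returns after its TSD destructors ran. */
static void run_thread(void *(*fn)(void *))
{
	pthread_t t;

	pthread_create(&t, NULL, fn, NULL);
	pthread_join(t, NULL);
}

static int leaked_by(const char *tag)
{
	return leaked_tag != NULL && strcmp(leaked_tag, tag) == 0;
}
```

After hold_init(), running balanced_thread leaves leaked_tag NULL. Running leaky_thread leaves leaked_lock pointing at pool_lock and leaked_tag set to "leaky_thread", much as rn_tag pointed at dmu_send_obj in the dump.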

The rn_tag value therefore shows that the hold comes from dmu_send_obj(). By code inspection, that hold has to be the one taken by dsl_pool_hold(). dmu_send_obj() itself looks fine on inspection, but dmu_send_impl() is more problematic.

Specifically, the problem is the call to dump_bytes(). If that fails, we goto out. However, every other goto out comes after the calls to dsl_dataset_long_hold() and dsl_pool_rele(). That means that if dump_bytes() fails, we never release our hold on the pool. This leaves a dangling hold on the rrw lock, and that blows up when the process exits.
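A minimal userland model of this control flow may make the leak easier to see. It assumes only what is stated here: the caller takes the pool hold, dmu_send_impl() is responsible for releasing it, and the cleanup at out assumes that release has already happened. model_send_obj(), model_send_impl_buggy(), model_send_impl_fixed(), model_dump_bytes() and the pool_holds counter are hypothetical names, not the ZFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model: pool_holds stands in for the dsl_pool_hold()
 * reference. None of these functions are the real ZFS interfaces.
 */
static int pool_holds;

static void model_pool_hold(void) { pool_holds++; }
static void model_pool_rele(void) { assert(pool_holds > 0); pool_holds--; }

/* Stands in for dump_bytes(); EPIPE models a reader that went away. */
static int model_dump_bytes(bool fail)
{
	return (fail ? EPIPE : 0);
}

/* Buggy shape: the early goto skips the pool release. */
static int model_send_impl_buggy(bool dump_fails)
{
	int err;

	if ((err = model_dump_bytes(dump_fails)) != 0)
		goto out;		/* pool hold is still outstanding */

	/* dsl_dataset_long_hold() would be taken here */
	model_pool_rele();
	/* ... traverse the dataset and write the stream ... */
out:
	/* shared cleanup: assumes the pool hold is already gone */
	return (err);
}

/* One possible fix: release the pool hold on the early path too. */
static int model_send_impl_fixed(bool dump_fails)
{
	int err;

	if ((err = model_dump_bytes(dump_fails)) != 0) {
		model_pool_rele();
		goto out;
	}

	model_pool_rele();
	/* ... traverse the dataset and write the stream ... */
out:
	return (err);
}

/* Caller: like dmu_send_obj(), takes the hold that send_impl must drop. */
static int model_send_obj(bool dump_fails, bool fixed)
{
	model_pool_hold();
	return (fixed ? model_send_impl_fixed(dump_fails) :
	    model_send_impl_buggy(dump_fails));
}
```

With the buggy path, model_send_obj(true, false) returns EPIPE and leaves pool_holds at 1, the analog of the dangling rrw hold. The fixed path returns to 0 either way. Another option is to take the long hold and drop the pool hold before the first dump_bytes() call, so that every goto out sees the same state. This sketch does not claim which approach the eventual fix took.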


Updated by Robert Mustacchi almost 8 years ago

A naive prototype fix is


Updated by Matthew Ahrens almost 8 years ago

  • Status changed from New to Closed

Great analysis on this. It is a duplicate of #3645.
