Bug #5056

ZFS deadlock on db_mtx and dn_holds

Added by Simon Klinkert about 5 years ago. Updated 6 months ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
zfs - Zettabyte File System
Start date:
2015-01-13
Due date:
-
% Done:
100%
Estimated time:
(Total: 0.00 h)
Difficulty:
Medium
Tags:
needs-triage

Description

Stack #1:

The following thread is waiting for dn_holds to drain while holding db_mtx, which it acquired via mutex_enter(&db->db_mtx) in dbuf_rele(). Sleeping while holding a mutex is a bad idea.

stack pointer for thread ffffff04f17b9b00: ffffff001eb4c5d0
[ ffffff001eb4c5d0 _resume_from_idle+0xf4() ]
  ffffff001eb4c600 swtch+0x141()
  ffffff001eb4c640 cv_wait+0x70(ffffff04f17b9cee, ffffff04f17b9cf0)
  ffffff001eb4c6a0 delay_common+0xb8(1)
  ffffff001eb4c6e0 delay+0x30(1)
  ffffff001eb4c710 dnode_special_close+0x32(ffffff0622e286e0)
  ffffff001eb4c740 dmu_objset_evict+0xa8(ffffff0622e286c0)
  ffffff001eb4c770 dsl_dataset_evict+0x31(ffffff063891ecd0, ffffff0622345680)
  ffffff001eb4c7a0 dbuf_evict_user+0x4b(ffffff063891ecd0)
  ffffff001eb4c7e0 dbuf_rele_and_unlock+0x12b(ffffff063891ecd0, ffffff05361efd80)
  ffffff001eb4c820 dbuf_rele+0x30(ffffff063891ecd0, ffffff05361efd80)                   
  ffffff001eb4c840 dmu_buf_rele+0x15(ffffff063891ecd0, ffffff05361efd80)
  ffffff001eb4c860 dsl_dataset_rele+0x19(ffffff0622345680, ffffff05361efd80)
  ffffff001eb4c8a0 dsl_dataset_disown+0x61(ffffff0622345680, ffffff05361efd80)
  ffffff001eb4c8c0 dmu_objset_disown+0x18(ffffff0622e286c0, ffffff05361efd80)
  ffffff001eb4c920 zfs_umount+0x16c(ffffff05b91d04f8, 400, ffffff04e61dcdb0)
  ffffff001eb4c950 fsop_unmount+0x1b(ffffff05b91d04f8, 400, ffffff04e61dcdb0)
  ffffff001eb4c9a0 dounmount+0x57(ffffff05b91d04f8, 400, ffffff04e61dcdb0)
  ffffff001eb4c9d0 zfs_unmount_snap+0x63(ffffff001eb4c9f0)
  ffffff001eb4cb70 dsl_dataset_user_release_impl+0xc3(ffffff05cd5a0a20, 0, ffffff04f3e439c0)
  ffffff001eb4cb90 dsl_dataset_user_release_tmp+0x1d(ffffff04f3e439c0, ffffff05cd5a0a20)
  ffffff001eb4cbd0 dsl_dataset_user_release_onexit+0x8b(ffffff056d5ae7c0)
  ffffff001eb4cc10 zfs_onexit_destroy+0x43(ffffff053b5f6b98)
  ffffff001eb4cc40 zfs_ctldev_destroy+0x18(ffffff053b5f6b98, 561)
  ffffff001eb4cca0 zfsdev_close+0x89(5f00000561, 2403, 2, ffffff05f198e848)
  ffffff001eb4ccd0 dev_close+0x31(5f00000561, 2403, 2, ffffff05f198e848)
  ffffff001eb4cd20 device_close+0xd8(ffffff04f3e4c480, 2403, ffffff05f198e848)
  ffffff001eb4cdb0 spec_close+0x17b(ffffff04f3e4c480, 2403, 1, 0, ffffff05f198e848, 0)
  ffffff001eb4ce30 fop_close+0x61(ffffff04f3e4c480, 2403, 1, 0, ffffff05f198e848, 0)
  ffffff001eb4ce70 closef+0x5e(ffffff04f6c43228)
  ffffff001eb4cee0 closeandsetf+0x398(c, 0)
  ffffff001eb4cf00 close+0x13(c)
  ffffff001eb4cf10 sys_syscall+0x17a()
> ffffff05a6011630::refcount
refcount_t at ffffff05a6011630 has 7 holds (untracked)

Stack #2:

stack pointer for thread ffffff04f168c3a0: ffffff00246dd7b0
[ ffffff00246dd7b0 _resume_from_idle+0xf4() ]
  ffffff00246dd7e0 swtch+0x141()
  ffffff00246dd890 turnstile_block+0x262(0, 0, ffffff063891ed28, fffffffffbc07980, 0, 0)
  ffffff00246dd900 mutex_vector_enter+0x3c5(ffffff063891ed28)
  ffffff00246dd970 dbuf_read+0x9f(ffffff063891ecd0, 0, 9)
  ffffff00246dd9d0 dmu_bonus_hold+0xad(ffffff04f39b6980, 77761, fffffffff7a5b1f0, ffffff00246dda48)
  ffffff00246ddaa0 dsl_dataset_hold_obj+0x4b(ffffff04f3e439c0, 77761, fffffffff7a5b1f0, ffffff00246ddab0)
  ffffff00246ddb10 dsl_dataset_hold+0x133(ffffff04f3e439c0, ffffff053871a000, fffffffff7a5b1f0, ffffff00246ddb30)
  ffffff00246ddb80 dmu_objset_hold+0x5b(ffffff053871a000, fffffffff7a5b1f0, ffffff00246ddba8)
  ffffff00246ddbd0 zfs_ioc_objset_stats+0x23(ffffff053871a000)
  ffffff00246ddc80 zfsdev_ioctl+0x4a7(5f00000000, 5a12, fdd7ea70, 100003, ffffff05329c5778, ffffff00246dde68)
  ffffff00246ddcc0 cdev_ioctl+0x39(5f00000000, 5a12, fdd7ea70, 100003, ffffff05329c5778, ffffff00246dde68)
  ffffff00246ddd10 spec_ioctl+0x60(ffffff04f17db500, 5a12, fdd7ea70, 100003, ffffff05329c5778, ffffff00246dde68, 0)
  ffffff00246ddda0 fop_ioctl+0x55(ffffff04f17db500, 5a12, fdd7ea70, 100003, ffffff05329c5778, ffffff00246dde68, 0)
  ffffff00246ddec0 ioctl+0x9b(7, 5a12, fdd7ea70)
  ffffff00246ddf10 _sys_sysenter_post_swapgs+0x149()
> ffffff063891ed28::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff063891ed28 adapt ffffff04f17b9b00      -      -     yes

In the second stack, dmu_bonus_hold() would eventually call dnode_rele() to drop the dn_holds refcount, but it is blocked in dbuf_read() on db_mtx, which (per the ::mutex output above) is held by the first thread.


Subtasks

Bug #5531: NULL pointer dereference in dsl_prop_get_ds() (Closed, Matthew Ahrens)


Related issues

Related to illumos gate - Bug #6676: Race between unique_insert() and unique_remove() causes ZFS fsid change (Closed, 2016-02-25)


History

#1

Updated by Justin Gibbs almost 5 years ago

Fix available for review: https://reviews.csiden.org/r/131/

#2

Updated by Electric Monk over 4 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit bc9014e6a81272073b9854d9f65dd59e18d18c35

commit  bc9014e6a81272073b9854d9f65dd59e18d18c35
Author: Justin Gibbs <justing@spectralogic.com>
Date:   2015-01-06T05:13:32.000Z

    5056 ZFS deadlock on db_mtx and dn_holds
    Reviewed by: Will Andrews <willa@spectralogic.com>
    Reviewed by: Matt Ahrens <mahrens@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

#3

Updated by Matthew Ahrens almost 4 years ago

I don't think that Stack #1 is waiting for Stack #2 in the example above. The dnode that Stack #2 is going to release is the dnode of the dataset's object in the MOS, but Stack #1 is waiting for the dnodes in the objset (not the MOS) to be released.

So I don't think the deadlock is caused by the two stacks here. Maybe there is another thread involved?

#4

Updated by Marcel Telka over 3 years ago

  • Related to Bug #6676: Race between unique_insert() and unique_remove() causes ZFS fsid change added
#5

Updated by Andriy Gapon 6 months ago

This is a very old and already-fixed bug, but I want to note that I agree with Matt in comment 3.
Thread 2 is certainly blocked by Thread 1, but the converse is not true. Thread 1 seems to be stuck because some dbuf did not get evicted, and now there is nothing left to evict it even though it is eligible (it is at the ARC's mercy, in my opinion).

I am looking at possibly the same issue, reproduced quite recently with a fairly old ZFS version.
It is interesting that in the state where dnode_special_close() is stuck, dmu_objset_evict_dbufs() and dnode_evict_dbufs() could have evicted all dbufs: there are no outstanding holds.
The meta dnode has 7 holds. The objset has 3 dnodes, all with zero holds, all backed by the same dbuf. That dbuf has zero holds and has 6 indirect dbufs as ancestors, each of which has one hold, as expected. So the holds on the meta dnode all come from those L0 .. L6 dbufs.
I think that there could have been some outstanding hold at the time when dmu_objset_evict_dbufs() was originally called.
That hold was removed after dmu_objset_evict_dbufs() gave up on the L0 dbuf, but since the dbuf was not marked for immediate eviction it was allowed to remain cached. And back then (in terms of the ZFS code) we did not have a mechanism to mark a dbuf as pending eviction.
I think that was a design deficiency in the eviction mechanism.

Anyway, I think that Justin's changes fixed that problem among other improvements.

#6

Updated by Andriy Gapon 6 months ago

However, there is one thing that puzzles me a lot.
How could another thread be holding, or be acquiring a hold on, a dbuf in the objset while Thread 1 is evicting it?
I think that to get to a dbuf it would first need a hold on the dataset.
If the dataset were held, then it would not be getting evicted.
And no other thread could get a hold of the dataset once the eviction has started, because the dataset's bonus dbuf is locked the whole time; Thread 2 is an example.

#7

Updated by Andriy Gapon 6 months ago

I think that the problem could be because of a race between the objset eviction and the ARC eviction.
dbuf_destroy(), called from the ARC eviction, first removes a dbuf from its dnode's dbuf list, and only then releases the corresponding hold on the dnode (and on its backing dbuf in the meta dnode). This allows a race where the objset eviction first sees a dnode with no dbufs (and thus, apparently, no work to do) and then sees the held dbuf in the meta dnode that cannot be evicted.
