Bug #5319: smb_oplock_acquire thread deadlock
Status: Closed
% Done: 100%
Description
Bayard Bell added a comment - 05/Aug/13 6:18 PM - edited
This is my attempt to restate the problem, building on JB's extensive analysis. A fix analysis will follow this synthesis of the root cause.
Fundamentally, looking at smb_oplock_acquire(), it is clear that smb_range_check() is called with ol->ol_mutex held (usr/src/uts/common/fs/smbsrv/smb_oplock.c line 233) and node->vp->vnbllock held as a reader (usr/src/uts/common/fs/smbsrv/smb_oplock.c line 236, usr/src/uts/common/os/nbmlock.c line 48).
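In code terms, that path amounts to the following. This is a hedged sketch assembled from the lock-order notes and stack traces in this bug, not a verbatim quote of smb_oplock.c; argument lists and guard conditions are trimmed, and the smb_range_check() arguments are taken from the stack traces later in this report.

/* Sketch (paraphrase) of the smb_oplock_acquire() acquisition order: */
mutex_enter(&ol->ol_mutex);                  /* oplock state mutex */
nbl_start_crit(node->vp, RW_READER);         /* vp->v_nbllock, shared hold */
/*
 * smb_range_check() -> smb_lock_range_access() then wants the node's
 * lock-range list rwlock (node->n_lock_list) as a reader; this is
 * where the smb_oplock_acquire() thread is seen parked in the dumps
 * that follow.
 */
(void) smb_range_check(sr, node, 0, UINT64_MAX, B_TRUE);
nbl_end_crit(node->vp);
mutex_exit(&ol->ol_mutex);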
A large number of threads (1003) are blocked on ol->ol_mutex in smb_oplock_break(), while a further thirteen threads are blocked on node->vp->vnbllock, ten of them in rfs3_read(); the remaining three are:
> ffffff22dd4beb40::whatthread | ::stacks -C smb_oplock_break | ::stacks -c nbl_start_crit | ::stacks -C rfs3_read | ::findstack -v
stack pointer for thread ffffff00f7368c60: ffffff00f7368580
[ ffffff00f7368580 _resume_from_idle+0xf1() ]
  ffffff00f73685b0 swtch+0x145()
  ffffff00f7368660 turnstile_block+0x760(0, 0, ffffff22dd4beba0, fffffffffbc08ef0, 0, 0)
  ffffff00f73686d0 rw_enter_sleep+0x1a3(ffffff22dd4beba0, 0)
  ffffff00f73686f0 nbl_start_crit+0x19(ffffff22dd4beb40, 0)
  ffffff00f7368790 fs_frlock+0x229(ffffff22dd4beb40, 2a, ffffff00f7368960, 10002, 0, ffffff00f73688e0, ffffff35974803a0, ffffffffc03b6fb0)
  ffffff00f7368830 zfs_frlock+0xd9(ffffff22dd4beb40, 2a, ffffff00f7368960, 10002, 0, ffffff00f73688e0, ffffff35974803a0, ffffffffc03b6fb0)
  ffffff00f73688c0 fop_frlock+0x8c(ffffff22dd4beb40, 2a, ffffff00f7368960, 10002, 0, ffffff00f73688e0, ffffff35974803a0, ffffffffc03b6fb0)
  ffffff00f7368950 smb_vop_frlock+0x7d(ffffff22dd4beb40, ffffff35974803a0, 10002, ffffff00f7368960)
  ffffff00f73689e0 smb_fsop_frlock+0xdf(ffffff30583b0368, ffffff22683b7800, 0, ffffff35974803a0)
  ffffff00f7368a70 smb_lock_range+0x1b2(ffffff225550a348, 7fffffa7, 14, 0, 65)
  ffffff00f7368ae0 smb_com_locking_andx+0x318(ffffff225550a348)
  ffffff00f7368b80 smb_dispatch_request+0x4a6(ffffff225550a348)
  ffffff00f7368bb0 smb_session_worker+0x6c(ffffff225550a348)
  ffffff00f7368c40 taskq_d_thread+0xb1(ffffff369d311768)
  ffffff00f7368c50 thread_start+8()
stack pointer for thread ffffff2236856200: ffffff00f8b1f8c0
[ ffffff00f8b1f8c0 _resume_from_idle+0xf1() ]
  ffffff00f8b1f8f0 swtch+0x145()
  ffffff00f8b1f9a0 turnstile_block+0x760(ffffff225b25fa50, 0, ffffff22dd4beba0, fffffffffbc08ef0, 0, 0)
  ffffff00f8b1fa10 rw_enter_sleep+0x1a3(ffffff22dd4beba0, 0)
  ffffff00f8b1fa30 nbl_start_crit+0x19(ffffff22dd4beb40, 0)
  ffffff00f8b1fa90 fs_shrlock+0xbe(ffffff22dd4beb40, 2b, ffffff00f8b1fc20, 2001, ffffff22c2e43b90, 0)
  ffffff00f8b1fb00 fop_shrlock+0x74(ffffff22dd4beb40, 2b, ffffff00f8b1fc20, 2001, ffffff22c2e43b90, 0)
  ffffff00f8b1fca0 vn_openat+0x590(8047ae7, 0, 2001, 0, ffffff00f8b1fce8, 0, 12, 0, 3)
  ffffff00f8b1fe00 copen+0x435(ffd19553, 8047ae7, 2001, 0)
  ffffff00f8b1fe30 open64+0x34(8047ae7, 0, 0)
  ffffff00f8b1fec0 dtrace_systrace_syscall32+0x11a(8047ae7, 0, 0, 1c3, 1, 0)
  ffffff00f8b1ff10 _sys_sysenter_post_swapgs+0x149()
stack pointer for thread ffffff21ed015e00: ffffff00f61bdaf0
[ ffffff00f61bdaf0 _resume_from_idle+0xf1() ]
  ffffff00f61bdb20 swtch+0x145()
  ffffff00f61bdbd0 turnstile_block+0x760(ffffff225b25fa50, 1, ffffff22dd4beba0, fffffffffbc08ef0, 0, 0)
  ffffff00f61bdc40 rw_enter_sleep+0x205(ffffff22dd4beba0, 1)
  ffffff00f61bdc60 nbl_start_crit+0x19(ffffff22dd4beb40, 1)
  ffffff00f61bdde0 vn_renameat+0x292(0, 8047ae3, 0, 8047aec, 0)
  ffffff00f61bde10 vn_rename+0x2f(8047ae3, 8047aec, 0)
  ffffff00f61bde30 rename+0x17(8047ae3, 8047aec)
  ffffff00f61bdec0 dtrace_systrace_syscall32+0x11a(8047ae3, 8047aec, ffffffff, 1c3, 1, 0)
  ffffff00f61bdf10 _sys_sysenter_post_swapgs+0x149()
Updated by Gordon Ross over 8 years ago
Bayard Bell added a comment - 05/Aug/13 8:21 PM
Have had a quick look at vmdump.5. Jason has given us a useful way of understanding the pathology here by starting with "the smb_oplock_acquire() thread", so let's follow him on that, as we agree that smb_oplock_acquire() has the interesting characteristic of acquiring both the oplock mutex and a hold on the vnode's mandatory-lock (nbl) critical region:
stack pointer for thread ffffff00f6ee4c60: ffffff00f6ee4610
[ ffffff00f6ee4610 _resume_from_idle+0xf1() ]
  ffffff00f6ee4640 swtch+0x145()
  ffffff00f6ee46f0 turnstile_block+0x760(0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f6ee4760 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f6ee47d0 smb_lock_range_access+0x4f(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee4840 smb_range_check+0x5c(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee48a0 smb_oplock_acquire+0xfe(ffffff2256ae96c0, ffffff2246643cc8, fffffff67abf5978)
  ffffff00f6ee49c0 smb_open_subr+0xc89(ffffff2256ae96c0)
  ffffff00f6ee4a00 smb_common_open+0x78(ffffff2256ae96c0)
  ffffff00f6ee4ae0 smb_com_nt_create_andx+0x10f(ffffff2256ae96c0)
  ffffff00f6ee4b80 smb_dispatch_request+0x4a6(ffffff2256ae96c0)
  ffffff00f6ee4bb0 smb_session_worker+0x6c(ffffff2256ae96c0)
  ffffff00f6ee4c40 taskq_d_thread+0xb1(fffffff6f154d548)
  ffffff00f6ee4c50 thread_start+8()
Lo and behold, we are deadlocked against smb_lock_range() called from smb_com_locking_andx(), which has the same vnode and smb_node_t in its stack trace as we do: it holds as a writer the lock-range list rwlock on which we're sleeping, and it is in turn sleeping on the vnode mandatory lock, on which we hold a non-exclusive (reader) grant:
> ffffff2246643cc8::print -atx smb_node_t n_oplock.ol_mutex | ::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff2246643e70 adapt ffffff00f6ee4c60      -      -     yes
> ffffff2246643cc8::print smb_node_t n_lock_list vp
n_lock_list = {
    n_lock_list.ll_lock = {
        _opaque = [ 0xffffff00f72a1c65 ]
    }
    n_lock_list.ll_list = {
        list_size = 0xc0
        list_offset = 0x10
        list_head = {
            list_next = 0xffffff24b82e0c50
            list_prev = 0xffffff24b82e0c50
        }
    }
    n_lock_list.ll_count = 0x1
    n_lock_list.ll_wrop = 0xa3
    n_lock_list.ll_mutex = {
        _opaque = [ 0 ]
    }
    n_lock_list.ll_deleteq = {
        list_size = 0x28
        list_offset = 0x8
        list_head = {
            list_next = 0xffffff2246643dc8
            list_prev = 0xffffff2246643dc8
        }
    }
    n_lock_list.ll_deleteq_count = 0
    n_lock_list.ll_flushing = 0 (0)
}
vp = 0xffffff355177e640
> ffffff355177e640::print vnode_t v_nbllock | ::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffff355177e6a0        READERS=1  B011 ffffff00f72a1c60 (W)
                                    ||
                 WRITE_WANTED -------+|
                  HAS_WAITERS --------+
> ffffff2246643cc8::print smb_node_t n_lock_list.ll_lock | ::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffff2246643d78 ffffff00f72a1c60  B101 ffffff00f6a86c60 (R)
                                   | |  ffffff00f7085c60 (R)
                 WRITE_LOCKED ------+ |  ffffff00f6ba2c60 (R)
                  HAS_WAITERS --------+ ffffff00f6f53c60 (R)
                                        ffffff00f6ccec60 (R)
                                        ffffff00f605cc60 (R)
                                        ffffff00f61cec60 (R)
                                        ffffff00f4997c60 (R)
                                        ffffff00f61a6c60 (R)
                                        ffffff00f62b3c60 (R)
                                        ffffff00f6ee4c60 (R)
> ffffff00f72a1c60::findstack -v
stack pointer for thread ffffff00f72a1c60: ffffff00f72a1580
[ ffffff00f72a1580 _resume_from_idle+0xf1() ]
  ffffff00f72a15b0 swtch+0x145()
  ffffff00f72a1660 turnstile_block+0x760(0, 0, ffffff355177e6a0, fffffffffbc08ef0, 0, 0)
  ffffff00f72a16d0 rw_enter_sleep+0x1a3(ffffff355177e6a0, 0)
  ffffff00f72a16f0 nbl_start_crit+0x19(ffffff355177e640, 0)
  ffffff00f72a1790 fs_frlock+0x229(ffffff355177e640, 2a, ffffff00f72a1960, 10002, 0, ffffff00f72a18e0, ffffffdd8032bc20, ffffffffc0101fb0)
  ffffff00f72a1830 zfs_frlock+0xd9(ffffff355177e640, 2a, ffffff00f72a1960, 10002, 0, ffffff00f72a18e0, ffffffdd8032bc20, ffffffffc0101fb0)
  ffffff00f72a18c0 fop_frlock+0x8c(ffffff355177e640, 2a, ffffff00f72a1960, 10002, 0, ffffff00f72a18e0, ffffffdd8032bc20, ffffffffc0101fb0)
  ffffff00f72a1950 smb_vop_frlock+0x7d(ffffff355177e640, ffffffdd8032bc20, 10002, ffffff00f72a1960)
  ffffff00f72a19e0 smb_fsop_frlock+0xdf(ffffff2246643cc8, ffffffcadad2dc00, 0, ffffffdd8032bc20)
  ffffff00f72a1a70 smb_lock_range+0x1b2(ffffff2256e75ce0, 7fffffa7, 14, 0, 65)
  ffffff00f72a1ae0 smb_com_locking_andx+0x318(ffffff2256e75ce0)
  ffffff00f72a1b80 smb_dispatch_request+0x4a6(ffffff2256e75ce0)
  ffffff00f72a1bb0 smb_session_worker+0x6c(ffffff2256e75ce0)
  ffffff00f72a1c40 taskq_d_thread+0xb1(fffffffe54eef460)
  ffffff00f72a1c50 thread_start+8()
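Putting the two sides next to each other makes the inversion explicit. This is a hedged sketch reconstructed from the stacks and ::rwlock output above; the exact call that takes n_lock_list on each side is paraphrased (shown here as smb_llist_enter()), not quoted from source.

/* Thread ffffff00f6ee4c60 -- smb_oplock_acquire(): holds 1 and 2, wants 3 */
mutex_enter(&ol->ol_mutex);                      /* 1: ol_mutex, the HELD owner above     */
nbl_start_crit(node->vp, RW_READER);             /* 2: v_nbllock, the READERS=1 hold      */
smb_llist_enter(&node->n_lock_list, RW_READER);  /* 3: blocked -- the (R) waiter above    */

/* Thread ffffff00f72a1c60 -- smb_lock_range() from smb_com_locking_andx():
 * holds 3, wants 2 */
smb_llist_enter(&node->n_lock_list, RW_WRITER);  /* 3: the WRITE_LOCKED owner above       */
nbl_start_crit(node->vp, RW_WRITER);             /* 2: blocked -- the (W) waiter above,   */
                                                 /*    reached via fs_frlock()            */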
Meanwhile, we again have a load of people trying to cash out before the zombie apocalypse by calling smb_oplock_break(), as well as some fresh braaaaains arriving who can't even make it off the dock here on zombie island:
> ffffff21ec9d5368::turnstile
            ADDR             SOBJ WTRS EPRI          ITOR PRIOINV
ffffff21ec9d5368 ffffff2246643e70 1073   99             0       0
> ::stacks -c mutex_vector_enter | ::stacks -c smb_oplock_break | ::findstack -v ! grep ^stack | wc -l
1072
> ffffff2246643e70::whatthread | ::stacks -c mutex_vector_enter | ::stacks -C smb_oplock_break | ::findstack -v
stack pointer for thread ffffff00f72ddc60: ffffff00f72dd9d0
[ ffffff00f72dd9d0 _resume_from_idle+0xf1() ]
  ffffff00f72dda00 swtch+0x145()
  ffffff00f72ddab0 turnstile_block+0x760(0, 0, ffffff2246643e70, fffffffffbc07d90, 0, 0)
  ffffff00f72ddb10 mutex_vector_enter+0x261(ffffff2246643e70)
  ffffff00f72ddb40 smb_oplock_broadcast+0x26(ffffff2246643cc8)
  ffffff00f72ddb60 smb_ofile_request_complete+0x24(ffffff353da30bf0)
  ffffff00f72ddb80 smb_request_free+0x23(ffffff253d465668)
  ffffff00f72ddbb0 smb_session_worker+0x91(ffffff253d465668)
  ffffff00f72ddc40 taskq_d_thread+0xb1(fffffff65dac5e08)
  ffffff00f72ddc50 thread_start+8()
> ffffff2246643e70::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff2246643e70 adapt ffffff00f6ee4c60      -      -     yes
> ffffff00f6ee4c60::findstack -v
stack pointer for thread ffffff00f6ee4c60: ffffff00f6ee4610
[ ffffff00f6ee4610 _resume_from_idle+0xf1() ]
  ffffff00f6ee4640 swtch+0x145()
  ffffff00f6ee46f0 turnstile_block+0x760(0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f6ee4760 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f6ee47d0 smb_lock_range_access+0x4f(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee4840 smb_range_check+0x5c(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee48a0 smb_oplock_acquire+0xfe(ffffff2256ae96c0, ffffff2246643cc8, fffffff67abf5978)
  ffffff00f6ee49c0 smb_open_subr+0xc89(ffffff2256ae96c0)
  ffffff00f6ee4a00 smb_common_open+0x78(ffffff2256ae96c0)
  ffffff00f6ee4ae0 smb_com_nt_create_andx+0x10f(ffffff2256ae96c0)
  ffffff00f6ee4b80 smb_dispatch_request+0x4a6(ffffff2256ae96c0)
  ffffff00f6ee4bb0 smb_session_worker+0x6c(ffffff2256ae96c0)
  ffffff00f6ee4c40 taskq_d_thread+0xb1(fffffff6f154d548)
  ffffff00f6ee4c50 thread_start+8()
> ffffff355177e640::whatthread | ::stacks -C smb_oplock_break | ::stacks -C nbl_start_crit | ::stacks -C smb_oplock_acquire | ::findstack -v
stack pointer for thread ffffff00f605cc60: ffffff00f605c890
[ ffffff00f605c890 _resume_from_idle+0xf1() ]
  ffffff00f605c8c0 swtch+0x145()
  ffffff00f605c970 turnstile_block+0x760(ffffff21ec9d96a0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f605c9e0 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f605ca50 smb_lock_range_access+0x4f(ffffff2464d04958, ffffff2246643cc8, 4fd000, 8000, 0)
  ffffff00f605cab0 smb_common_read+0xba(ffffff2464d04958, ffffff23f689e000)
  ffffff00f605cae0 smb_com_read_andx+0x64(ffffff2464d04958)
  ffffff00f605cb80 smb_dispatch_request+0x4a6(ffffff2464d04958)
  ffffff00f605cbb0 smb_session_worker+0x6c(ffffff2464d04958)
  ffffff00f605cc40 taskq_d_thread+0xb1(fffffffe52e84ad0)
  ffffff00f605cc50 thread_start+8()
> ffffff21ec9d96a0::turnstile
            ADDR             SOBJ WTRS EPRI          ITOR PRIOINV
ffffff21ec9d96a0 ffffff2246643d78   11   99             0       0
This last bit brings us full circle. We have 11 waiters on the node's lock-range list rwlock (n_lock_list.ll_lock): ten of them are waiting in smb_common_read(), a further knock-on effect of the deadlock, and the eleventh, which is what remains once we filter out smb_common_read(), is the deadlocked smb_oplock_acquire() thread itself:
> ffffff2246643d78::walk blocked | ::stacks -C smb_common_read | ::findstack -v
stack pointer for thread ffffff00f6ee4c60: ffffff00f6ee4610
[ ffffff00f6ee4610 _resume_from_idle+0xf1() ]
  ffffff00f6ee4640 swtch+0x145()
  ffffff00f6ee46f0 turnstile_block+0x760(0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f6ee4760 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f6ee47d0 smb_lock_range_access+0x4f(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee4840 smb_range_check+0x5c(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee48a0 smb_oplock_acquire+0xfe(ffffff2256ae96c0, ffffff2246643cc8, fffffff67abf5978)
  ffffff00f6ee49c0 smb_open_subr+0xc89(ffffff2256ae96c0)
  ffffff00f6ee4a00 smb_common_open+0x78(ffffff2256ae96c0)
  ffffff00f6ee4ae0 smb_com_nt_create_andx+0x10f(ffffff2256ae96c0)
  ffffff00f6ee4b80 smb_dispatch_request+0x4a6(ffffff2256ae96c0)
  ffffff00f6ee4bb0 smb_session_worker+0x6c(ffffff2256ae96c0)
  ffffff00f6ee4c40 taskq_d_thread+0xb1(fffffff6f154d548)
  ffffff00f6ee4c50 thread_start+8()
This looks to line up with the smb_node_t's various counters (1086 total references, 13 opens, 1073 opening):
> ffffff2246643cc8::print -atx smb_node_t n_refcnt n_open_count n_opening_count
ffffff2246643cf4 uint32_t n_refcnt = 0x43e
ffffff2246643d08 uint32_t n_open_count = 0xd
ffffff2246643d0c uint32_t n_opening_count = 0x431
Updated by Gordon Ross over 8 years ago
Gordon Ross added a comment - 29/Aug/13 8:56 PM
Existing lock order: (note - broken!)
*A: smb_oplock_acquire thread:
A1: The oplock mutex:
mutex_enter(&ol->ol_mutex);
A2: The vnode vp->vnbllock (reader)
nbl_start_crit(node->vp, RW_READER)
A3: The lock range list: node->n_lock_list
(via smb_range_check, smb_lock_range_access)
XXX: Block trying to get READER
*B: smb_locking_andx thread:
(smb_lock_range, smb_unlock_range)
B1: The lock range list: node->n_lock_list
B2: The vnode vp->vnbllock (writer)
nbl_start_crit(node->vp, RW_WRITER)
(via smb_fsop_frlock, smb_vop_frlock, fs_frlock)
XXX: Block trying to get WRITER
Notes:
VOP_FRLOCK also does nbl_start_crit, and at a level
below what we can control. That would appear to
force the order to be node->n_lock_list and then
the FS level vp->vnbllock.
That also seems a more sensible order because conceptually
the SMB locks are "built on top of" the FS-level locks.
That's the order smb_locking_andx is using now, but
unfortunately, several things in smbsrv violate that
ordering. Examples: delete, oplock, rename
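To make the inversion concrete outside the kernel, here is a minimal userland model. It is purely illustrative: the pthread objects merely stand in for ol_mutex, vp->vnbllock, and node->n_lock_list, and the names, file name, and MIX_ORDERS switch are invented for the model. Built as written (MIX_ORDERS = 1, i.e. the mixed orders A and B above) it hangs; with MIX_ORDERS = 0 both threads take n_lock_list before vnbllock, matching the single ordering argued for above, and the program completes.

/*
 * Userland model of the lock-order inversion described in this bug.
 * Build (example): cc abba.c -o abba -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define MIX_ORDERS 1  /* 1 = today's conflicting orders; 0 = single proposed order */

static pthread_mutex_t ol_mutex = PTHREAD_MUTEX_INITIALIZER;       /* ol->ol_mutex       */
static pthread_rwlock_t v_nbllock = PTHREAD_RWLOCK_INITIALIZER;    /* vp->vnbllock       */
static pthread_rwlock_t n_lock_list = PTHREAD_RWLOCK_INITIALIZER;  /* node->n_lock_list  */

/* Models the smb_oplock_acquire() side (order A above). */
static void *
oplock_side(void *arg)
{
    (void) pthread_mutex_lock(&ol_mutex);            /* A1 */
#if MIX_ORDERS
    (void) pthread_rwlock_rdlock(&v_nbllock);        /* A2: nbl_start_crit(vp, RW_READER) */
    sleep(1);                                        /* widen the race window */
    (void) pthread_rwlock_rdlock(&n_lock_list);      /* A3: smb_range_check() path */
#else
    (void) pthread_rwlock_rdlock(&n_lock_list);      /* proposed: lock list first */
    (void) pthread_rwlock_rdlock(&v_nbllock);        /* then the nbmand lock */
#endif
    (void) pthread_rwlock_unlock(&n_lock_list);
    (void) pthread_rwlock_unlock(&v_nbllock);
    (void) pthread_mutex_unlock(&ol_mutex);
    return (NULL);
}

/* Models the smb_com_locking_andx()/smb_lock_range() side (order B above). */
static void *
locking_side(void *arg)
{
    (void) pthread_rwlock_wrlock(&n_lock_list);      /* B1: exclusive hold on the list */
    sleep(1);
    (void) pthread_rwlock_wrlock(&v_nbllock);        /* B2: fs_frlock() -> nbl_start_crit() */
    (void) pthread_rwlock_unlock(&v_nbllock);
    (void) pthread_rwlock_unlock(&n_lock_list);
    return (NULL);
}

int
main(void)
{
    pthread_t a, b;

    (void) pthread_create(&a, NULL, oplock_side, NULL);
    (void) pthread_create(&b, NULL, locking_side, NULL);
    (void) pthread_join(a, NULL);
    (void) pthread_join(b, NULL);
    (void) printf("no deadlock: both threads completed\n");
    return (0);
}

With the mixed orders, thread A holds vnbllock (reader) and waits for n_lock_list while thread B holds n_lock_list (writer) and waits for vnbllock, exactly the cycle seen in the dump; with one consistent order the cycle cannot form.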
Updated by Electric Monk over 8 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit bc7c423f80bd757204d3b17cfd74585ae5b30ed8
commit bc7c423f80bd757204d3b17cfd74585ae5b30ed8
Author: Gordon Ross <gwr@nexenta.com>
Date:   2014-11-29T20:44:19.000Z

    5319 smb_oplock_acquire thread deadlock
    Reviewed by: Dan McDonald <danmcd@nexenta.com>
    Reviewed by: Sarah Jelinek <Sarah.Jelinek@nexenta.com>
    Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
    Approved by: Robert Mustacchi <rm@joyent.com>