Bug #5319 (closed): smb_oplock_acquire thread deadlock

Added by Gordon Ross over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2014-11-13
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:
External Bug:

Description

Bayard Bell added a comment - 05/Aug/13 6:18 PM - edited

This is my attempt to restate the problem, building on JB's extensive analysis. A fix analysis will follow this root-cause synthesis.
Fundamentally, looking at smb_oplock_acquire(), it is clear that smb_range_check() is called with ol->ol_mutex held (usr/src/uts/common/fs/smbsrv/smb_oplock.c line 233) and node->vp->vnbllock held as a reader (usr/src/uts/common/fs/smbsrv/smb_oplock.c line 236, usr/src/uts/common/nblock.c line 48).
A large number of threads (1003) are blocked on ol->ol_mutex from smb_oplock_break(), while a further thirteen threads are blocked on node->vp->vnbllock, ten of them from rfs3_read(); the remaining three are:

> ffffff22dd4beb40::whatthread | ::stacks -C smb_oplock_break | ::stacks -c nbl_start_crit | ::stacks -C rfs3_read | ::findstack -v
stack pointer for thread ffffff00f7368c60: ffffff00f7368580
[ ffffff00f7368580 _resume_from_idle+0xf1() ]
  ffffff00f73685b0 swtch+0x145()
  ffffff00f7368660 turnstile_block+0x760(0, 0, ffffff22dd4beba0, fffffffffbc08ef0, 0, 0)
  ffffff00f73686d0 rw_enter_sleep+0x1a3(ffffff22dd4beba0, 0)
  ffffff00f73686f0 nbl_start_crit+0x19(ffffff22dd4beb40, 0)
  ffffff00f7368790 fs_frlock+0x229(ffffff22dd4beb40, 2a, ffffff00f7368960, 10002, 0, ffffff00f73688e0, ffffff35974803a0, ffffffffc03b6fb0)
  ffffff00f7368830 zfs_frlock+0xd9(ffffff22dd4beb40, 2a, ffffff00f7368960, 10002, 0, ffffff00f73688e0, ffffff35974803a0, ffffffffc03b6fb0)
  ffffff00f73688c0 fop_frlock+0x8c(ffffff22dd4beb40, 2a, ffffff00f7368960, 10002, 0, ffffff00f73688e0, ffffff35974803a0, ffffffffc03b6fb0)
  ffffff00f7368950 smb_vop_frlock+0x7d(ffffff22dd4beb40, ffffff35974803a0, 10002, ffffff00f7368960)
  ffffff00f73689e0 smb_fsop_frlock+0xdf(ffffff30583b0368, ffffff22683b7800, 0, ffffff35974803a0)
  ffffff00f7368a70 smb_lock_range+0x1b2(ffffff225550a348, 7fffffa7, 14, 0, 65)
  ffffff00f7368ae0 smb_com_locking_andx+0x318(ffffff225550a348)
  ffffff00f7368b80 smb_dispatch_request+0x4a6(ffffff225550a348)
  ffffff00f7368bb0 smb_session_worker+0x6c(ffffff225550a348)
  ffffff00f7368c40 taskq_d_thread+0xb1(ffffff369d311768)
  ffffff00f7368c50 thread_start+8()
stack pointer for thread ffffff2236856200: ffffff00f8b1f8c0
[ ffffff00f8b1f8c0 _resume_from_idle+0xf1() ]
  ffffff00f8b1f8f0 swtch+0x145()
  ffffff00f8b1f9a0 turnstile_block+0x760(ffffff225b25fa50, 0, ffffff22dd4beba0,  fffffffffbc08ef0, 0, 0)
  ffffff00f8b1fa10 rw_enter_sleep+0x1a3(ffffff22dd4beba0, 0)
  ffffff00f8b1fa30 nbl_start_crit+0x19(ffffff22dd4beb40, 0)
  ffffff00f8b1fa90 fs_shrlock+0xbe(ffffff22dd4beb40, 2b, ffffff00f8b1fc20, 2001, ffffff22c2e43b90, 0)
  ffffff00f8b1fb00 fop_shrlock+0x74(ffffff22dd4beb40, 2b, ffffff00f8b1fc20, 2001, ffffff22c2e43b90, 0)
  ffffff00f8b1fca0 vn_openat+0x590(8047ae7, 0, 2001, 0, ffffff00f8b1fce8, 0, 12, 0, 3)
  ffffff00f8b1fe00 copen+0x435(ffd19553, 8047ae7, 2001, 0)
  ffffff00f8b1fe30 open64+0x34(8047ae7, 0, 0)
  ffffff00f8b1fec0 dtrace_systrace_syscall32+0x11a(8047ae7, 0, 0, 1c3, 1, 0)
  ffffff00f8b1ff10 _sys_sysenter_post_swapgs+0x149()
stack pointer for thread ffffff21ed015e00: ffffff00f61bdaf0
[ ffffff00f61bdaf0 _resume_from_idle+0xf1() ]
  ffffff00f61bdb20 swtch+0x145()
  ffffff00f61bdbd0 turnstile_block+0x760(ffffff225b25fa50, 1, ffffff22dd4beba0,  fffffffffbc08ef0, 0, 0)
  ffffff00f61bdc40 rw_enter_sleep+0x205(ffffff22dd4beba0, 1)
  ffffff00f61bdc60 nbl_start_crit+0x19(ffffff22dd4beb40, 1)
  ffffff00f61bdde0 vn_renameat+0x292(0, 8047ae3, 0, 8047aec, 0)
  ffffff00f61bde10 vn_rename+0x2f(8047ae3, 8047aec, 0)
  ffffff00f61bde30 rename+0x17(8047ae3, 8047aec)
  ffffff00f61bdec0 dtrace_systrace_syscall32+0x11a(8047ae3, 8047aec, ffffffff, 1c3, 1, 0)
  ffffff00f61bdf10 _sys_sysenter_post_swapgs+0x149()

Actions #1

Updated by Gordon Ross over 8 years ago

Bayard Bell added a comment - 05/Aug/13 8:21 PM

Have had a quick look at vmdump.5. Jason has given us a useful means of understanding the pathology here by starting with "the smb_oplock_acquire() thread", so let's follow him on that, as we agree that smb_oplock_acquire() has the interesting characteristic of acquiring both the oplock mutex and a hold on the vnode's mandatory-lock critical section:

stack pointer for thread ffffff00f6ee4c60: ffffff00f6ee4610
[ ffffff00f6ee4610 _resume_from_idle+0xf1() ]
  ffffff00f6ee4640 swtch+0x145()
  ffffff00f6ee46f0 turnstile_block+0x760(0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f6ee4760 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f6ee47d0 smb_lock_range_access+0x4f(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee4840 smb_range_check+0x5c(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee48a0 smb_oplock_acquire+0xfe(ffffff2256ae96c0, ffffff2246643cc8, fffffff67abf5978)
  ffffff00f6ee49c0 smb_open_subr+0xc89(ffffff2256ae96c0)
  ffffff00f6ee4a00 smb_common_open+0x78(ffffff2256ae96c0)
  ffffff00f6ee4ae0 smb_com_nt_create_andx+0x10f(ffffff2256ae96c0)
  ffffff00f6ee4b80 smb_dispatch_request+0x4a6(ffffff2256ae96c0)
  ffffff00f6ee4bb0 smb_session_worker+0x6c(ffffff2256ae96c0)
  ffffff00f6ee4c40 taskq_d_thread+0xb1(fffffff6f154d548)
  ffffff00f6ee4c50 thread_start+8()

Lo and behold, we are deadlocked with smb_lock_range() called from smb_com_locking_andx(), which has the same vnode and smb_node_t in its stack trace as we do: the rwlock on which we are sleeping (requesting a non-exclusive hold on the node's lock-range list, n_lock_list.ll_lock) is held exclusively by that smb_lock_range() thread, which is in turn blocked waiting for an exclusive hold on the vnode mandatory lock (v_nbllock) that we already hold as a reader:
> ffffff2246643cc8::print -atx smb_node_t n_oplock.ol_mutex | ::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff2246643e70 adapt ffffff00f6ee4c60      -      -     yes
> ffffff2246643cc8::print smb_node_t n_lock_list vp
n_lock_list = {
    n_lock_list.ll_lock = {
        _opaque = [ 0xffffff00f72a1c65 ]
    }
    n_lock_list.ll_list = {
        list_size = 0xc0
        list_offset = 0x10
        list_head = {
            list_next = 0xffffff24b82e0c50
            list_prev = 0xffffff24b82e0c50
        }
    }
    n_lock_list.ll_count = 0x1
    n_lock_list.ll_wrop = 0xa3
    n_lock_list.ll_mutex = {
        _opaque = [ 0 ]
    }
    n_lock_list.ll_deleteq = {
        list_size = 0x28
        list_offset = 0x8
        list_head = {
            list_next = 0xffffff2246643dc8
            list_prev = 0xffffff2246643dc8
        }
    }
    n_lock_list.ll_deleteq_count = 0
    n_lock_list.ll_flushing = 0 (0)
}
vp = 0xffffff355177e640
> ffffff355177e640::print vnode_t v_nbllock | ::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffff355177e6a0        READERS=1  B011 ffffff00f72a1c60 (W)
                                     ||
                 WRITE_WANTED -------+|
                  HAS_WAITERS --------+

> ffffff2246643cc8::print smb_node_t n_lock_list.ll_lock | ::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffff2246643d78 ffffff00f72a1c60  B101 ffffff00f6a86c60 (R)
                                    | | ffffff00f7085c60 (R)
                 WRITE_LOCKED ------+ | ffffff00f6ba2c60 (R)
                  HAS_WAITERS --------+ ffffff00f6f53c60 (R)
                                        ffffff00f6ccec60 (R)
                                        ffffff00f605cc60 (R)
                                        ffffff00f61cec60 (R)
                                        ffffff00f4997c60 (R)
                                        ffffff00f61a6c60 (R)
                                        ffffff00f62b3c60 (R)
                                        ffffff00f6ee4c60 (R)
> ffffff00f72a1c60::findstack -v
stack pointer for thread ffffff00f72a1c60: ffffff00f72a1580
[ ffffff00f72a1580 _resume_from_idle+0xf1() ]
  ffffff00f72a15b0 swtch+0x145()
  ffffff00f72a1660 turnstile_block+0x760(0, 0, ffffff355177e6a0, fffffffffbc08ef0, 0, 0)
  ffffff00f72a16d0 rw_enter_sleep+0x1a3(ffffff355177e6a0, 0)
  ffffff00f72a16f0 nbl_start_crit+0x19(ffffff355177e640, 0)
  ffffff00f72a1790 fs_frlock+0x229(ffffff355177e640, 2a, ffffff00f72a1960, 10002, 0, ffffff00f72a18e0, ffffffdd8032bc20, ffffffffc0101fb0)
  ffffff00f72a1830 zfs_frlock+0xd9(ffffff355177e640, 2a, ffffff00f72a1960, 10002, 0, ffffff00f72a18e0, ffffffdd8032bc20, ffffffffc0101fb0)
  ffffff00f72a18c0 fop_frlock+0x8c(ffffff355177e640, 2a, ffffff00f72a1960, 10002, 0, ffffff00f72a18e0, ffffffdd8032bc20, ffffffffc0101fb0)
  ffffff00f72a1950 smb_vop_frlock+0x7d(ffffff355177e640, ffffffdd8032bc20, 10002, ffffff00f72a1960)
  ffffff00f72a19e0 smb_fsop_frlock+0xdf(ffffff2246643cc8, ffffffcadad2dc00, 0, ffffffdd8032bc20)
  ffffff00f72a1a70 smb_lock_range+0x1b2(ffffff2256e75ce0, 7fffffa7, 14, 0, 65)
  ffffff00f72a1ae0 smb_com_locking_andx+0x318(ffffff2256e75ce0)
  ffffff00f72a1b80 smb_dispatch_request+0x4a6(ffffff2256e75ce0)
  ffffff00f72a1bb0 smb_session_worker+0x6c(ffffff2256e75ce0)
  ffffff00f72a1c40 taskq_d_thread+0xb1(fffffffe54eef460)
  ffffff00f72a1c50 thread_start+8()
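
To make the cycle concrete, here is a minimal user-space model of the two paths shown in the stacks above. The pthread objects and function names are stand-ins invented for illustration (ol_mutex for n_oplock.ol_mutex, nbllock for vp->v_nbllock, ll_lock for n_lock_list.ll_lock); it is not illumos code. Built with a C compiler and -lpthread, it hangs by design, reproducing the same inversion:

/*
 * Minimal user-space model of the inversion shown in the two stacks above.
 * The pthread objects are stand-ins chosen for illustration only:
 *   ol_mutex  ~ n_oplock.ol_mutex
 *   nbllock   ~ vp->v_nbllock  (nbl_start_crit / nbl_end_crit)
 *   ll_lock   ~ node->n_lock_list.ll_lock
 * This is not illumos code; it just reproduces the AB/BA ordering.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t ol_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t nbllock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t ll_lock = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Models the smb_oplock_acquire() path: ol_mutex, then v_nbllock as a
 * reader, then the lock-range list via smb_range_check().
 */
static void *
oplock_acquire_path(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&ol_mutex);
    pthread_rwlock_rdlock(&nbllock);    /* nbl_start_crit(vp, RW_READER) */
    sleep(1);                           /* let the other path take ll_lock */
    printf("oplock path: waiting for the lock-range list\n");
    pthread_rwlock_rdlock(&ll_lock);    /* blocks: write-held by the other path */
    printf("oplock path: never reached\n");
    return (NULL);
}

/*
 * Models the smb_com_locking_andx()/smb_lock_range() path: the lock-range
 * list first, then v_nbllock for write via VOP_FRLOCK -> nbl_start_crit().
 */
static void *
locking_andx_path(void *arg)
{
    (void) arg;
    pthread_rwlock_wrlock(&ll_lock);
    sleep(1);                           /* let the other path take nbllock */
    printf("locking path: waiting for v_nbllock\n");
    pthread_rwlock_wrlock(&nbllock);    /* blocks: read-held by the other path */
    printf("locking path: never reached\n");
    return (NULL);
}

int
main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, oplock_acquire_path, NULL);
    pthread_create(&b, NULL, locking_andx_path, NULL);
    pthread_join(a, NULL);              /* never returns: AB/BA deadlock */
    pthread_join(b, NULL);
    return (0);
}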

Meanwhile, we again have a load of people trying to cash out before the zombie apocalypse by calling smb_oplock_break(), as well as some fresh braaaaains arriving who can't even make it off the dock here on zombie island:
> ffffff21ec9d5368::turnstile
            ADDR             SOBJ  WTRS EPRI             ITOR          PRIOINV
ffffff21ec9d5368 ffffff2246643e70  1073   99                0                0
> ::stacks -c mutex_vector_enter | ::stacks -c smb_oplock_break | ::findstack -v ! grep ^stack | wc -l
1072
> ffffff2246643e70::whatthread | ::stacks -c mutex_vector_enter | ::stacks -C smb_oplock_break | ::findstack -v
stack pointer for thread ffffff00f72ddc60: ffffff00f72dd9d0
[ ffffff00f72dd9d0 _resume_from_idle+0xf1() ]
  ffffff00f72dda00 swtch+0x145()
  ffffff00f72ddab0 turnstile_block+0x760(0, 0, ffffff2246643e70, fffffffffbc07d90, 0, 0)
  ffffff00f72ddb10 mutex_vector_enter+0x261(ffffff2246643e70)
  ffffff00f72ddb40 smb_oplock_broadcast+0x26(ffffff2246643cc8)
  ffffff00f72ddb60 smb_ofile_request_complete+0x24(ffffff353da30bf0)
  ffffff00f72ddb80 smb_request_free+0x23(ffffff253d465668)
  ffffff00f72ddbb0 smb_session_worker+0x91(ffffff253d465668)
  ffffff00f72ddc40 taskq_d_thread+0xb1(fffffff65dac5e08)
  ffffff00f72ddc50 thread_start+8()
> ffffff2246643e70::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff2246643e70 adapt ffffff00f6ee4c60      -      -     yes
> ffffff00f6ee4c60::findstack -v
stack pointer for thread ffffff00f6ee4c60: ffffff00f6ee4610
[ ffffff00f6ee4610 _resume_from_idle+0xf1() ]
  ffffff00f6ee4640 swtch+0x145()
  ffffff00f6ee46f0 turnstile_block+0x760(0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f6ee4760 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f6ee47d0 smb_lock_range_access+0x4f(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee4840 smb_range_check+0x5c(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee48a0 smb_oplock_acquire+0xfe(ffffff2256ae96c0, ffffff2246643cc8, fffffff67abf5978)
  ffffff00f6ee49c0 smb_open_subr+0xc89(ffffff2256ae96c0)
  ffffff00f6ee4a00 smb_common_open+0x78(ffffff2256ae96c0)
  ffffff00f6ee4ae0 smb_com_nt_create_andx+0x10f(ffffff2256ae96c0)
  ffffff00f6ee4b80 smb_dispatch_request+0x4a6(ffffff2256ae96c0)
  ffffff00f6ee4bb0 smb_session_worker+0x6c(ffffff2256ae96c0)
  ffffff00f6ee4c40 taskq_d_thread+0xb1(fffffff6f154d548)
  ffffff00f6ee4c50 thread_start+8()
> ffffff355177e640::whatthread | ::stacks -C smb_oplock_break | ::stacks -C nbl_start_crit | ::stacks -C smb_oplock_acquire | ::findstack -v
stack pointer for thread ffffff00f605cc60: ffffff00f605c890
[ ffffff00f605c890 _resume_from_idle+0xf1() ]
  ffffff00f605c8c0 swtch+0x145()
  ffffff00f605c970 turnstile_block+0x760(ffffff21ec9d96a0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f605c9e0 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f605ca50 smb_lock_range_access+0x4f(ffffff2464d04958, ffffff2246643cc8, 4fd000, 8000, 0)
  ffffff00f605cab0 smb_common_read+0xba(ffffff2464d04958, ffffff23f689e000)
  ffffff00f605cae0 smb_com_read_andx+0x64(ffffff2464d04958)
  ffffff00f605cb80 smb_dispatch_request+0x4a6(ffffff2464d04958)
  ffffff00f605cbb0 smb_session_worker+0x6c(ffffff2464d04958)
  ffffff00f605cc40 taskq_d_thread+0xb1(fffffffe52e84ad0)
  ffffff00f605cc50 thread_start+8()
> ffffff21ec9d96a0::turnstile
            ADDR             SOBJ  WTRS EPRI             ITOR          PRIOINV
ffffff21ec9d96a0 ffffff2246643d78    11   99                0                0

This last bit brings us full circle. We have 11 waiters on the node's lock-range list rwlock (n_lock_list.ll_lock at ffffff2246643d78): ten are blocked in smb_common_read(), a further knock-on effect of the deadlock, and excluding those, the one remaining waiter is the deadlocked smb_oplock_acquire() thread itself:
> ffffff2246643d78::walk blocked | ::stacks -C smb_common_read | ::findstack -v
stack pointer for thread ffffff00f6ee4c60: ffffff00f6ee4610
[ ffffff00f6ee4610 _resume_from_idle+0xf1() ]
  ffffff00f6ee4640 swtch+0x145()
  ffffff00f6ee46f0 turnstile_block+0x760(0, 1, ffffff2246643d78, fffffffffbc08ef0, 0, 0)
  ffffff00f6ee4760 rw_enter_sleep+0x205(ffffff2246643d78, 1)
  ffffff00f6ee47d0 smb_lock_range_access+0x4f(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee4840 smb_range_check+0x5c(ffffff2256ae96c0, ffffff2246643cc8, 0, ffffffffffffffff, 1)
  ffffff00f6ee48a0 smb_oplock_acquire+0xfe(ffffff2256ae96c0, ffffff2246643cc8, fffffff67abf5978)
  ffffff00f6ee49c0 smb_open_subr+0xc89(ffffff2256ae96c0)
  ffffff00f6ee4a00 smb_common_open+0x78(ffffff2256ae96c0)
  ffffff00f6ee4ae0 smb_com_nt_create_andx+0x10f(ffffff2256ae96c0)
  ffffff00f6ee4b80 smb_dispatch_request+0x4a6(ffffff2256ae96c0)
  ffffff00f6ee4bb0 smb_session_worker+0x6c(ffffff2256ae96c0)
  ffffff00f6ee4c40 taskq_d_thread+0xb1(fffffff6f154d548)
  ffffff00f6ee4c50 thread_start+8()

This looks to line up with the smb_node_t's various counters (1086 total references, 13 opens, 1073 opening):
> ffffff2246643cc8::print -atx smb_node_t n_refcnt n_open_count n_opening_count
ffffff2246643cf4 uint32_t n_refcnt = 0x43e
ffffff2246643d08 uint32_t n_open_count = 0xd
ffffff2246643d0c uint32_t n_opening_count = 0x431

Actions #2

Updated by Gordon Ross over 8 years ago

Gordon Ross added a comment - 29/Aug/13 8:56 PM

Existing lock order: (note - broken!)
*A: smb_oplock_acquire thread:
A1: The oplock mutex:
mutex_enter(&ol->ol_mutex);
A2: The vnode vp->vnbllock (reader)
nbl_start_crit(node->vp, RW_READER)
A3: The lock range list: node->n_lock_list
(via smb_range_check, smb_lock_range_access)
XXX: Block trying to get READER
*B: smb_locking_andx thread:
(smb_lock_range, smb_unlock_range)
B1: The lock range list: node->n_lock_list
B2: The vnode vp->vnbllock (writer)
nbl_start_crit(node->vp, RW_WRITER)
(via smb_fsop_frlock, smb_vop_frlock, fs_frlock)
XXX: Block trying to get WRITER
Notes:
VOP_FRLOCK also does nbl_start_crit, and at a level
below what we can control. That would appear to
force the order to be node->n_lock_list and then
the FS level vp->vnbllock.
That also seems a more sensible order because conceptually
the SMB locks are "built on top of" the FS-level locks.
That's the order smb_locking_andx is using now, but
unfortunately, several things in smbsrv violate that
ordering. Examples: delete, oplock, rename
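
For contrast, here is a minimal sketch of the ordering argued for above, using the same user-space stand-ins and hypothetical names as the model in the previous comment: every path takes node->n_lock_list before the vnode-level vp->vnbllock, so the AB/BA cycle cannot form. This only illustrates the proposed ordering; it is not the change that was ultimately integrated:

/*
 * Sketch of a single lock order, with the same user-space stand-ins as the
 * earlier model: every path takes the node's lock-range list (ll_lock)
 * before the vnode's mandatory-lock rwlock (nbllock).  With one order the
 * cycle cannot form.  Illustration only; not the integrated fix.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t ll_lock = PTHREAD_RWLOCK_INITIALIZER;  /* node->n_lock_list */
static pthread_rwlock_t nbllock = PTHREAD_RWLOCK_INITIALIZER;  /* vp->v_nbllock */

static void *
oplock_style_path(void *arg)
{
    (void) arg;
    pthread_rwlock_rdlock(&ll_lock);    /* 1. node->n_lock_list (reader) */
    pthread_rwlock_rdlock(&nbllock);    /* 2. nbl_start_crit(vp, RW_READER) */
    printf("oplock-style path: got both, releasing\n");
    pthread_rwlock_unlock(&nbllock);
    pthread_rwlock_unlock(&ll_lock);
    return (NULL);
}

static void *
locking_style_path(void *arg)
{
    (void) arg;
    pthread_rwlock_wrlock(&ll_lock);    /* 1. node->n_lock_list (writer) */
    pthread_rwlock_wrlock(&nbllock);    /* 2. v_nbllock via VOP_FRLOCK */
    printf("locking-style path: got both, releasing\n");
    pthread_rwlock_unlock(&nbllock);
    pthread_rwlock_unlock(&ll_lock);
    return (NULL);
}

int
main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, oplock_style_path, NULL);
    pthread_create(&b, NULL, locking_style_path, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);              /* both complete; no deadlock is possible */
    return (0);
}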

Actions #3

Updated by Electric Monk over 8 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit bc7c423f80bd757204d3b17cfd74585ae5b30ed8

commit  bc7c423f80bd757204d3b17cfd74585ae5b30ed8
Author: Gordon Ross <gwr@nexenta.com>
Date:   2014-11-29T20:44:19.000Z

    5319 smb_oplock_acquire thread deadlock
    Reviewed by: Dan McDonald <danmcd@nexenta.com>
    Reviewed by: Sarah Jelinek <Sarah.Jelinek@nexenta.com>
    Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
