Bug #13873

SMB shutdown hang with CA handles

Added by Gordon Ross 6 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Category: cifs - CIFS server and client
% Done: 100%
Difficulty: Medium

Description

SMB server shutdown hang observed, with threads stuck in oplock break:


> ::stacks -m smbsrv
THREAD           STATE    SOBJ                COUNT
ffffd001c956ec40 SLEEP    CV                      2
                 swtch+0x141
                 cv_wait+0x70
                 smb_ofile_hold_olbrk+0x7c
                 smb_oplock_ind_break+0x80
                 smb_oplock_break_cmn+0x889
                 smb_oplock_break_OPEN+0x31
                 smb_common_open+0xdda
                 smb2_dh_import_handle+0x35e
                 smb2_dh_import_share+0x1a8
                 taskq_d_thread+0xb7
                 thread_start+8
ffffd14cf871e100 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 smb_avl_destroy+0x5b
                 smb_export_stop+0x4e
                 smb_server_shutdown+0x144
                 smb_server_delete+0x1c7
                 smb_drv_close+0x3d
                 dev_close+0x31
                 device_close+0xd8
                 spec_close+0x17b
                 fop_close+0x61
                 closef+0x5e
                 closeandsetf+0x398
                 close+0x13
                 _sys_sysenter_post_swapgs+0x149
ffffd001c9562c40 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 smb_thread_continue_timedwait_locked+0x5d
                 smb_thread_continue+0x2d
                 smb_kshare_unexport_thread+0x40
                 smb_thread_entry_point+0x53
                 thread_start+8
> ffffd001c956ec40::thread -d
            ADDR         DISPTIME BOUND PR
ffffd001c956ec40           2da6e1    -1  0

#1

Updated by Gordon Ross 6 months ago

Analysis:

SMB server shutdown is waiting for nodes and ofiles to go away, but they are held by the oplock-break threads, which are stuck.


> ffffd001c956ec40::findstack -v
stack pointer for thread ffffd001c956ec40: ffffd001c956e6a0
[ ffffd001c956e6a0 _resume_from_idle+0x112() ]
ffffd001c956e6d0 swtch+0x141()
ffffd001c956e710 cv_wait+0x70(ffffd1779a845860, ffffd1779a845818)
ffffd001c956e750 smb_ofile_hold_olbrk+0x7c(ffffd1779a8457e0)
ffffd001c956e7c0 smb_oplock_ind_break+0x80(ffffd1779a8457e0, 3, 1, 0)
ffffd001c956e860 smb_oplock_break_cmn+0x889(ffffd14da2a4f990, ffffd1779a845428, 100004)
ffffd001c956e890 smb_oplock_break_OPEN+0x31(ffffd14da2a4f990, ffffd1779a845428, 12019f, 1)
ffffd001c956ea10 smb_common_open+0xdda(ffffd14dba23e330)
ffffd001c956eaf0 smb2_dh_import_handle+0x35e(ffffd14dba23e330, ffffd14db5e75980, abb6331871d78487)
ffffd001c956eb90 smb2_dh_import_share+0x1a8(ffffd14dba23e330)
ffffd001c956ec20 taskq_d_thread+0xb7(ffffd14cf8b859b0)
ffffd001c956ec30 thread_start+8()

ffffd14dba23e330 is the address of the smb_request_t
ffffd14da2a4f990 is the address of the smb_node_t
ffffd1779a845428 is the address of the smb_ofile_t being opened (the open that triggers the break)
ffffd1779a8457e0 is the address of the smb_ofile_t to which the oplock break is sent

> ffffd1779a8457e0::print -at smb_ofile_t f_tree
ffffd1779a845840 smb_tree_t *f_tree = 0xffffd17991d08c30

> 0xffffd17991d08c30::print -at smb_tree_t
...
ffffd17991d08c50 smb_tree_state_t t_state = 0 (SMB_TREE_STATE_CONNECTED)
ffffd17991d08dd8 char [256] t_sharename = [ "proj1k_ss21" ]
ffffd17991d08ed8 char [1024] t_resource = [ "/export/proj1k/ss21" ]
...
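
The stuck thread is blocked in cv_wait() inside smb_ofile_hold_olbrk() on the target ofile ffffd1779a8457e0. A simplified user-space model in C with POSIX threads may help picture that wait. It assumes, rather than quotes from the illumos source, that the oplock-break hold sleeps on the ofile's condition variable while the ofile is in a SAVE_DH or SAVING state, which is consistent with the stack above and with the SAVING -> ORPHANED transition named in the fix. All type, state, and function names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical ofile states, loosely modeled on SMB_OFILE_STATE_*. */
typedef enum {
	OF_OPEN,
	OF_SAVE_DH,	/* durable handle is about to be saved */
	OF_SAVING,	/* save in progress (deferred work pending) */
	OF_ORPHANED,	/* saved; waiting for a reconnect */
	OF_CLOSED
} of_state_t;

/* Hypothetical ofile with only the fields this sketch needs. */
typedef struct {
	pthread_mutex_t	f_mutex;
	pthread_cond_t	f_cv;
	of_state_t	f_state;
	unsigned	f_refcnt;
} ofile_t;

static bool
of_is_saving(const ofile_t *of)
{
	return (of->f_state == OF_SAVE_DH || of->f_state == OF_SAVING);
}

/*
 * Assumed behavior of an oplock-break hold: wait until the ofile
 * leaves SAVE_DH/SAVING, then take a reference if it is still usable.
 * If nothing ever moves the ofile out of SAVING, this blocks forever,
 * which would match the cv_wait under smb_ofile_hold_olbrk above.
 */
static bool
ofile_hold_olbrk(ofile_t *of)
{
	bool held = false;

	pthread_mutex_lock(&of->f_mutex);
	while (of_is_saving(of))
		pthread_cond_wait(&of->f_cv, &of->f_mutex);
	if (of->f_state == OF_OPEN || of->f_state == OF_ORPHANED) {
		of->f_refcnt++;
		held = true;
	}
	pthread_mutex_unlock(&of->f_mutex);
	return (held);
}

/* A state change (e.g. SAVING -> ORPHANED) that wakes any waiters. */
static void
ofile_set_state(ofile_t *of, of_state_t st)
{
	pthread_mutex_lock(&of->f_mutex);
	of->f_state = st;
	pthread_cond_broadcast(&of->f_cv);
	pthread_mutex_unlock(&of->f_mutex);
}
```

In this model the waiter can only make progress if some other path calls ofile_set_state() to move the ofile out of OF_SAVING.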

The shutdown thread appears to be waiting for avl_refcnt to drop to 0 (it is currently 2):
> ffffd14cf871e100::findstack -v
stack pointer for thread ffffd14cf871e100: ffffd001c8713ab0
[ ffffd001c8713ab0 _resume_from_idle+0x112() ]
ffffd001c8713ae0 swtch+0x141()
ffffd001c8713b20 cv_wait+0x70(ffffd14b827beae8, ffffd14b827beae0)
ffffd001c8713b70 smb_avl_destroy+0x5b(ffffd14b827beab0)
ffffd001c8713ba0 smb_export_stop+0x4e(ffffd14b827be540)
ffffd001c8713be0 smb_server_shutdown+0x144(ffffd14b827be540)
ffffd001c8713c20 smb_server_delete+0x1c7(ffffd14b827be540)
ffffd001c8713c60 smb_drv_close+0x3d(a200000000, 1, 2, ffffd14ca63fd9d8)
ffffd001c8713c90 dev_close+0x31(a200000000, 1, 2, ffffd14ca63fd9d8)
ffffd001c8713ce0 device_close+0xd8(ffffd14ca6816880, 1, ffffd14ca63fd9d8)
ffffd001c8713d70 spec_close+0x17b(ffffd14ca6816880, 1, 1, 0, ffffd14ca63fd9d8, 0)
ffffd001c8713df0 fop_close+0x61(ffffd14ca6816880, 1, 1, 0, ffffd14ca63fd9d8, 0)
ffffd001c8713e30 closef+0x5e(ffffd14ca63d7370)
ffffd001c8713ea0 closeandsetf+0x398(13, 0)
ffffd001c8713ec0 close+0x13(13)
ffffd001c8713f10 _sys_sysenter_post_swapgs+0x149()

> ffffd14b827beab0::print -at smb_avl_t
ffffd14b827beab0 smb_avl_t {
    ffffd14b827beab0 krwlock_t avl_lock = {
        ffffd14b827beab0 void *[1] _opaque = [ 0 ]
    }
    ffffd14b827beab8 avl_tree_t avl_tree = {
        ffffd14b827beab8 struct avl_node *avl_root = 0xffffd14e43b2b860
        ffffd14b827beac0 int (*)() avl_compar = smb_kshare_cmp
        ffffd14b827beac8 size_t avl_offset = 0x8
        ffffd14b827bead0 ulong_t avl_numnodes = 0x422
        ffffd14b827bead8 size_t avl_size = 0xb8
    }
    ffffd14b827beae0 kmutex_t avl_mutex = {
        ffffd14b827beae0 void *[1] _opaque = [ 0 ]
    }
    ffffd14b827beae8 kcondvar_t avl_cv = {
        ffffd14b827beae8 ushort_t _opaque = 0x1
    }
    ffffd14b827beaec smb_avl_state_t avl_state = 0x2 (SMB_AVL_STATE_DESTROYING)
    ffffd14b827beaf0 uint32_t avl_refcnt = 0x2
    ffffd14b827beaf4 uint32_t avl_sequence = 0x2e36190e
    ffffd14b827beaf8 const smb_avl_nops_t *avl_nops = smb_kshare_avlops
}
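
The shutdown side follows a common reference-count pattern: the tree is marked DESTROYING (avl_state = 0x2 in the dump) and the destroying thread sleeps on avl_cv until avl_refcnt reaches zero. A minimal user-space sketch of that pattern, assuming holders call a release function that broadcasts the condition variable while a destroy is pending. The names are hypothetical, modeled on the smb_avl_t fields printed above, and the enum values do not claim to match the real ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef enum { RAVL_READY, RAVL_DESTROYING, RAVL_DESTROYED } ravl_state_t;

/* Hypothetical refcounted container, mirroring avl_mutex/avl_cv/avl_refcnt. */
typedef struct {
	pthread_mutex_t	avl_mutex;
	pthread_cond_t	avl_cv;
	ravl_state_t	avl_state;
	unsigned	avl_refcnt;
} ravl_t;

/* Take a reference, but refuse once a destroy has started. */
static bool
ravl_hold(ravl_t *a)
{
	bool ok = false;

	pthread_mutex_lock(&a->avl_mutex);
	if (a->avl_state == RAVL_READY) {
		a->avl_refcnt++;
		ok = true;
	}
	pthread_mutex_unlock(&a->avl_mutex);
	return (ok);
}

/* Drop a reference; wake the destroyer if one is waiting. */
static void
ravl_rele(ravl_t *a)
{
	pthread_mutex_lock(&a->avl_mutex);
	assert(a->avl_refcnt > 0);
	a->avl_refcnt--;
	if (a->avl_state == RAVL_DESTROYING)
		pthread_cond_broadcast(&a->avl_cv);
	pthread_mutex_unlock(&a->avl_mutex);
}

/* Block new holds, then wait for every existing holder to release. */
static void
ravl_destroy(ravl_t *a)
{
	pthread_mutex_lock(&a->avl_mutex);
	if (a->avl_state != RAVL_READY) {
		pthread_mutex_unlock(&a->avl_mutex);
		return;
	}
	a->avl_state = RAVL_DESTROYING;
	while (a->avl_refcnt > 0)
		pthread_cond_wait(&a->avl_cv, &a->avl_mutex);
	a->avl_state = RAVL_DESTROYED;
	pthread_mutex_unlock(&a->avl_mutex);
}
```

As long as avl_refcnt stays at 2 because its holders never reach their release call, the wait loop in ravl_destroy() never exits, which corresponds to the shutdown thread's stack.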

#2

Updated by Gordon Ross 6 months ago

The fix is to flush the delete queue of tid_tree->t_ofile_list after the ofile is released in smb2_dh_import_share(), so that the expected state transition (STATE_SAVING -> STATE_ORPHANED) happens. Also, do not attempt to import handles while the server is stopping, since there is no point in importing them then.
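
A simplified, single-threaded model of the two parts of the fix. It assumes (hypothetically) that dropping the last reference to an ofile in the SAVE_DH state moves it to SAVING and posts the rest of the save to a deferred "delete" queue, which only runs when that queue is flushed. Without the flush the ofile stays in SAVING, so an oplock-break hold that waits for it to leave SAVING never returns. The queue, ofile, and import function below are hypothetical stand-ins, not the illumos smb_llist_t or smb_ofile_t APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical deferred-work ("delete") queue: jobs run only on flush. */
#define DELETEQ_MAX 16

typedef void (*dq_func_t)(void *);

typedef struct {
	struct { dq_func_t fn; void *arg; } items[DELETEQ_MAX];
	size_t count;
} deleteq_t;

static bool
deleteq_post(deleteq_t *q, dq_func_t fn, void *arg)
{
	if (q->count == DELETEQ_MAX)
		return (false);
	q->items[q->count].fn = fn;
	q->items[q->count].arg = arg;
	q->count++;
	return (true);
}

/* Run every pending job in posting order; returns how many ran. */
static size_t
deleteq_flush(deleteq_t *q)
{
	size_t n = q->count;

	for (size_t i = 0; i < n; i++)
		q->items[i].fn(q->items[i].arg);
	q->count = 0;
	return (n);
}

/* Hypothetical ofile model; only the fields this sketch needs. */
typedef enum { OF_OPEN, OF_SAVE_DH, OF_SAVING, OF_ORPHANED } of_state_t;

typedef struct {
	of_state_t state;
	unsigned refcnt;
} ofile_t;

/* Deferred part of saving a durable handle (would also wake waiters). */
static void
ofile_save_done(void *arg)
{
	ofile_t *of = arg;

	of->state = OF_ORPHANED;
}

/* Last release of a SAVE_DH ofile defers the save to the queue. */
static void
ofile_release(ofile_t *of, deleteq_t *ofile_deleteq)
{
	assert(of->refcnt > 0);
	if (--of->refcnt == 0 && of->state == OF_SAVE_DH) {
		of->state = OF_SAVING;
		(void) deleteq_post(ofile_deleteq, ofile_save_done, of);
	}
}

/*
 * Import step with both parts of the fix:
 * 1. skip the import entirely when the server is stopping;
 * 2. flush the (tree's) ofile delete queue right after the release,
 *    so the SAVING -> ORPHANED transition actually happens.
 */
static bool
import_handle(bool server_stopping, deleteq_t *ofile_deleteq, ofile_t *of)
{
	if (server_stopping)
		return (false);

	/* ... reopen work elided; drop the reference taken for it ... */
	ofile_release(of, ofile_deleteq);
	(void) deleteq_flush(ofile_deleteq);
	return (true);
}
```

In this model, calling ofile_release() without the subsequent flush leaves the ofile in OF_SAVING with one job pending, which is the stuck state the analysis describes.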

#3

Updated by Electric Monk 6 months ago

  • Gerrit CR set to 1547
#4

Updated by Gordon Ross 5 months ago

Tested via fail-over / fail-back operations. No more hang.

#5

Updated by Electric Monk 4 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

commit  af7caad8ef15f00fc4dffc1b827b4292ff0db62c
Author: Prashanth Badari <prashanth.badari@tegile.com>
Date:   2021-07-15T19:42:57.000Z

    13873 SMB shutdown hang with CA handles
    Reviewed by: Suresh Jayaraman <sjayaraman@tintri.com>
    Reviewed by: Gordon Ross <gordon.ross@tintri.com>
    Reviewed by: Andrew Stormont <astormont@racktopsystems.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Garrett D'Amore <garrett@damore.org>
