Bug #13873
SMB shutdown hang with CA handles (Closed)
% Done: 100%
Difficulty: Medium
Description
SMB server shutdown hangs, with threads stuck in oplock break:
THREAD           STATE    SOBJ                COUNT
ffffd001c956ec40 SLEEP    CV                      2
                 swtch+0x141
                 cv_wait+0x70
                 smb_ofile_hold_olbrk+0x7c
                 smb_oplock_ind_break+0x80
                 smb_oplock_break_cmn+0x889
                 smb_oplock_break_OPEN+0x31
                 smb_common_open+0xdda
                 smb2_dh_import_handle+0x35e
                 smb2_dh_import_share+0x1a8
                 taskq_d_thread+0xb7
                 thread_start+8

ffffd14cf871e100 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 smb_avl_destroy+0x5b
                 smb_export_stop+0x4e
                 smb_server_shutdown+0x144
                 smb_server_delete+0x1c7
                 smb_drv_close+0x3d
                 dev_close+0x31
                 device_close+0xd8
                 spec_close+0x17b
                 fop_close+0x61
                 closef+0x5e
                 closeandsetf+0x398
                 close+0x13
                 _sys_sysenter_post_swapgs+0x149

ffffd001c9562c40 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 smb_thread_continue_timedwait_locked+0x5d
                 smb_thread_continue+0x2d
                 smb_kshare_unexport_thread+0x40
                 smb_thread_entry_point+0x53
                 thread_start+8

> ::stacks -m smbsrv
> ffffd001c956ec40::thread -d
            ADDR         DISPTIME BOUND PR
ffffd001c956ec40           2da6e1    -1  0
Updated by Gordon Ross almost 2 years ago
Analysis:
SMB server shutdown is waiting for the nodes and ofiles to go away, but they are held by the oplock-break threads, which are stuck.
> ffffd001c956ec40::findstack -v
stack pointer for thread ffffd001c956ec40: ffffd001c956e6a0
[ ffffd001c956e6a0 _resume_from_idle+0x112() ]
  ffffd001c956e6d0 swtch+0x141()
  ffffd001c956e710 cv_wait+0x70(ffffd1779a845860, ffffd1779a845818)
  ffffd001c956e750 smb_ofile_hold_olbrk+0x7c(ffffd1779a8457e0)
  ffffd001c956e7c0 smb_oplock_ind_break+0x80(ffffd1779a8457e0, 3, 1, 0)
  ffffd001c956e860 smb_oplock_break_cmn+0x889(ffffd14da2a4f990, ffffd1779a845428, 100004)
  ffffd001c956e890 smb_oplock_break_OPEN+0x31(ffffd14da2a4f990, ffffd1779a845428, 12019f, 1)
  ffffd001c956ea10 smb_common_open+0xdda(ffffd14dba23e330)
  ffffd001c956eaf0 smb2_dh_import_handle+0x35e(ffffd14dba23e330, ffffd14db5e75980, abb6331871d78487)
  ffffd001c956eb90 smb2_dh_import_share+0x1a8(ffffd14dba23e330)
  ffffd001c956ec20 taskq_d_thread+0xb7(ffffd14cf8b859b0)
  ffffd001c956ec30 thread_start+8()

ffffd14dba23e330 is the address of the smb_request_t
ffffd14da2a4f990 is the address of the smb_node_t
ffffd1779a845428 is the address of the smb_ofile_t for the open that triggers the oplock break
ffffd1779a8457e0 is the address of the smb_ofile_t to which the oplock break is sent

> ffffd1779a8457e0::print -at smb_ofile_t f_tree
ffffd1779a845840 smb_tree_t *f_tree = 0xffffd17991d08c30
> 0xffffd17991d08c30::print -at smb_tree_t
..
ffffd17991d08c50 smb_tree_state_t t_state = 0 (SMB_TREE_STATE_CONNECTED)
ffffd17991d08dd8 char [256] t_sharename = [ "proj1k_ss21" ]
ffffd17991d08ed8 char [1024] t_resource = [ "/export/proj1k/ss21" ]
...
The thread doing the shutdown appears to be waiting for avl_refcnt to drop to 0 (it is currently 2):
> ffffd14cf871e100::findstack -v
stack pointer for thread ffffd14cf871e100: ffffd001c8713ab0
[ ffffd001c8713ab0 _resume_from_idle+0x112() ]
  ffffd001c8713ae0 swtch+0x141()
  ffffd001c8713b20 cv_wait+0x70(ffffd14b827beae8, ffffd14b827beae0)
  ffffd001c8713b70 smb_avl_destroy+0x5b(ffffd14b827beab0)
  ffffd001c8713ba0 smb_export_stop+0x4e(ffffd14b827be540)
  ffffd001c8713be0 smb_server_shutdown+0x144(ffffd14b827be540)
  ffffd001c8713c20 smb_server_delete+0x1c7(ffffd14b827be540)
  ffffd001c8713c60 smb_drv_close+0x3d(a200000000, 1, 2, ffffd14ca63fd9d8)
  ffffd001c8713c90 dev_close+0x31(a200000000, 1, 2, ffffd14ca63fd9d8)
  ffffd001c8713ce0 device_close+0xd8(ffffd14ca6816880, 1, ffffd14ca63fd9d8)
  ffffd001c8713d70 spec_close+0x17b(ffffd14ca6816880, 1, 1, 0, ffffd14ca63fd9d8, 0)
  ffffd001c8713df0 fop_close+0x61(ffffd14ca6816880, 1, 1, 0, ffffd14ca63fd9d8, 0)
  ffffd001c8713e30 closef+0x5e(ffffd14ca63d7370)
  ffffd001c8713ea0 closeandsetf+0x398(13, 0)
  ffffd001c8713ec0 close+0x13(13)
  ffffd001c8713f10 _sys_sysenter_post_swapgs+0x149()

> ffffd14b827beab0::print -at smb_avl_t
ffffd14b827beab0 smb_avl_t {
    ffffd14b827beab0 krwlock_t avl_lock = {
        ffffd14b827beab0 void *[1] _opaque = [ 0 ]
    }
    ffffd14b827beab8 avl_tree_t avl_tree = {
        ffffd14b827beab8 struct avl_node *avl_root = 0xffffd14e43b2b860
        ffffd14b827beac0 int (*)() avl_compar = smb_kshare_cmp
        ffffd14b827beac8 size_t avl_offset = 0x8
        ffffd14b827bead0 ulong_t avl_numnodes = 0x422
        ffffd14b827bead8 size_t avl_size = 0xb8
    }
    ffffd14b827beae0 kmutex_t avl_mutex = {
        ffffd14b827beae0 void *[1] _opaque = [ 0 ]
    }
    ffffd14b827beae8 kcondvar_t avl_cv = {
        ffffd14b827beae8 ushort_t _opaque = 0x1
    }
    ffffd14b827beaec smb_avl_state_t avl_state = 0x2 (SMB_AVL_STATE_DESTROYING)
    ffffd14b827beaf0 uint32_t avl_refcnt = 0x2
    ffffd14b827beaf4 uint32_t avl_sequence = 0x2e36190e
    ffffd14b827beaf8 const smb_avl_nops_t *avl_nops = smb_kshare_avlops
}
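A minimal userland sketch of the hold/release/destroy pattern the smb_avl_t dump points to may help show why shutdown cannot finish. It assumes, from the stack and the dump, that the destroy path marks the tree DESTROYING and then waits on avl_cv until avl_refcnt reaches 0 (the stack shows smb_avl_destroy sleeping in cv_wait with no timeout). The names refavl_t, refavl_hold(), refavl_rele() and refavl_destroy_wait() are hypothetical; POSIX threads stand in for the kernel mutex and condition variable, and a timeout replaces the unbounded wait so that a stuck holder shows up as ETIMEDOUT instead of a hang. This is not the illumos code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model of the smb_avl_t fields shown in the dump. */
typedef enum { REFAVL_READY, REFAVL_DESTROYING } refavl_state_t;

typedef struct {
    pthread_mutex_t mutex;   /* stands in for avl_mutex */
    pthread_cond_t  cv;      /* stands in for avl_cv */
    refavl_state_t  state;   /* stands in for avl_state */
    uint32_t        refcnt;  /* stands in for avl_refcnt */
} refavl_t;

static void refavl_init(refavl_t *a)
{
    pthread_mutex_init(&a->mutex, NULL);
    pthread_cond_init(&a->cv, NULL);
    a->state = REFAVL_READY;
    a->refcnt = 0;
}

/* Take a hold; refused once destruction has started. */
static bool refavl_hold(refavl_t *a)
{
    bool ok;

    pthread_mutex_lock(&a->mutex);
    ok = (a->state == REFAVL_READY);
    if (ok)
        a->refcnt++;
    pthread_mutex_unlock(&a->mutex);
    return ok;
}

/* Drop a hold and wake a destroyer that may be waiting for refcnt == 0. */
static void refavl_rele(refavl_t *a)
{
    pthread_mutex_lock(&a->mutex);
    assert(a->refcnt > 0);
    a->refcnt--;
    if (a->state == REFAVL_DESTROYING)
        pthread_cond_broadcast(&a->cv);
    pthread_mutex_unlock(&a->mutex);
}

/*
 * Mark the tree DESTROYING and wait for every hold to be dropped.
 * Returns 0 once refcnt is 0, or ETIMEDOUT if holders are still
 * outstanding after timeout_ms (the kernel path would wait forever).
 */
static int refavl_destroy_wait(refavl_t *a, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&a->mutex);
    a->state = REFAVL_DESTROYING;
    /* Loop because timedwait may also return 0 on a spurious wakeup. */
    while (a->refcnt > 0 && rc == 0)
        rc = pthread_cond_timedwait(&a->cv, &a->mutex, &deadline);
    if (a->refcnt == 0)
        rc = 0;
    pthread_mutex_unlock(&a->mutex);
    return rc;
}
```

In the dump avl_refcnt is 2, and ::stacks shows two threads stuck under smb2_dh_import_share. The sketch reproduces that shape: take two holds that are never dropped and refavl_destroy_wait() times out with refcnt still 2; drop both holds and the same call returns 0.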
Updated by Gordon Ross almost 2 years ago
The fix is to flush the delete queue of the tid_tree->t_ofile_list list after the ofile is released in smb2_dh_import_share(), so that the expected state transition (STATE_SAVING -> STATE_ORPHANED) takes place. Also, do not attempt to import handles while the server is stopping, since there is no point in importing them then.
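The two parts of the fix can be illustrated with a small, self-contained model. It assumes, as the findstack output suggests, that an oplock-break waiter (smb_ofile_hold_olbrk in the real stack) cannot proceed while the target ofile is still in its SAVING state, and that the SAVING -> ORPHANED step is deferred to the list's delete queue until that queue is flushed. Every name here (ofile_t, delete_queue_t, dq_post(), dq_flush(), import_handle() and its apply_fix flag) is hypothetical; only the state names and the "release, then flush" and "skip while stopping" ordering come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ofile states, named after those in the fix description. */
typedef enum {
    OFILE_STATE_OPEN,
    OFILE_STATE_SAVING,
    OFILE_STATE_ORPHANED
} ofile_state_t;

typedef struct {
    ofile_state_t state;
} ofile_t;

/* Deferred-work entry: a callback and its argument. */
typedef struct {
    void (*func)(void *);
    void *arg;
} dq_entry_t;

#define DQ_MAX 16

/* Hypothetical stand-in for the delete queue attached to t_ofile_list. */
typedef struct {
    dq_entry_t entries[DQ_MAX];
    size_t count;
} delete_queue_t;

static void dq_post(delete_queue_t *dq, void (*func)(void *), void *arg)
{
    assert(dq->count < DQ_MAX);
    dq->entries[dq->count].func = func;
    dq->entries[dq->count].arg = arg;
    dq->count++;
}

/* Run every deferred callback, then empty the queue. */
static void dq_flush(delete_queue_t *dq)
{
    for (size_t i = 0; i < dq->count; i++)
        dq->entries[i].func(dq->entries[i].arg);
    dq->count = 0;
}

/* Deferred half of the release: completes SAVING -> ORPHANED. */
static void ofile_finish_save(void *arg)
{
    ofile_t *of = arg;

    assert(of->state == OFILE_STATE_SAVING);
    of->state = OFILE_STATE_ORPHANED;
}

/* Immediate half of the release: enter SAVING, defer the rest. */
static void ofile_release(ofile_t *of, delete_queue_t *dq)
{
    of->state = OFILE_STATE_SAVING;
    dq_post(dq, ofile_finish_save, of);
}

/* In this model an oplock-break waiter may proceed only once SAVING is over. */
static bool ofile_break_can_proceed(const ofile_t *of)
{
    return of->state != OFILE_STATE_SAVING;
}

/*
 * Import one handle. apply_fix selects the patched behaviour: skip the
 * import while the server is stopping, and flush the delete queue right
 * after the release. Returns false if the import was skipped.
 */
static bool import_handle(ofile_t *of, delete_queue_t *dq,
    bool server_stopping, bool apply_fix)
{
    if (apply_fix && server_stopping)
        return false;
    of->state = OFILE_STATE_OPEN;   /* handle reclaimed */
    ofile_release(of, dq);
    if (apply_fix)
        dq_flush(dq);
    return true;
}
```

Without the flush, the released ofile is left in SAVING with its completion still queued, so a later open that breaks its oplock waits indefinitely; with the flush it reaches ORPHANED immediately. The early return for a stopping server mirrors the second half of the fix: no new handles, and therefore no new holds, are taken while shutdown is in progress.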
Updated by Gordon Ross over 1 year ago
Tested with failover and failback operations; the hang no longer occurs.
Updated by Electric Monk over 1 year ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit af7caad8ef15f00fc4dffc1b827b4292ff0db62c
commit af7caad8ef15f00fc4dffc1b827b4292ff0db62c
Author: Prashanth Badari <prashanth.badari@tegile.com>
Date: 2021-07-15T19:42:57.000Z

    13873 SMB shutdown hang with CA handles
    Reviewed by: Suresh Jayaraman <sjayaraman@tintri.com>
    Reviewed by: Gordon Ross <gordon.ross@tintri.com>
    Reviewed by: Andrew Stormont <astormont@racktopsystems.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Garrett D'Amore <garrett@damore.org>