Bug #13850

SMB session logoff stuck in smb_ofile_hold_olbrk

Added by Gordon Ross 8 months ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: cifs - CIFS server and client
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

A customer reported a user who could not connect.
A look at the system showed their reconnect attempts failing.
An SMB service thread was stuck with this stack:

> ffffff00f93f2c40 ::findstack -v
stack pointer for thread ffffff00f93f2c40: ffffff00f93f2620
[ ffffff00f93f2620 _resume_from_idle+0x126() ]
  ffffff00f93f2650 swtch+0x141()
  ffffff00f93f2690 cv_wait+0x70(ffffff7a89810848, ffffff7a89810800)
  ffffff00f93f26d0 smb_ofile_hold_olbrk+0x7c(ffffff7a898107c8)
  ffffff00f93f2720 smb2_lease_ofile_close+0x9b(ffffff7a8991cb80)
  ffffff00f93f2780 smb_ofile_close+0x1e0(ffffff7a8991cb80, 0)
  ffffff00f93f27b0 smb_ofile_drop+0x70(ffffff7a8991cb80)
  ffffff00f93f27f0 smb_llist_flush+0x5e(ffffff323f854408)
  ffffff00f93f2820 smb_llist_exit+0x20(ffffff323f854408)
  ffffff00f93f2860 smb_ofile_close_all+0xa0(ffffff323f8543b8, 0)
  ffffff00f93f28f0 smb_tree_disconnect+0x59(ffffff323f8543b8, 1)
  ffffff00f93f2920 smb_session_tree_dtor+0x1d(ffffff323f8543b8)
  ffffff00f93f2960 smb_llist_flush+0x5e(ffffff32d92bf8e0)
  ffffff00f93f2990 smb_llist_exit+0x20(ffffff32d92bf8e0)
  ffffff00f93f29d0 smb_session_disconnect_owned_trees+0x79(ffffff32d92bf398, ffffff325ac34780)
  ffffff00f93f2a10 smb_user_logoff+0xb3(ffffff325ac34780)
  ffffff00f93f2aa0 smb2_session_setup+0x2a4(ffffff32d764c6c0)
  ffffff00f93f2b50 smb2sr_work+0x356(ffffff32d764c6c0)
  ffffff00f93f2b90 smb2_tq_work+0x73(ffffff32d764c6c0)
  ffffff00f93f2c20 taskq_d_thread+0xb7(ffffff3ce59ecba0)
  ffffff00f93f2c30 thread_start+8()


Related issues

Related to illumos gate - Bug #13515: panic with bad mutex in smb_ofile_hold_olbrk (Closed, Gordon Ross)

Actions #1

Updated by Gordon Ross 8 months ago

  • Status changed from New to In Progress

We can see from the stack trace that an SMB2 session setup request
is tearing down its "previous session".

That involves walking the ofiles in that session and either
closing them or making them "orphaned" so they can be reconnected.
Unfortunately, while trying to close one of them, we got stuck in
smb2_lease_ofile_close / smb_ofile_hold_olbrk, which is trying to
move the lease to another ofile. That call is stuck waiting
for the other ofile to finish its transition from state
"saving" to state "orphaned". That transition will never complete,
because this same thread is in smb_llist_flush for t_ofile_list,
and there is no way for the transition to complete while we are there.
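To make the self-deadlock concrete, here is a minimal single-threaded model in C. It assumes, as a simplification, that the work which finishes the other ofile's "saving" to "orphaned" transition sits on the same list that is being flushed, so it can only run after the current callback (the close of the first ofile) returns. All names here (ofile_t, dlist_t, dlist_post, dlist_flush, finish_save, close_and_move_lease, run_scenario) are hypothetical; this is not the illumos smb_llist_t or smb_ofile_t code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ofile states; not the illumos definitions. */
typedef enum { OF_OPEN, OF_SAVING, OF_ORPHANED } of_state_t;

typedef struct {
    of_state_t state;
} ofile_t;

typedef void (*dtor_fn)(ofile_t *);

#define MAXQ 8

/* Deferred work posted to a list; it only runs from dlist_flush(). */
typedef struct {
    dtor_fn  fn[MAXQ];
    ofile_t *arg[MAXQ];
    size_t   head, tail;
} dlist_t;

static void dlist_post(dlist_t *l, dtor_fn fn, ofile_t *of)
{
    assert(l->tail < MAXQ);
    l->fn[l->tail] = fn;
    l->arg[l->tail] = of;
    l->tail++;
}

/* Runs posted items one at a time, in order, on the calling thread. */
static void dlist_flush(dlist_t *l)
{
    while (l->head < l->tail) {
        size_t i = l->head++;
        l->fn[i](l->arg[i]);
    }
}

static ofile_t   *lease_target;  /* ofile the lease would move to */
static of_state_t observed;      /* target state seen by the close callback */

/* Finishes the "saving" -> "orphaned" transition. */
static void finish_save(ofile_t *of)
{
    of->state = OF_ORPHANED;
}

/* Close callback for the first ofile; in the stuck thread this is where
 * smb_ofile_hold_olbrk blocks in cv_wait. */
static void close_and_move_lease(ofile_t *of)
{
    (void)of;
    observed = lease_target->state;
    /* If observed == OF_SAVING, a cv_wait here could never be woken:
     * finish_save() for the target is queued behind this callback,
     * on this same thread. */
}

static of_state_t run_scenario(of_state_t *target_after)
{
    ofile_t closing = { OF_OPEN };
    ofile_t target  = { OF_SAVING };
    dlist_t list    = { .head = 0 };

    lease_target = &target;
    dlist_post(&list, close_and_move_lease, &closing); /* runs first */
    dlist_post(&list, finish_save, &target);           /* queued behind it */
    dlist_flush(&list);
    *target_after = target.state;
    lease_target = NULL;
    return observed;
}
```

run_scenario() reports that the close callback saw the target still in "saving", while the target only reached "orphaned" after that callback had returned. Replacing the observation with a blocking wait would reproduce the hang in the stack trace.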

The two ofiles involved can be seen in the stack trace.
We're trying to move the lease from: ffffff7a8991cb80
to: ffffff7a898107c8 (which is stuck in state "saving").

This situation makes it evident that it's not safe to call
smb_ofile_hold_olbrk from the smb_llist_flush handler,
and probably also not while we're a reader of that llist.
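One hypothetical way to respect that rule is to never block inside the flush handler: if the lease target is still "saving", record the lease move and finish it after the flush has returned and the list is no longer held. The sketch below is only an illustration of that pattern under a simplified model; it is not the change made in commit a9931e68, and the types and helpers (ofile_t, pending_t, close_ofile_nowait, complete_pending, run_deferred) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the illumos data structures. */
typedef enum { OF_OPEN, OF_SAVING, OF_ORPHANED, OF_CLOSED } of_state_t;

typedef struct {
    of_state_t state;
    int        has_lease;
} ofile_t;

#define MAXPEND 8

/* Lease moves postponed until the flush has finished. */
typedef struct {
    ofile_t *from[MAXPEND];
    ofile_t *to[MAXPEND];
    size_t   n;
} pending_t;

/* Called from the flush handler: never waits on another ofile. */
static void close_ofile_nowait(ofile_t *of, ofile_t *target, pending_t *p)
{
    if (of->has_lease && target != NULL && target->state == OF_SAVING) {
        /* Target is mid-transition; waiting here could self-deadlock. */
        assert(p->n < MAXPEND);
        p->from[p->n] = of;
        p->to[p->n] = target;
        p->n++;
    } else if (of->has_lease && target != NULL) {
        /* Target is stable, so the lease can move immediately. */
        target->has_lease = 1;
        of->has_lease = 0;
    }
    of->state = OF_CLOSED;
}

/* Called after the flush returns, when the list is no longer held. */
static void complete_pending(pending_t *p)
{
    for (size_t i = 0; i < p->n; i++) {
        assert(p->to[i]->state != OF_SAVING);
        p->to[i]->has_lease = 1;
        p->from[i]->has_lease = 0;
    }
    p->n = 0;
}

static int run_deferred(void)
{
    ofile_t   closing = { OF_OPEN, 1 };
    ofile_t   target  = { OF_SAVING, 0 };
    pending_t p       = { .n = 0 };

    /* Inside the flush: close without waiting on the target. */
    close_ofile_nowait(&closing, &target, &p);
    /* Still inside the flush: the target's own transition runs next. */
    target.state = OF_ORPHANED;
    /* After the flush: the postponed lease move is now safe. */
    complete_pending(&p);
    return target.has_lease && !closing.has_lease;
}
```

run_deferred() returns 1 when the lease ends up on the target without any wait having been attempted inside the flush. Calling complete_pending only after leaving the list also covers the weaker concern above about waiting while still a reader of that llist.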

This can occur after #13515

Actions #2

Updated by Electric Monk 8 months ago

  • Gerrit CR set to 1530
Actions #3

Updated by Gordon Ross 8 months ago

Testing:

We have a test program that can trigger this fault, and I used that to verify the fault no longer appears.
This has also been in the field for several months.

Actions #4

Updated by Electric Monk 8 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

commit  a9931e68d716928f41c32ca936a443c797116f9c
Author: Gordon Ross <gordon.ross@tintri.com>
Date:   2021-06-07T22:01:56.000Z

    13850 SMB session logoff stuck in smb_ofile_hold_olbrk
    Reviewed by: Evan Layton <elayton@tintri.com>
    Reviewed by: Matt Barden <mbarden@tintri.com>
    Reviewed by: Alexander Stetsenko <alex.stetsenko@gmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Actions #5

Updated by Gordon Ross 5 months ago

  • Related to Bug #13515: panic with bad mutex in smb_ofile_hold_olbrk added