Bug #11832

closed

smb/server is stuck in STOPPING state after svcadm restart

Added by Gordon Ross almost 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

While testing #11831, I saw smb/server hang.
Looking with mdb:

> ::smblist
SERVER           ZONE STATE
ffffff037ab25040 0    STOPPING
  SESSION          IP_ADDR                PORT     DIALECT  STATE
  ffffff037ab4c958 ::ffff:192.168.9.131   42846    0x302    SHUTDOWN
> ::stacks -m smbsrv
THREAD           STATE    SOBJ                COUNT
ffffff000f196c40 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 cv_wait_sig+0x82
                 so_dequeue_msg+0x2f7
                 so_recvmsg+0x249
                 socket_recvmsg+0x33
                 ksocket_recv+0x15a
                 smb_sorecv+0x39
                 smb_session_xprt_gethdr+0x33
                 smb_session_reader+0x36
                 smb_session_receiver+0x54
                 smb_server_receiver+0x18
                 taskq_d_thread+0xb7
                 thread_start+8

ffffff03743ea140 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 taskq_wait+0xa3
                 taskq_destroy+0x6c
                 smb_server_shutdown+0x1c0
                 smb_server_stop+0x70
                 smb_drv_ioctl+0x155
                 cdev_ioctl+0x39
                 spec_ioctl+0x60
                 fop_ioctl+0x55
                 ioctl+0x9b
                 sys_syscall32+0xff

> ::smblist
SERVER           ZONE STATE
ffffff037ab25040 0    STOPPING
  SESSION          IP_ADDR                PORT     DIALECT  STATE
  ffffff037ab4c958 ::ffff:192.168.9.131   42846    0x302    SHUTDOWN

Actions #1

Updated by Gordon Ross almost 4 years ago

It turns out smb_user_delete (via smb_user_logoff) is racing with smb_session_disconnect, both of which can currently transition the session state to SMB_SESSION_STATE_SHUTDOWN.

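When smb_user_delete happens first, it changes the session state to SHUTDOWN without having closed the socket. The receiver thread then stays blocked in smb_sorecv (the first stack above), and smb_server_shutdown waits in taskq_destroy for it to finish (the second stack).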

In retrospect, my previous change to do that in smb_user_delete was ill-considered,
particularly because it sets that state without closing the session socket.
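
The effect can be illustrated with a small userland analogy. This is only a sketch, not the kernel code: demo_session_t and demo_reader_wakes are hypothetical names, a socketpair stands in for the client connection, and poll() stands in for the receiver blocked in its socket receive. It shows that setting a "shutdown" flag alone leaves a reader waiting on an idle socket, while also shutting down the socket wakes it.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical userland stand-in for a session; not smb_session_t. */
typedef struct demo_session {
	int	sock;		/* server end of the connection */
	int	peer;		/* idle client end */
	bool	shutdown;	/* stand-in for the SHUTDOWN session state */
} demo_session_t;

/*
 * Mark the session as shut down and, optionally, shut down its socket.
 * Returns 1 if a reader waiting on the socket is woken within wait_ms,
 * 0 if it would still be blocked, -1 on error.
 */
int
demo_reader_wakes(bool close_socket, int wait_ms)
{
	int sv[2];
	demo_session_t s;
	struct pollfd pfd;
	int n;

	assert(wait_ms >= 0);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);
	s.sock = sv[0];
	s.peer = sv[1];

	/* Both paths change the state flag. */
	s.shutdown = true;

	/* Only the disconnect-style path also shuts down the socket. */
	if (close_socket)
		(void) shutdown(s.sock, SHUT_RDWR);

	/* The "receiver": wait for data or EOF on the session socket. */
	pfd.fd = s.sock;
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, wait_ms);

	(void) close(s.sock);
	(void) close(s.peer);
	return (n > 0 ? 1 : (n == 0 ? 0 : -1));
}
```

Calling demo_reader_wakes(false, 100) models the smb_user_delete path described above (state changed, socket untouched), and demo_reader_wakes(true, 100) models a path that also shuts down the socket.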

For the fix, I've added a timeout mechanism for session objects.
While a session has no user objects, a timeout is running. When the timeout expires, it calls smb_session_disconnect, which causes the smb_session_receiver thread to destroy the session.

The session timeout is scheduled when:

  • the session is first created (no users yet)
  • the last user goes away (user count goes 1 to 0)

The session timeout is cancelled when:

  • the first user is created (user count goes 0 to 1)
  • the reader thread begins termination
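
The schedule/cancel rules above can be sketched as a small state model. This is a minimal illustration under stated assumptions, not the code from the fix: every demo_* name and field is hypothetical, and a boolean stands in for the real timeout facility.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a session; names are illustrative only. */
typedef struct demo_session {
	unsigned	user_count;	/* user objects on this session */
	bool		timer_armed;	/* session timeout is pending */
	bool		disconnected;	/* timeout fired and disconnected */
} demo_session_t;

/* Session first created: no users yet, so schedule the timeout. */
void
demo_session_create(demo_session_t *s)
{
	s->user_count = 0;
	s->timer_armed = true;
	s->disconnected = false;
}

/* User created: cancel the timeout on the 0 -> 1 transition. */
void
demo_user_add(demo_session_t *s)
{
	if (s->user_count++ == 0)
		s->timer_armed = false;
}

/* User goes away: schedule the timeout on the 1 -> 0 transition. */
void
demo_user_remove(demo_session_t *s)
{
	assert(s->user_count > 0);
	if (--s->user_count == 0)
		s->timer_armed = true;
}

/* Reader thread begins termination: cancel any pending timeout. */
void
demo_reader_exit(demo_session_t *s)
{
	s->timer_armed = false;
}

/* Timeout expiry: stand-in for the call to smb_session_disconnect. */
void
demo_timeout_fire(demo_session_t *s)
{
	if (s->timer_armed) {
		s->timer_armed = false;
		s->disconnected = true;
	}
}
```

In this model the timeout is armed exactly when the session has no users and the reader has not started to exit, so an idle or abandoned session always ends up on the disconnect path.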

Actions #2

Updated by Gordon Ross almost 4 years ago

Testing: This is tricky to reproduce. We first need to arrange for an smb_user_t object to exist in state LOGGING_ON and make it stay in that state for a while. The only easy way I know of to do that is to run a client under a debugger and put a breakpoint after the first SMB2_SESSION_SETUP command, leaving the client stopped in the debugger at that point. Check that we have an smb_user_t object in state LOGGING_ON by examining the SMB server state with mdb -k (::smblist).
Once we have a user object in state LOGGING_ON, run "svcadm restart network/smb/server".
It might take a couple of tries to see the shutdown hang.

Additional testing:
Use the "fksmbd" client under debug so one can easily pause client activity at arbitrary points during session setup and tear-down. I used that for the following test cases:

  • Let the client connect, then stop before it sends SMB2 Negotiate.
    Verify (with mdb -k) that the SMB server has a client in state ESTABLISHED
    and that the client session goes away in 30 sec.
  • Let the client connect and negotiate, stopping before the first session setup.
    Verify that the SMB server has a client in state NEGOTIATED and that
    the client session goes away in 30 sec.
  • Let the client connect, negotiate, and send the 1st session setup,
    stopping before the 2nd session setup (the NTLMSSP authenticate).
    Verify that the SMB server has a client session in state NEGOTIATED
    and a user session in state LOGGING_ON. Verify that after 45 sec.
    that user session goes away. Verify that after 30 sec. more,
    the client session also goes away.
  • Let the client connect and authenticate. Verify normal function
    (e.g. the tree connect "shares" commands). Use the "logoff" command
    to drop the user logon. Verify that 30 sec after the SMB logoff happens
    (which the client does about 15 sec. after the "logoff" command)
    the SMB server destroys the client session.
Actions #3

Updated by Gordon Ross almost 4 years ago

Tested per the description above.

Actions #4

Updated by Electric Monk almost 4 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 817fa55f6c07cd26e2de797b63ac4695d57108ab

commit  817fa55f6c07cd26e2de797b63ac4695d57108ab
Author: Gordon Ross <gwr@nexenta.com>
Date:   2019-10-19T13:02:21.000Z

    11832 smb/server is stuck in STOPPING state after svcadm restart
    Reviewed by: Matt Barden <matt.barden@nexenta.com>
    Reviewed by: Evan Layton <evan.layton@nexenta.com>
    Reviewed by: Andy Stormont <AStormont@racktopsystems.com>
    Approved by: Robert Mustacchi <rm@fingolfin.org>
