smb/server is stuck in STOPPING state after svcadm restart
While testing #11831, I saw smb/server hang.
Looking with mdb:
> ::smblist
SERVER           ZONE STATE
ffffff037ab25040 0    STOPPING
    SESSION          IP_ADDR              PORT  DIALECT STATE
    ffffff037ab4c958 ::ffff:192.168.9.131 42846 0x302   SHUTDOWN
> ::stacks -m smbsrv
THREAD           STATE    SOBJ                COUNT
ffffff000f196c40 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 cv_wait_sig+0x82
                 so_dequeue_msg+0x2f7
                 so_recvmsg+0x249
                 socket_recvmsg+0x33
                 ksocket_recv+0x15a
                 smb_sorecv+0x39
                 smb_session_xprt_gethdr+0x33
                 smb_session_reader+0x36
                 smb_session_receiver+0x54
                 smb_server_receiver+0x18
                 taskq_d_thread+0xb7
                 thread_start+8

ffffff03743ea140 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 taskq_wait+0xa3
                 taskq_destroy+0x6c
                 smb_server_shutdown+0x1c0
                 smb_server_stop+0x70
                 smb_drv_ioctl+0x155
                 cdev_ioctl+0x39
                 spec_ioctl+0x60
                 fop_ioctl+0x55
                 ioctl+0x9b
                 sys_syscall32+0xff

> ::smblist
SERVER           ZONE STATE
ffffff037ab25040 0    STOPPING
    SESSION          IP_ADDR              PORT  DIALECT STATE
    ffffff037ab4c958 ::ffff:192.168.9.131 42846 0x302   SHUTDOWN
Updated by Gordon Ross over 2 years ago
It turns out smb_user_delete (via smb_user_logoff) is racing with smb_session_disconnect, both of which can currently transition the session state to SMB_SESSION_STATE_SHUTDOWN.
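When smb_user_delete happens first, it changes the session state to SHUTDOWN without having closed the socket. The receiver thread then stays blocked in the socket receive (the so_dequeue_msg stack above), which matches the taskq_wait seen under smb_server_shutdown.
In retrospect, my previous change to set that state in smb_user_delete was ill-considered,
particularly because it does so without closing the session socket.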
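To make the ordering problem concrete, here is a minimal user-space sketch of the race. It is a toy model, not the smbsrv code: every name in it (race_session_t, race_user_delete_old, and so on) is hypothetical, and the rule that the disconnect path does nothing once the state is already SHUTDOWN is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; hypothetical names, not the smbsrv types. */
typedef enum { RACE_NEGOTIATED, RACE_SHUTDOWN } race_state_t;

typedef struct {
	race_state_t state;
	bool sock_open;	/* the receiver stays blocked in recv while true */
} race_session_t;

/* Old logoff path: marks the session SHUTDOWN but leaves the socket open. */
static void
race_user_delete_old(race_session_t *s)
{
	s->state = RACE_SHUTDOWN;
}

/*
 * Disconnect path. ASSUMPTION: it acts only on a session that is not
 * already SHUTDOWN, so it closes the socket only if it gets there first.
 */
static void
race_session_disconnect(race_session_t *s)
{
	if (s->state == RACE_SHUTDOWN)
		return;
	s->sock_open = false;
	s->state = RACE_SHUTDOWN;
}

/* Returns true if the receiver is left blocked for the given ordering. */
static bool
race_hangs(bool logoff_first)
{
	race_session_t s = { RACE_NEGOTIATED, true };

	if (logoff_first) {
		race_user_delete_old(&s);
		race_session_disconnect(&s);
	} else {
		race_session_disconnect(&s);
		race_user_delete_old(&s);
	}
	assert(s.state == RACE_SHUTDOWN);
	/* The receiver can only exit once its socket has been closed. */
	return (s.sock_open);
}
```

In this model both orderings end in SHUTDOWN, but only the logoff-first ordering leaves the socket open, which is the case where the receiver never wakes up.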
For the fix, I've added a timeout mechanism for session objects.
While a session has no user objects, a timeout is running. When it expires, it calls smb_session_disconnect, which causes the smb_session_receiver thread to destroy the session.
The session timeout is scheduled when:
- the session is first created (no users yet)
- the last user goes away (user count goes 1 to 0)
The session timeout is cancelled when:
- the first user is created (user count goes 0 to 1)
- the reader thread begins termination
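The schedule and cancel rules above can be sketched as a small user-space model. This is only an illustration of the rules as described, not the actual change: the type and function names (tmo_session_t, tmo_user_add, and so on) are hypothetical, the timer is reduced to a flag, and not re-arming once the reader is terminating is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; hypothetical names, not the smbsrv types. */
typedef struct {
	int user_cnt;		/* number of user objects on the session */
	bool timer_armed;	/* stands in for a pending timeout */
	bool terminating;	/* reader thread has begun termination */
	bool disconnected;	/* set when the timeout fires */
} tmo_session_t;

static void
tmo_arm(tmo_session_t *s)
{
	/* ASSUMPTION: never re-arm once the reader is going away. */
	if (!s->terminating)
		s->timer_armed = true;
}

/* Session first created: no users yet, so the timeout starts running. */
static void
tmo_session_create(tmo_session_t *s)
{
	s->user_cnt = 0;
	s->terminating = false;
	s->disconnected = false;
	s->timer_armed = false;
	tmo_arm(s);
}

/* First user created (0 to 1): cancel the timeout. */
static void
tmo_user_add(tmo_session_t *s)
{
	if (s->user_cnt++ == 0)
		s->timer_armed = false;
}

/* Last user goes away (1 to 0): schedule the timeout again. */
static void
tmo_user_remove(tmo_session_t *s)
{
	assert(s->user_cnt > 0);
	if (--s->user_cnt == 0)
		tmo_arm(s);
}

/* Reader thread begins termination: cancel the timeout. */
static void
tmo_reader_exit(tmo_session_t *s)
{
	s->terminating = true;
	s->timer_armed = false;
}

/* Timeout expiry: disconnect, but only if the timer is still armed. */
static void
tmo_expire(tmo_session_t *s)
{
	if (!s->timer_armed)
		return;
	s->timer_armed = false;
	s->disconnected = true;
}
```

The point of the model is that the timer is armed exactly when the user count is zero and the reader is still alive, so a session with no users cannot linger past the timeout.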
Updated by Gordon Ross over 2 years ago
Testing: This is tricky to reproduce. We first need to arrange for an smb_user_t object to exist in state LOGGING_ON and make it stay in that state for a while. The only easy way I know of to do that is to run a client under a debugger and put a breakpoint after the first SMB2_SESSION_SETUP command, leaving the client stopped in the debugger at that point. Check that we have an smb_user_t object in state LOGGING_ON by examining the SMB server state with mdb -k (::smblist).
Once we have a user session in state LOGGING_ON, run "svcadm restart network/smb/server".
It might take a couple of tries to see the shutdown hang.
Use the "fksmbd" client under debug so one can easily pause client activity at arbitrary points during session setup and tear-down. I used that for the following test cases:
- Let the client connect, then stop before it sends SMB2 Negotiate.
Verify (with mdb -k) that the SMB server has a client in state ESTABLISHED
and that the client session goes away in 30 sec.
- Let the client connect and negotiate, stopping before the first session setup.
Verify that the SMB server has a client in state NEGOTIATED and that
the client session goes away in 30 sec.
- Let the client connect, negotiate, and send the 1st session setup,
stopping before the 2nd session setup (the NTLMSSP authenticate).
Verify that the SMB server has a client session in state NEGOTIATED
and a user session in state LOGGING_ON. Verify that after 45 sec.
that user session goes away. Verify that after 30 sec. more,
the client session also goes away.
- Let the client connect and authenticate. Verify normal function
(e.g. the tree connect "shares" commands). Use the "logoff" command
to drop the user logon. Verify that 30 sec. after the SMB logoff happens
(which the client does about 15 sec. after the "logoff" command)
the SMB server destroys the client session.
Updated by Electric Monk over 2 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
commit 817fa55f6c07cd26e2de797b63ac4695d57108ab
Author: Gordon Ross <email@example.com>
Date: 2019-10-19T13:02:21.000Z

    11832 smb/server is stuck in STOPPING state after svcadm restart
    Reviewed by: Matt Barden <firstname.lastname@example.org>
    Reviewed by: Evan Layton <email@example.com>
    Reviewed by: Andy Stormont <AStormont@racktopsystems.com>
    Approved by: Robert Mustacchi <firstname.lastname@example.org>