Bug #10976

svcadm disable network/smb/server may hang

Added by Gordon Ross 5 months ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2019-05-14
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

History

#1

Updated by Gordon Ross 5 months ago

  • Status changed from New to In Progress

An attempt to restart the SMB service failed. The smbd process was stuck in the internal process teardown close routine, and the process could not be killed.

#2

Updated by Gordon Ross 5 months ago

Why could we not kill smbd?

> ::pgrep smbd
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  10160      1  10160  10160      0 0x42300902 ffffff64b1558080 smbd

> ffffff64b1558080 ::walk thread | ::findstack -v
stack pointer for thread ffffff64ca5ae420: ffffff02e15e8a20
[ ffffff02e15e8a20 _resume_from_idle+0xf4() ]
  ffffff02e15e8a50 swtch+0x141()
  ffffff02e15e8a90 cv_wait+0x70(ffffff64d5eb7850, ffffff64d5eb7800)
  ffffff02e15e8ad0 taskq_wait+0xa3(ffffff64919da9e8)    <<<< stuck here
  ffffff02e15e8b20 taskq_destroy+0x6c(ffffff64919da9e8)
  ffffff02e15e8b50 smb_server_shutdown+0xde(ffffff64ca27e500)
  ffffff02e15e8b70 smb_server_stop+0x4d()
  ffffff02e15e8be0 smb_drv_ioctl+0x16b(d00000000, d346000a, 8047da0, 100001, 
  ffffff64a5aa64e0, ffffff02e15e8dc8)
  ffffff02e15e8c20 cdev_ioctl+0x39(d00000000, d346000a, 8047da0, 100001, 
  ffffff64a5aa64e0, ffffff02e15e8dc8)
  ffffff02e15e8c70 spec_ioctl+0x60(ffffff64cadec980, d346000a, 8047da0, 100001, 
  ffffff64a5aa64e0, ffffff02e15e8dc8, 0)
  ffffff02e15e8d00 fop_ioctl+0x55(ffffff64cadec980, d346000a, 8047da0, 100001, 
  ffffff64a5aa64e0, ffffff02e15e8dc8, 0)
  ffffff02e15e8e20 ioctl+0x9b(23, d346000a, 8047da0)
  ffffff02e15e8ec0 dtrace_systrace_syscall32+0xe4(23, d346000a, 8047da0, 
  ca5ae420, 1, 0)
  ffffff02e15e8f10 _sys_sysenter_post_swapgs+0x149()
stack pointer for thread ffffff6459956b80: ffffff02e19cacc0
[ ffffff02e19cacc0 _resume_from_idle+0xf4() ]
  ffffff02e19cacf0 swtch+0x141()
  ffffff02e19cad30 cv_wait+0x70(ffffff64b1558146, ffffff63e69ab640)
  ffffff02e19cad70 exitlwps+0x13c(0)
  ffffff02e19cae00 psig+0x477()
  ffffff02e19caec0 post_syscall+0x82d(ffffffff, ffffffff)
  ffffff02e19caf00 syscall_exit+0x68(ffffff6459956b80, ffffffff, ffffffff)
  ffffff02e19caf10 0xfffffffffb800ed9()

>  ffffff64919da9e8 ::taskq -t
ADDR             NAME                             ACT/THDS Q'ED  MAXQ INST
ffffff64919da9e8 smb_receivers                    200/ 201    0     -    -
    THREAD           STATE    SOBJ                COUNT
    ffffff02e0645c40 SLEEP    CV                    200
    ffffff02e550ac40 SLEEP    CV                      -
    [... 200 of these...]
    ffffff02e202cc40 SLEEP    CV                      -
    ffffff02e1721c40 SLEEP    CV                      -
                     swtch+0x141
                     cv_wait+0x70
                     smb_slist_wait_for_empty+0x3a
                     smb_session_cancel+0x32
                     smb_session_receiver+0xde
                     smb_server_receiver+0x28
                     taskq_d_thread+0xe6
                     thread_start+8

> ffffff02e0645c40 ::findstack -v     
stack pointer for thread ffffff02e0645c40: ffffff02e0645a20
[ ffffff02e0645a20 _resume_from_idle+0xf4() ]
  ffffff02e0645a50 swtch+0x141()
  ffffff02e0645a90 cv_wait+0x70(ffffff64a5a7deb8, ffffff64a5a7deb0)
  ffffff02e0645ac0 smb_slist_wait_for_empty+0x3a(ffffff64a5a7deb0)
  ffffff02e0645b00 smb_session_cancel+0x32(ffffff64a5a7daa8)
  ffffff02e0645b40 smb_session_receiver+0xde(ffffff64a5a7daa8)
  ffffff02e0645b70 smb_server_receiver+0x28(ffffff64440f4518)
  ffffff02e0645c20 taskq_d_thread+0xe6(ffffff64ac0ddd20)
  ffffff02e0645c30 thread_start+8()
> ffffff02e550ac40 ::findstack -v
stack pointer for thread ffffff02e550ac40: ffffff02e550aa20
[ ffffff02e550aa20 _resume_from_idle+0xf4() ]
  ffffff02e550aa50 swtch+0x141()
  ffffff02e550aa90 cv_wait+0x70(ffffff6488e96098, ffffff6488e96090)
  ffffff02e550aac0 smb_slist_wait_for_empty+0x3a(ffffff6488e96090)
  ffffff02e550ab00 smb_session_cancel+0x32(ffffff6488e95c88)
  ffffff02e550ab40 smb_session_receiver+0xde(ffffff6488e95c88)
  ffffff02e550ab70 smb_server_receiver+0x28(ffffff6476a668d8)
  ffffff02e550ac20 taskq_d_thread+0xe6(ffffff64545dc4e0)
  ffffff02e550ac30 thread_start+8()
> ffffff02e5438c40 ::findstack -v
stack pointer for thread ffffff02e5438c40: ffffff02e5438a20
[ ffffff02e5438a20 _resume_from_idle+0xf4() ]
  ffffff02e5438a50 swtch+0x141()
  ffffff02e5438a90 cv_wait+0x70(ffffff6488e95268, ffffff6488e95260)
  ffffff02e5438ac0 smb_slist_wait_for_empty+0x3a(ffffff6488e95260)
  ffffff02e5438b00 smb_session_cancel+0x32(ffffff6488e94e58)
  ffffff02e5438b40 smb_session_receiver+0xde(ffffff6488e94e58)
  ffffff02e5438b70 smb_server_receiver+0x28(ffffff6477a6fc70)
  ffffff02e5438c20 taskq_d_thread+0xe6(ffffff646aa72da8)
  ffffff02e5438c30 thread_start+8()
> ffffff02e540ec40 ::findstack -v
stack pointer for thread ffffff02e540ec40: ffffff02e540ea20
[ ffffff02e540ea20 _resume_from_idle+0xf4() ]
  ffffff02e540ea50 swtch+0x141()
  ffffff02e540ea90 cv_wait+0x70(ffffff646d43a968, ffffff646d43a960)
  ffffff02e540eac0 smb_slist_wait_for_empty+0x3a(ffffff646d43a960)
  ffffff02e540eb00 smb_session_cancel+0x32(ffffff646d43a558)
  ffffff02e540eb40 smb_session_receiver+0xde(ffffff646d43a558)
  ffffff02e540eb70 smb_server_receiver+0x28(ffffff6475c45038)
  ffffff02e540ec20 taskq_d_thread+0xe6(ffffff64545dc080)
  ffffff02e540ec30 thread_start+8()
> 

Wow, those taskq threads all seem to be stuck in smb_slist_wait_for_empty.

Let's look at the first session shown above:


> ffffff64a5a7daa8 ::smbsess -vu
SMB session information (ffffff64a5a7daa8): 
Client IP address: 10.214.16.232
Local IP Address: 10.214.60.50
Session KID: 378
Workstation Name: 
Session state: 1 (DISCONNECTED)
Session dialect: 0xb
Number of Users: 276
Number of Trees: 0
Number of Files: 0
Number of Shares: 0
Number of active Transact.: 0

  SMB user information (ffffff64a50579b8):
  UID: 1
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a5959448):
  UID: 2                              
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a45be300):
  UID: 3
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a45be540):
  UID: 4
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0                  
  User Account: \

  SMB user information (ffffff64a45be4b0):
  UID: 5
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a45be6f0):
  UID: 6
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a45be810):
  UID: 7
  State: 0 (LOGGING_ON)               
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a45be9c0):
  UID: 8
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff6487947d30):
  UID: 9
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \                     

  SMB user information (ffffff64a5057a48):
  UID: 10
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a5057028):
  UID: 11
  State: 0 (LOGGING_ON)
  Flags: 0x00000000
  Privileges: 0x00000000
  Credential: 0
  Reference Count: 0
  User Account: \

  SMB user information (ffffff64a5959a78):
  UID: 12
  State: 0 (LOGGING_ON)
>                                     

So there are lots of user sessions in the "logging on" state.

We also see lots of smbworker threads blocked in smb_authsock_recv, i.e.:

stack pointer for thread ffffff02e39c5c40: ffffff02e39c54c0
[ ffffff02e39c54c0 _resume_from_idle+0xf4() ]
  ffffff02e39c54f0 swtch+0x141()
  ffffff02e39c5530 cv_wait+0x70(ffffff64ceccedf2, ffffff64d54fe5c0)
  ffffff02e39c55a0 cv_wait_sig+0x82(ffffff64ceccedf2, ffffff64d54fe5c0)
  ffffff02e39c55d0 str_cv_wait+0x27(ffffff64ceccedf2, ffffff64d54fe5c0, ffffffffffffffff, 0)
  ffffff02e39c5680 strwaitq+0x2c3(ffffff64d54fe540, 2, 8, 3, ffffffffffffffff, ffffff02e39c56c8)
  ffffff02e39c5720 strread+0x144(ffffff64cec78740, ffffff02e39c58b0, ffffff63e68fcdb0)
  ffffff02e39c5840 sotpi_recvmsg+0xb6(ffffff64cec173f8, ffffff02e39c58e0, ffffff02e39c58b0, ffffff63e68fcdb0)
  ffffff02e39c5880 socket_recvmsg+0x33(ffffff64cec173f8, ffffff02e39c58e0, ffffff02e39c58b0, ffffff63e68fcdb0)
  ffffff02e39c5970 ksocket_recv+0x15a(ffffff64cec173f8, ffffff02e39c5a38, 8, 40, ffffff02e39c5988, ffffff63e68fcdb0)
  ffffff02e39c59c0 smb_authsock_recv+0x55(ffffff64cec173f8, ffffff02e39c5a38, 8)
  ffffff02e39c5a20 smb_authsock_sendrecv+0x75(ffffff64cec173f8, ffffff02e39c5a38, ffffff648aea7028, ffffff02e39c5a30)
  ffffff02e39c5a70 smb_authenticate_ext+0xdb(ffffff649f257010)
  ffffff02e39c5aa0 smb_com_session_setup_andx+0xc6(ffffff649f257010)
  ffffff02e39c5b40 smb1sr_work+0x482(ffffff649f257010)
  ffffff02e39c5b70 smb_session_worker+0x78(ffffff649f257010)
  ffffff02e39c5c20 taskq_d_thread+0xe6(ffffff64859ebce0)
  ffffff02e39c5c30 thread_start+8()

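That's the problem. The thread shown above (smb_server_shutdown, taskq_destroy) is waiting for the smb_receivers taskq threads to exit. Those threads are in turn waiting, in smb_slist_wait_for_empty, for the workers blocked in smb_authsock_recv, and the workers will never return until the auth sockets get closed. That won't happen until the smbd process teardown progresses far enough to close those sockets, and the teardown is itself waiting on the thread stuck in taskq_destroy.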

A reasonable fix for this would be to call smb_logoff on all the user objects that remain, which will close the auth sockets and cause those threads to return an error from the ksocket_recv calls.
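The idea can be illustrated outside the kernel. The following is only a user-space sketch with POSIX threads and a socketpair, not the actual illumos change: every name in it (demo_user_t, demo_worker, demo_logoff, demo_shutdown) is hypothetical, and shutdown(2) stands in for what smb_logoff does to the auth socket. Each worker blocks in recv() on a socket whose peer never replies, much like the workers in smb_authsock_recv above. The teardown path shuts each socket down first, so every recv() returns and the join (the analogue of taskq_wait) can finish.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a user object that owns an auth socket. */
typedef struct demo_user {
    int       u_authsock;  /* worker's end of the auth connection */
    int       u_peer;      /* other end; it never sends a reply */
    ssize_t   u_result;    /* what recv() returned to the worker */
    pthread_t u_worker;
} demo_user_t;

/* Worker: blocks in recv() waiting for an 8-byte reply header. */
static void *
demo_worker(void *arg)
{
    demo_user_t *u = arg;
    char hdr[8];

    u->u_result = recv(u->u_authsock, hdr, sizeof (hdr), 0);
    return (NULL);
}

/* Stand-in for the logoff step: make the blocked recv() return. */
static void
demo_logoff(demo_user_t *u)
{
    (void) shutdown(u->u_authsock, SHUT_RDWR);
}

/*
 * Start nusers blocked workers, log every user off, then wait for
 * the workers. Returns how many workers came back without data.
 * Without the logoff loop, pthread_join() would never return.
 */
int
demo_shutdown(demo_user_t *users, int nusers)
{
    int i, rc, woken = 0;

    for (i = 0; i < nusers; i++) {
        int sv[2];

        rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
        assert(rc == 0);
        users[i].u_authsock = sv[0];
        users[i].u_peer = sv[1];
        users[i].u_result = -2;  /* sentinel: recv() not returned yet */
        rc = pthread_create(&users[i].u_worker, NULL,
            demo_worker, &users[i]);
        assert(rc == 0);
    }

    /* The fix: log off whatever users remain before waiting. */
    for (i = 0; i < nusers; i++)
        demo_logoff(&users[i]);

    /* Analogue of taskq_wait(): every worker can now exit. */
    for (i = 0; i < nusers; i++) {
        (void) pthread_join(users[i].u_worker, NULL);
        if (users[i].u_result <= 0)
            woken++;
        (void) close(users[i].u_authsock);
        (void) close(users[i].u_peer);
    }
    return (woken);
}
```

Calling demo_shutdown() on a small array of demo_user_t should return the number of users passed in. The ordering is the point of the sketch: the sockets are shut down before the wait, rather than relying on a later teardown stage to close them.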

#3

Updated by Gordon Ross 5 months ago

Testing: Arrange for smbd to be slow processing a logon, and shut down the SMB service while that's happening.
In a debug build, one can set smbd_authsvc_slowdown (via mdb) to force smbd to take some time processing logons.
This fix has been in production since early 2016.

#4

Updated by Electric Monk 5 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit b210fede5519ffcaa92e5409d891c77b945d73bb

commit  b210fede5519ffcaa92e5409d891c77b945d73bb
Author: Gordon Ross <gwr@nexenta.com>
Date:   2019-05-23T02:59:03.000Z

    10976 svcadm disable network/smb/server may hang
    Portions contributed by: Matt Barden <Matt.Barden@nexenta.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Reviewed by: Evan Layton <evan.layton@nexenta.com>
    Reviewed by: Matt Barden <matt.barden@nexenta.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
    Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
