Bug #14795

open

SMB server unresponsive after updating

Added by Adam Stylinski 3 months ago. Updated 3 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
smb - SMB server and client
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

Updating a system from 2019 to revision da37308cbe seems to have caused the smb service to go into a state of panic and stop accepting any authentication requests. The smb service log was flooded with:

smbd: authsvc: flooded 88
smbd: authsvc: flooded 8
smbd: authsvc: flooded 90
smbd: authsvc: flooded 13
smbd: authsvc: flooded 8
smbd: authsvc: flooded 3
smbd: authsvc: flooded 200
smbd: authsvc: flooded 26
smbd: authsvc: flooded 37
smbd: authsvc: flooded 64
smbd: authsvc: flooded 10
smbd: authsvc: flooded 2
smbd: authsvc: flooded 2
smbd: authsvc: flooded 90
smbd: authsvc: flooded 8
smbd: authsvc: flooded 4
smbd: authsvc: flooded 65
smbd: authsvc: flooded 33
smbd: authsvc: flooded 2
smbd: authsvc: flooded 15
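
For anyone attaching a similar log to a report, a rough way to summarize how many of these messages appear and the range of the trailing number. This is only a sketch: `summarize_flooded` is a made-up helper name, it assumes the exact message format shown above, and it does not assume anything about what the number means.

```shell
#!/bin/sh
# Hypothetical helper: summarize "smbd: authsvc: flooded N" lines read on stdin.
# Prints the number of matching lines plus the smallest and largest N seen.
summarize_flooded() {
    awk '/smbd: authsvc: flooded [0-9]+$/ {
             v = $NF + 0                      # trailing number on the line
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             n++
         }
         END {
             if (n) printf "count=%d min=%d max=%d\n", n, min, max
             else   print "count=0"
         }'
}
```

Usage would be something like `summarize_flooded < saved-smbd-log.txt`, where the file name is a placeholder for wherever the smb service log was saved.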

The only thing I could find with regard to this is this really old closed bug: https://www.illumos.org/issues/11834

Gordon, do you have any idea what might be causing this? Do you have any good debugging flags I can turn on to troubleshoot? I added you as a "watcher", hopefully that's not too intrusive and it just sends you a notification email.

Actions #1

Updated by Gordon Ross 3 months ago

As usual: Start with a network capture of port 445 (snoop -o something.snoop port 445)
then analyze with Wireshark to see what happens between server and client.
Next look with: dtrace -s /usr/lib/smbsrv/dtrace/smbsrv.d -o something.txt
and see if you can figure out where things go wrong. That shows the in-kernel stuff.
If that looks OK but the smb_authenticate calls are having problems, you can try:
dtrace -s /usr/lib/smbsrv/dtrace/smbd-authsvc.d -p $(pgrep smbd) -o something...
and see if you can figure out where things are going badly in the auth service.
Beyond that, you may need some custom dtrace to figure out the rest.

Actions #2

Updated by Adam Stylinski 3 months ago

If this happens again I'll be sure to give that a whirl (kicking over the smb server service made things ok again). Something suspicious about the state of the machine that could very well be at least partially to blame:

There seem to be some 41,000-odd nfsd sockets in the CLOSE_WAIT state that won't go away. The DC reboot issue happened a day or so ago, and we use IDMU with idmap, so that does cause some pretty weird network / hanging behavior when the DC goes down and it doesn't know what to do (related to the other bug I have filed). However, this is the only system we've seen that has those hung TCP states.

I suspect it's possible that having that many sockets held open could have prevented the smb service from servicing authentication requests?
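
To keep an eye on whether the CLOSE_WAIT count grows again, a minimal sketch that tallies sockets by TCP state. `count_tcp_states` is a hypothetical helper name, and the sketch assumes `netstat -an` style output where the state name is the last field on each line.

```shell
#!/bin/sh
# Hypothetical helper: tally TCP sockets by state from netstat output on stdin.
# Assumption: the state name is the last whitespace-separated field of a line;
# header lines and anything else that does not end in a known state are skipped.
count_tcp_states() {
    awk '$NF ~ /^(ESTABLISHED|CLOSE_WAIT|TIME_WAIT|FIN_WAIT_1|FIN_WAIT_2|LAST_ACK|CLOSING|SYN_SENT|SYN_RCVD|LISTEN|IDLE|BOUND)$/ { n[$NF]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

It could be fed with something like `netstat -an -P tcp | count_tcp_states` and run periodically. This only counts states; mapping the sockets back to nfsd would still need the port or a per-process tool.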
