Bug #14142

closed

kernel SMB spams log when it hits max_connections

Added by Alex Wilson over 1 year ago. Updated 9 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug: racktop:BSR-11529

Description

We've got a box running kernel SMB that occasionally seems to hang. When it hangs, the kernel still responds to ping and characters typed on the serial console are echoed, but userland is unresponsive, both on the serial console (login prompts don't work at all) and over the network (TCP connection attempts just hang and time out).

Normally, NMI on this machine works fine to trigger a dump or enter kmdb -- the tweakables to panic on NMI are set in /etc/system. However, when we issue an NMI during one of these hangs, this happens:

WARNING: SMB Session: taskq_dispatch failed
WARNING: SMB Session: taskq_dispatch failed
WARNING: SMB Session: taskq_dispatch failed
WARNING: SMB Session: taskq_dispatch failed
WARNING: SMB Session: taskq_dispatch fail...

The system prints this to the serial console at maximum baud rate, apparently forever. A second NMI never seems to do anything to it, either, and we resort to a hard reset.

Actions #1

Updated by Alex Wilson over 1 year ago

  • Description updated (diff)
Actions #2

Updated by Alex Wilson over 1 year ago

  • Description updated (diff)
Actions #3

Updated by Gordon Ross over 1 year ago

This is a cmn_err(CE_WARN, ...) call here:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/smbsrv/smb_server.c?r=4e065a9f#2494

The system must be receiving a flood of connections. (Under attack?)
Why would all of these be going to the console? I don't think that's normal.

Actions #4

Updated by Gordon Ross over 1 year ago

Have you seen that more than once? Any clues how to reproduce it?

Actions #5

Updated by Gordon Ross 12 months ago

  • Subject changed from "kernel SMB stops NMI from inducing panic, printfs as hard and fast as it can instead" to "kernel SMB spams log when it hits max_connections"

This happens when the SMB server has created too many sessions, and runs into max_connections.

Actions #6

Updated by Gordon Ross 12 months ago

There are actually a couple of problems here.
1: The check for max_connections should happen earlier, before we try to allocate a taskq thread.
2: The error handling code for when taskq_dispatch fails was missing a call to smb_session_logoff,
which is necessary to allow the tear-down to proceed smoothly.

This can be tested by setting max_connections to a small value (I used 3) and then making enough connections to run over that limit. In the updated code, the first error handling code path can be tested with that configuration. To test the second error handling code path, use "mdb -kw" to increase smb_server->sv_cfg.skc_maxconnections by one and then repeat the attempt to add client connections. Both cases should immediately disconnect the client. After both error code paths are exercised, make sure the SMB service can restart without problems.
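
The ordering of the two checks described above can be sketched in userland C. This is only an illustration, not the smbsrv code: server_t, accept_connection, fake_dispatch and fake_session_teardown are hypothetical names. The dispatch_ok flag stands in for whether taskq_dispatch succeeds, and the teardown helper stands in for the smb_session_logoff plus tear-down step.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real smbsrv structures. */
typedef struct server {
	unsigned int max_connections;	/* configured session limit */
	unsigned int session_count;	/* sessions currently tracked */
	bool dispatch_ok;		/* simulates taskq_dispatch success */
} server_t;

typedef enum {
	ACCEPT_OK,
	ACCEPT_REJECT_LIMIT,	/* refused before any dispatch attempt */
	ACCEPT_REJECT_DISPATCH	/* dispatch failed, session torn down */
} accept_result_t;

/* Stand-in for taskq_dispatch(): succeeds only if the flag is set. */
static bool
fake_dispatch(const server_t *sv)
{
	return (sv->dispatch_ok);
}

/* Stand-in for the logoff and tear-down done on the error path. */
static void
fake_session_teardown(server_t *sv)
{
	sv->session_count--;
}

static accept_result_t
accept_connection(server_t *sv)
{
	/* Problem 1: check the limit first, before asking for a worker. */
	if (sv->session_count >= sv->max_connections)
		return (ACCEPT_REJECT_LIMIT);

	sv->session_count++;

	/* Problem 2: on dispatch failure, undo the session bookkeeping. */
	if (!fake_dispatch(sv)) {
		fake_session_teardown(sv);
		return (ACCEPT_REJECT_DISPATCH);
	}
	return (ACCEPT_OK);
}
```

With max_connections set to 3, the fourth call returns ACCEPT_REJECT_LIMIT without attempting a dispatch. Raising the limit by one and clearing dispatch_ok exercises the second path, and session_count stays at 3 in both cases.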

Actions #7

Updated by Gordon Ross 12 months ago

  • Status changed from New to In Progress
  • Assignee set to Gordon Ross
Actions #8

Updated by Electric Monk 12 months ago

  • Gerrit CR set to 2175
Actions #9

Updated by Joshua M. Clulow 12 months ago

My question:

While this fix does seem like an improvement to the code, do you have
any idea about the interaction with the NMI/panic described in the bug?

Gordon's answer:

I believe it was simply that their system was too busy logging stuff
to the console to respond to the NMI.
I did not actually reproduce that problem, though I think that's
possible with any console spammer.

And I forgot my test results, which were: I repeated the procedures
described in the issue, verifying that we get the messages at no more
than once per minute, and that service restart works.
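
For readers wondering what "no more than once per minute" could look like, here is a minimal sketch of a time-based warning throttle in C. How the actual commit limits the message is not shown in this issue, so ratelimit_t, should_warn and WARN_INTERVAL_SEC are all hypothetical, and the caller is assumed to supply a monotonic time in seconds.

```c
#include <stdbool.h>
#include <stdint.h>

#define	WARN_INTERVAL_SEC	60	/* assumed: one message per minute */

/* Hypothetical throttle state; not taken from the smbsrv sources. */
typedef struct ratelimit {
	int64_t last_warn;	/* time of the last emitted warning (sec) */
	bool warned;		/* false until the first warning is emitted */
} ratelimit_t;

/*
 * Return true if a warning may be emitted at time now_sec, and record
 * that it was. Calls inside the interval return false, so the caller
 * skips the log message.
 */
static bool
should_warn(ratelimit_t *rl, int64_t now_sec)
{
	if (rl->warned && (now_sec - rl->last_warn) < WARN_INTERVAL_SEC)
		return (false);
	rl->warned = true;
	rl->last_warn = now_sec;
	return (true);
}
```

A caller would wrap the warning in `if (should_warn(&rl, now))`, so a flood of failures produces one message per interval instead of one per failure.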

Actions #10

Updated by Electric Monk 12 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

commit  61b20185b3a9f12c5f69672abe47b79dfb002cab
Author: Gordon Ross <gwr@racktopsystems.com>
Date:   2022-06-09T15:00:11.000Z

    14142 kernel SMB spams log when it hits max_connections
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Albert Lee <alee@racktopsystems.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Matt Barden <mbarden@tintri.com>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>

Actions #11

Updated by Gordon Ross 9 months ago

  • External Bug set to racktop:BSR-11529