Bug #3325

nsmb_close locking and teardown deadlock

Added by Gordon Ross about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Start date: 2012-10-31
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

One may observe the SMB server failing to authenticate domain users,
and upon examining stacks, one may find:

> ffffff2da452b560$<threadlist
            ADDR             PROC              LWP CLS PRI            WCHAN
ffffff2da452b560 ffffff2a6d324730 ffffff2aa3480e20   1  59 ffffff303ceb1dc4
  PC: _resume_from_idle+0xf1    CMD: /usr/lib/smbsrv/smbd start
  stack pointer for thread ffffff2da452b560: ffffff013557ea70
  [ ffffff013557ea70 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait_sig+0x14d()
    smb_iod_reconnect+0x59()
    smb_usr_get_ssn+0x114()
    nsmb_ioctl+0x125()
    cdev_ioctl+0x45()
    spec_ioctl+0x5a()
    fop_ioctl+0x7b()
    ioctl+0x18e()
    dtrace_systrace_syscall32+0x11a()
    _sys_sysenter_post_swapgs+0x149()

Attempts (by root) to connect to the current AD server will also hang.

#1

Updated by Gordon Ross almost 8 years ago

The smbiod is in the middle of shutting down its connection to the AD server, typically because it has been idle. During the nsmb_close() call, it holds the dev_lck mutex while calling nsmb_close2(). That in turn creates a close request, which mistakenly tries to initiate a reconnect to this server. The reconnect cannot proceed because it needs to re-enter the driver and take the lock that we already hold. Deadlock.

We're doing two things, either of which alone would fix this deadlock, both of which are improvements on their own:
(a) don't hold dev_lck during nsmb_close2
(b) don't trigger a reconnect for close or "tdis".
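
For illustration only, here is a userland sketch of the locking change in (a), using pthreads rather than kernel mutexes. Everything in it except the name dev_lck is a hypothetical stand-in (dev_state_t, dev_table, model_close2, close_locked, close_unlocked), not the actual nsmb driver code. It also simplifies the real deadlock by having the same thread re-enter, and it uses trylock so that the blocked case shows up as a return value instead of a hang.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-minor driver state; not the real nsmb structures. */
typedef struct dev_state {
    int teardown_done;
} dev_state_t;

static pthread_mutex_t dev_lck = PTHREAD_MUTEX_INITIALIZER;
static dev_state_t *dev_table[4];   /* slots protected by dev_lck */

/*
 * Models a path (such as a reconnect) that must re-enter the driver and
 * take dev_lck. Returns -1 where the real code would block forever.
 */
int reenter_driver(void)
{
    if (pthread_mutex_trylock(&dev_lck) != 0)
        return -1;
    pthread_mutex_unlock(&dev_lck);
    return 0;
}

/* Models nsmb_close2(): teardown work that may re-enter the driver. */
int model_close2(dev_state_t *sp)
{
    assert(sp != NULL);
    int rc = reenter_driver();
    sp->teardown_done = 1;
    return rc;
}

/* Old shape: the teardown runs while dev_lck is still held. */
int close_locked(int minor)
{
    int rc = -1;

    pthread_mutex_lock(&dev_lck);
    dev_state_t *sp = dev_table[minor];
    if (sp != NULL) {
        dev_table[minor] = NULL;
        rc = model_close2(sp);      /* re-entry fails: lock is held */
    }
    pthread_mutex_unlock(&dev_lck);
    return rc;
}

/* New shape: unlink the state under the lock, tear it down after dropping it. */
int close_unlocked(int minor)
{
    pthread_mutex_lock(&dev_lck);
    dev_state_t *sp = dev_table[minor];
    dev_table[minor] = NULL;
    pthread_mutex_unlock(&dev_lck);

    if (sp == NULL)
        return -1;
    return model_close2(sp);        /* re-entry can now take dev_lck */
}
```

In this model, close_locked() on a populated slot returns -1 because the re-entry cannot get the lock, while close_unlocked() returns 0. The lock still covers the table lookup and unlink, which is the only part that needs it.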

Part (b) is OK because the SMB protocol assures us that, once a connection is closed, all file IDs, tree IDs, etc. are destroyed by the server. It never makes sense to reconnect just to do that.
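
The rule in (b) could be sketched roughly as follows. The enums and the rq_plan() helper are hypothetical names made up for this illustration and do not correspond to identifiers in the driver. The idea is only that a close or tree-disconnect request aimed at a dead connection is dropped rather than used as a reason to reconnect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request kinds, connection states, and outcomes. */
typedef enum { RQ_OPEN, RQ_READ, RQ_CLOSE, RQ_TDIS } rq_kind_t;
typedef enum { VC_ACTIVE, VC_DEAD } vc_state_t;
typedef enum { RQ_SEND, RQ_RECONNECT_THEN_SEND, RQ_DROP } rq_action_t;

/* Close and tree disconnect only release IDs the server has already destroyed. */
bool rq_is_teardown(rq_kind_t kind)
{
    return kind == RQ_CLOSE || kind == RQ_TDIS;
}

/* Decide what to do with a request given the connection state. */
rq_action_t rq_plan(rq_kind_t kind, vc_state_t vc)
{
    assert(kind <= RQ_TDIS);

    if (vc == VC_ACTIVE)
        return RQ_SEND;                 /* connection is up: just send */
    if (rq_is_teardown(kind))
        return RQ_DROP;                 /* nothing left to release on the server */
    return RQ_RECONNECT_THEN_SEND;      /* other requests may justify a reconnect */
}
```

With this shape, rq_plan(RQ_CLOSE, VC_DEAD) and rq_plan(RQ_TDIS, VC_DEAD) both yield RQ_DROP, so the teardown path never re-enters the driver to reconnect.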

#2

Updated by Gordon Ross almost 8 years ago

  • Status changed from New to Resolved

From 39f633a09e54fab2b9cf8d9d3ddc2a043b3e7465 Mon Sep 17 00:00:00 2001
From: Bayard Bell <bayard.bell@nexenta.com>
Date: Wed, 30 Jan 2013 23:35:38 +0100
Subject: [PATCH] 3325 nsmb_close locking and teardown deadlock
 Reviewed by: Albert Lee <trisk@nexenta.com>
 Reviewed by: Gordon Ross <gwr@nexenta.com>
 Reviewed by: Yakov Zaytsev <yakov@nexenta.com>
 Approved by: Richard Lowe <richlowe@richlowe.net>
