Bug #14988

open

smb: panic in smb_common_open after pool export

Added by Toomas Soome 7 days ago. Updated 6 days ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
cifs - CIFS server and client
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:
racktop:BSR-11507

Description

Seen in tests with a (forced) zpool export:

> ::status
debugging crash dump vmcore.0 (64-bit) from xxxx
operating system: nnnnnn (i86pc)
image uuid: uuuuuu
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffff946166af340 addr=20 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only

> $C
fffff946166af460 mutex_enter+0xb()
fffff946166af5e0 smb_common_open+0x3f4(fffffd932175b028)
fffff946166af9a0 smb2_create+0x7d3(fffffd932175b028)
fffff946166afb20 smb2sr_work+0x58c(fffffd932175b028)
fffff946166afb60 smb2_tq_work+0x73(fffffd932175b028)
fffff946166afc00 taskq_d_thread+0xbc(fffffd673625aa60)
fffff946166afc10 thread_start+0xb()

That's actually a call to smb_node_release(NULL).

From racktop: 4547ac2a35665a09d67ac317e83e3038583d2b49
BSR-11507 panic in smb_common_open after pool export

Actions #1

Updated by Gordon Ross 7 days ago

  • Description updated (diff)
  • Category set to cifs - CIFS server and client
  • Status changed from New to In Progress
  • Assignee set to Toomas Soome
Actions #2

Updated by Gordon Ross 7 days ago

This happens because after the (forced) zpool export (with no SMB "unshare"), all file system operations under the share root fail. There is one place in smb_common_open where the failure handling logic is incomplete. This logic problem appears to have been there for a long time, and went unnoticed because this error handling code path is not normally taken.
Specifically, see the "fnode" variable initialization in the PR.
Confirmed with dtrace: after the zpool export, the call to smb_node_getattr on line 442 returns EIO (and probably every file system operation returns EIO after the export). That exposed the logic error where we had set "fnode_held" but not "fnode".
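
To illustrate the pattern described above, here is a minimal user-space sketch, not the actual smb_common_open code or the change in the PR. All names (node_t, node_release, open_existing, getattr_err) are hypothetical stand-ins. It assumes the cleanup path releases the node whenever the "held" flag is set, so the pointer must be assigned together with the flag; otherwise the error path would release NULL, as seen in the stack above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for smb_node_t; only a hold count is modelled. */
typedef struct node {
	int refcnt;
} node_t;

/*
 * Hypothetical stand-in for smb_node_release(). The kernel version
 * faults in mutex_enter() when passed NULL; this model asserts instead.
 */
static void
node_release(node_t *n)
{
	assert(n != NULL);
	n->refcnt--;
}

/*
 * Simplified "open existing file" path. getattr_err plays the role of
 * the smb_node_getattr() return value (EIO after a forced export).
 */
static int
open_existing(node_t *found, int getattr_err)
{
	node_t *fnode = NULL;
	bool fnode_held = false;
	int rc = 0;

	found->refcnt++;	/* the lookup returned a held node */
	fnode = found;		/* set the pointer together with the flag */
	fnode_held = true;

	if (getattr_err != 0) {
		rc = getattr_err;
		goto errout;
	}

	/* Success: the open file keeps the hold, so cleanup must not drop it. */
	fnode_held = false;

errout:
	if (fnode_held)
		node_release(fnode);	/* safe only because fnode was set above */
	return (rc);
}
```

In this model, if the "fnode = found" assignment were missing, the EIO path would reach node_release(NULL), which corresponds to the NULL dereference in the panic.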

Upstreaming this fix from racktop:
commit 7da3de6ca08cbd76e65e218d2a6335b71205fbaf
Author: Gordon Ross <>
Date: Thu Jun 2 19:18:26 2022 -0400

BSR-11507 panic in smb_common_open after pool export
Actions #3

Updated by Gordon Ross 7 days ago

Test method:

Mount an SMB share from the pool that will be exported.
Disable the SMB server "unexport" thread as follows
(set the "sth_state" field to zero):

# mdb -kw
...
> ::smblist
SERVER           ZONE STATE                            
fffffe170b04f500 0    RUNNING                          
> fffffe170b04f500 ::print -a smb_server_t sv_export.e_unexport_thread
fffffe170b04faf0 sv_export.e_unexport_thread = {
    fffffe170b04faf0 sv_export.e_unexport_thread.sth_magic = 0x534d4254
    fffffe170b04faf4 sv_export.e_unexport_thread.sth_name = [ "smb_kshare_unexport" ]
    fffffe170b04fb14 sv_export.e_unexport_thread.sth_state = 1 (SMB_THREAD_STATE_RUNNING)
    fffffe170b04fb18 sv_export.e_unexport_thread.sth_th = 0xfffffe001f753c20
    fffffe170b04fb20 sv_export.e_unexport_thread.sth_did = 0x120e
    fffffe170b04fb28 sv_export.e_unexport_thread.sth_ep = smb_kshare_unexport_thread
    fffffe170b04fb30 sv_export.e_unexport_thread.sth_ep_arg = 0xfffffe170b04f500
    fffffe170b04fb38 sv_export.e_unexport_thread.sth_pri = 0x3c
    fffffe170b04fb3c sv_export.e_unexport_thread.sth_kill = 0 (0)
    fffffe170b04fb40 sv_export.e_unexport_thread.sth_mtx = {
        fffffe170b04fb40 _opaque = [ 0 ]
    }
    fffffe170b04fb48 sv_export.e_unexport_thread.sth_cv = {
        fffffe170b04fb48 _opaque = 0x1
    }
}
> 
> fffffe170b04fb14/X
0xfffffe170b04fb14:             1               
> fffffe170b04fb14/W0
0xfffffe170b04fb14:             0x1             =       0x0
> 

Forcefully export the zpool:
 zpool export -ff data 

Now, in the SMB client that has the share mounted, try refreshing the directory listing.
For example, with the Samba client, the listing should return an error after the zpool export:
smb: \> dir
NT_STATUS_INTERNAL_ERROR listing \*

Before this fix, the server would panic at this point.
Actions #4

Updated by Gordon Ross 7 days ago

  • Description updated (diff)
Actions #5

Updated by Toomas Soome 7 days ago

  • Description updated (diff)
Actions #6

Updated by Toomas Soome 7 days ago

  • Description updated (diff)
Actions #7

Updated by Toomas Soome 7 days ago

  • Description updated (diff)
Actions #8

Updated by Electric Monk 6 days ago

  • Gerrit CR set to 2376
Actions #9

Updated by Toomas Soome 6 days ago

  • External Bug set to racktop:BSR-11507