Bug #10984

kernel share list fails to be updated after fs import

Added by Gordon Ross 5 months ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2019-05-14
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

An SMB share becomes unavailable after some failovers.
After a couple of failovers (about 3), the share import fails to update the in-kernel share list:

> ffffff4aa6ae3b00::smbshare
smb_kshare_t     name             path
ffffff4aa677bb18 c$               /var/smb/cvol
ffffff4aa677c388 IPC$             IPC$
ffffff4aa677b2a8 vss$             /var/smb/cvol/windows/system32/vss 

Steps to Reproduce:
1) Start copying a big file from Windows to NS
2) Wait 1-5 minutes
3) Run haservice move testpool nodea
4) Press "try again" and wait 1-5 minutes
5) Fail back haservice testpool nodea
6) Press "try again" and wait 1-5 minutes
7) Repeat

Expected Results:

smb_kshare_t     name             path
ffffff4aa67790e8 c$               /var/smb/cvol
ffffff4aa6779958 IPC$             IPC$
ffffff4aa6778008 kek_fs1          /kek/fs1
ffffff4aa6778878 vss$             /var/smb/cvol/windows/system32/vss

Actual Results:
root@hp-84:~# svc refresh smb
smb_kshare_t     name             path
ffffff4aa677bb18 c$               /var/smb/cvol
ffffff4aa677c388 IPC$             IPC$
ffffff4aa677b2a8 vss$             /var/smb/cvol/windows/system32/vss

History

#1

Updated by Gordon Ross 5 months ago

  • Description updated (diff)
  • Status changed from New to In Progress
#2

Updated by Gordon Ross 5 months ago

From Joyce McIntosh

Using a coming-soon new mdb dcmd ::smb_shr_cache, I can see that when the issue occurs it is because the share already exists in the smbd share cache of the node to which the service is being moved. Example:

service running on nodeA
move service to nodeB, BUT the entry remains in nodeA's smbd share cache
move service back to nodeA; the entry is found in the smbd share cache, so the share is not shared.

While the smb_shr_remove is in progress, an smb_shr_add is performed by another thread as a result of SMF sending SIGHUP to trigger an smbd config refresh (smbd_refresh_handler).

The refresh handler gets the list of shares from sharemgr. The problem is that, at the time it does this, the share has already been removed from the smbd share cache but is still in sharemgr (updating sharemgr is the last step in the removal operation), so the refresh adds the share back to the cache.

The final solution was to not load the shares as part of refresh, but still do the part of smb_shr_load that gets the updated execinfo config parameters by adding an smb_shr_load_execinfo function.
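The interleaving described above can be reproduced in a small single-threaded model. This is only an illustrative sketch, not the smbd code: the share_set_t and node_t types, the helper names, and the load_shares flag are all hypothetical, and the racing refresh is placed by hand between the two removal steps instead of being driven by a real SIGHUP. The execinfo reload is modelled as a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SHARES 8
#define NAME_LEN   32

/* Toy set of share names (hypothetical; stands in for a cache or config). */
typedef struct {
    char names[MAX_SHARES][NAME_LEN];
    int  count;
} share_set_t;

/* One node: its smbd share cache, its sharemgr view, and a reload counter. */
typedef struct {
    share_set_t cache;
    share_set_t sharemgr;
    int         execinfo_loads;
} node_t;

static bool set_has(const share_set_t *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

static void set_add(share_set_t *s, const char *name)
{
    if (set_has(s, name))
        return;
    assert(s->count < MAX_SHARES);
    strncpy(s->names[s->count], name, NAME_LEN - 1);
    s->names[s->count][NAME_LEN - 1] = '\0';
    s->count++;
}

static void set_remove(share_set_t *s, const char *name)
{
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], name) == 0) {
            /* Overwrite the removed slot with the last entry. */
            memmove(s->names[i], s->names[s->count - 1], NAME_LEN);
            s->count--;
            return;
        }
    }
}

/*
 * Model of the refresh handler. With load_shares set (old behaviour) it
 * re-adds every share known to sharemgr; without it (the fix) it only
 * reloads execinfo.
 */
static void refresh(node_t *n, bool load_shares)
{
    n->execinfo_loads++;
    if (!load_shares)
        return;
    for (int i = 0; i < n->sharemgr.count; i++)
        set_add(&n->cache, n->sharemgr.names[i]);
}

/* Removal with a refresh landing between its two steps. */
static void export_with_racing_refresh(node_t *n, const char *name,
    bool load_shares)
{
    set_remove(&n->cache, name);    /* first: drop from the smbd cache */
    refresh(n, load_shares);        /* SIGHUP-triggered refresh runs here */
    set_remove(&n->sharemgr, name); /* last: update sharemgr */
}

/* Import: the share is published only if it is not already cached. */
static bool import_share(node_t *n, const char *name)
{
    set_add(&n->sharemgr, name);
    if (set_has(&n->cache, name))
        return false;               /* stale entry found, so not shared */
    set_add(&n->cache, name);
    return true;
}

/* Move the service away from a node and back; report whether it re-shares. */
static bool failover_roundtrip(bool load_shares)
{
    node_t a = {0};

    (void) import_share(&a, "kek_fs1");
    export_with_racing_refresh(&a, "kek_fs1", load_shares);
    return import_share(&a, "kek_fs1");
}
```

In this model, failover_roundtrip(true) leaves a stale cache entry behind and the share is not published on fail-back, while failover_roundtrip(false) publishes it and still bumps the execinfo counter on each refresh.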

#3

Updated by Gordon Ross 5 months ago

Testing: repeated fail-over (per description)
In production since mid 2018

#4

Updated by Electric Monk 5 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit c720f3a7cb73a788632c8431403e7bf9fd70717e

commit  c720f3a7cb73a788632c8431403e7bf9fd70717e
Author: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Date:   2019-05-28T17:34:10.000Z

    10984 kernel share list fails to be updated after fs import
    Reviewed by: Matt Barden <matt.barden@nexenta.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Reviewed by: Yuri Pankov <yuripv@yuripv.net>
    Approved by: Garrett D'Amore <garrett@damore.org>
