Bug #13514
smbsrv panic in smb2_dh_read_nvlist
Status: Closed
% Done: 100%
Description
The system panics after failover of a pool that is under user load from 20 Windows 10 VMs on Hyper-V.
# mdb -k 0
Loading modules: [ ... ]
> ::status
debugging crash dump vmcore.0 (64-bit) from x11-61
operating system: 5.11 ...:... (i86pc)
image uuid: ...
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffc102e50f57c0 addr=38 occurred in module "smbsrv" due to a NULL pointer dereference
dump content: kernel pages only
> ::stack
smb2_dh_read_nvlist+0x1cb(ffffc19944fcfd08, ffffc19765eb7640, ffffc102e50f5a70)
smb2_dh_import_handle+0x94(ffffc19944fcfd08, ffffc19765eb7640, e7b948ca4ed761bb)
smb2_dh_import_share+0x1a8(ffffc19944fcfd08)
taskq_d_thread+0xb7(ffffc195ef919f18)
thread_start+8()
Steps to Reproduce:
1) Configure a cluster with 2 pools (one for an iSCSI LUN used as VM storage, the second for NFS and SMB shares)
2) Run a VDBench workload from the VMs (Ubuntu 18 over NFS v3 and v4; Windows 10 over SMB 3.02)
3) After some time, perform manual failovers of the pool with the shares.
Expected Results:
Everything works fine
Actual Results:
panic
Updated by Gordon Ross over 1 year ago
- Status changed from New to In Progress
- Assignee set to Gordon Ross
The panic happens while trying to print an error right after the nvlist_unpack call.
Let's look at: shr = sr->arg.tcon
> ffffc19808d45018 ::smbreq
REQUEST          MSG_ID WORKER STATE  COMMAND
ffffc19808d45018 0x0    0      ACTIVE SMB_COM_CREATE_DIRECTORY
> ffffc19808d45018 ::print smb_request_t arg.tcon
arg.tcon = {
    arg.tcon.name = 0
    arg.tcon.path = 0
    arg.tcon.service = 0
    arg.tcon.pwdlen = 0
    arg.tcon.password = 0
    arg.tcon.flags = 0
    arg.tcon.optional_support = 0
    arg.tcon.si = 0
}
Uh oh, it's all zero.
I don't think we've ever tested the error handling code path
where we have an nvlist file for which import fails.
The sr->arg member is cleared in smb2_dh_import_handle.
Need a different way to pass that in.
This is probably reproducible by importing/sharing any
ZFS dataset that contains a corrupted $CA file.
These are just error printing statements.
We can use tree->t_resource instead.
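The failure pattern described above can be illustrated with a small userland sketch. This is not the smbsrv code: the toy_* types and function names are hypothetical stand-ins, and the assumption (taken from the ::print output above) is that the per-request tree-connect argument is all zero by the time the error message is built, while the tree keeps its own copy of the share name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the smbsrv structures. */
typedef struct toy_share {
	char shr_name[64];
} toy_share_t;

typedef struct toy_tree {
	char t_resource[64];	/* share name kept by the tree itself */
} toy_tree_t;

typedef struct toy_request {
	toy_tree_t *tid_tree;
	struct {
		toy_share_t *si;	/* scratch argument; may have been cleared */
	} tcon_arg;
} toy_request_t;

/*
 * Fragile source for the name: depends on a scratch argument.
 * The NULL check exists only so this sketch does not crash; an
 * unchecked shr->shr_name here is the kind of access that faults.
 */
static const char *
name_from_request_arg(const toy_request_t *sr)
{
	const toy_share_t *shr = sr->tcon_arg.si;

	return (shr != NULL ? shr->shr_name : NULL);
}

/* Robust source for the name: the tree owns this string. */
static const char *
name_from_tree(const toy_tree_t *tree)
{
	return (tree->t_resource);
}

/* Build the error text from the tree instead of the request argument. */
static int
format_import_error(char *buf, size_t len, const toy_tree_t *tree, int rc)
{
	return (snprintf(buf, len, "import failed for share %s, rc=%d",
	    name_from_tree(tree), rc));
}
```

With a zeroed request argument, name_from_request_arg() has nothing to return, while format_import_error() still produces a usable message because it only reads from the tree.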
Updated by Gordon Ross over 1 year ago
Testing:
Create a persistent handle storage file (and nvlist) that's intentionally corrupt.
Stop and start the SMB server so that it will import persistent handles.
Before the fix, this panics; after the fix, the corrupt file is ignored.
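The shape of that test can be sketched in userland C. Everything here is an assumption made for illustration: the 8-byte record format, TOY_MAGIC, and the toy_* functions are hypothetical and unrelated to the real nvlist encoding or the on-disk persistent handle files. The point is only the expected behavior: a record that fails to unpack is counted and skipped, and the import continues.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	TOY_MAGIC	0x4e564c31u	/* arbitrary marker for this sketch */

typedef struct toy_blob {
	uint8_t bytes[8];	/* 4-byte magic followed by 4-byte id */
} toy_blob_t;

typedef struct toy_handle {
	uint32_t id;
} toy_handle_t;

/* Serialize one handle into a blob (native byte order). */
static void
toy_pack(uint32_t id, toy_blob_t *blob)
{
	uint32_t magic = TOY_MAGIC;

	memcpy(blob->bytes, &magic, sizeof (magic));
	memcpy(blob->bytes + 4, &id, sizeof (id));
}

/* Return 0 on success, -1 if the blob does not carry the expected magic. */
static int
toy_unpack(const toy_blob_t *blob, toy_handle_t *out)
{
	uint32_t magic, id;

	memcpy(&magic, blob->bytes, sizeof (magic));
	memcpy(&id, blob->bytes + 4, sizeof (id));
	if (magic != TOY_MAGIC)
		return (-1);
	out->id = id;
	return (0);
}

/*
 * Import all blobs. Corrupt blobs are counted in *skipped and ignored;
 * the return value is the number of handles written to out[].
 */
static size_t
toy_import_all(const toy_blob_t *blobs, size_t n, toy_handle_t *out,
    size_t *skipped)
{
	size_t i, imported = 0;

	*skipped = 0;
	for (i = 0; i < n; i++) {
		if (toy_unpack(&blobs[i], &out[imported]) != 0) {
			(*skipped)++;
			continue;
		}
		imported++;
	}
	return (imported);
}
```

A test in this style packs a few records, flips a byte in one of them, and checks that the remaining records are still imported.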
Updated by Electric Monk over 1 year ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 174aa483b26ab13af096f2d478f7c15afdaf9784
commit 174aa483b26ab13af096f2d478f7c15afdaf9784
Author: Gordon Ross <gwr@nexenta.com>
Date: 2021-02-20T20:36:57.000Z

    13514 smbsrv panic in smb2_dh_read_nvlist
    Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
    Reviewed by: Evan Layton <evan.layton@nexenta.com>
    Reviewed by: Matt Barden <matt.barden@nexenta.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Paul Winder <paul@winder.uk.net>
    Approved by: Robert Mustacchi <rm@fingolfin.org>