Project

General

Profile

Actions

Bug #5133

closed

fix races for two panics on NULL pointer derefs in nsmb connection teardown

Added by Dan McDonald over 7 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
cifs - CIFS server and client
Start date:
2014-09-05
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

In both cases an smbiod worker process triggers the panic with system calls against the nsmb (SMB client) kernel module as part of connection teardown.

SUP-538 summary:

This was a race between:
(a) the smbiod closing it's connection, leading to an "unloan" of the network socket, and
(b) other driver functions calling nb_disconnect(), i.e. as a result of network errors.

The root cause here is that nb_disconnect() can free nbp->nbp_frag out from under the smbiod receiver thread for lack of synchronization of the free with the state change, so our cleanest fix is to move the freemsg() to nb_unloan_fp(), which is synchronized with the state in nbp->nbp_flags.

Panic signature:

> ::status
[snip]
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff00b9ec78d0 addr=18 occurred in module "nsmb" due to a NULL pointer dereference
dump content: kernel pages and pages from PID -1
> ::stack
nbssn_recv+0xe8(ffffff3a01535e18, ffffff00b9ec7a88, ffffff00b9ec7a38, ffffff00b9ec7a3f)
smb_nbst_recv+0x62(ffffff6c5ff78080, ffffff00b9ec7a88)
smb_iod_recv1+0x3b(ffffff6c5ff78080, ffffff00b9ec7ae8)
smb_iod_recvall+0x65(ffffff6c5ff78080)
smb_iod_vc_work+0xa0(ffffff6c5ff78080, ffffff19b0fa5190)
smb_usr_iod_work+0xf4(ffffff19f25e44c8, 80b79f0, 100003, ffffff19b0fa5190)
nsmb_ioctl+0xd5(1000001d2ca, 6e7313, 80b79f0, 100003, ffffff19b0fa5190, ffffff00b9ec7d54)
cdev_ioctl+0x45(1000001d2ca, 6e7313, 80b79f0, 100003, ffffff19b0fa5190, ffffff00b9ec7d54)
spec_ioctl+0x5a(ffffff598484ba40, 6e7313, 80b79f0, 100003, ffffff19b0fa5190, ffffff00b9ec7d54)
fop_ioctl+0x7b(ffffff598484ba40, 6e7313, 80b79f0, 100003, ffffff19b0fa5190, ffffff00b9ec7d54)
ioctl+0x18e(6, 6e7313, 80b79f0)
dtrace_systrace_syscall32+0x11a(6, 6e7313, 80b79f0, 1c3, 1, 0)
_sys_sysenter_post_swapgs+0x149()

This can also exhibit in nbssn_peekhdr() and pullupmsg() called by the receiver thread. Examination of nbp->nbp_frag in the stack trace shows it to be NULL, while NBF_SENDLOCK is set in nbp->nbp_flags.

SUP-548 summary:
smb_iod_disconnect() checks whether vcp->iod_thr is NULL at smb_iod.c line 188, immediately before calling tsignal(vcp->iod_thr, SIGKILL) at line 190. Code that changes the pointer value (e.g. smb_usr_iod_work()) acquires/surrenders the necessary locks by calling SMB_VC_LOCK/SMB_VC_UNLOCK (see smb_usr.c lines 852-857). It appears that this code needs to do the same to synchronise without racing off to panic.

Signature:

> ::status
[snip]
operating system: 5.11 NexentaOS_134f (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0017d17a00 addr=180 occurred in module "genunix" due to a NULL pointer dereference
dump content: kernel pages and pages from PID -1
> ::stack
tsignal+0x1e(0, 9)
smb_iod_disconnect+0x76(ffffff0700f0b700)
nsmb_close2+0x4f(ffffff066c2e41d8, ffffff042180a698)
nsmb_close+0x4f(10000003131, 3, 2, ffffff042180a698)
dev_close+0x3d(10000003131, 3, 2, ffffff042180a698)
device_close+0xa2(ffffff040d8ac180, 3, ffffff042180a698)
spec_close+0x163(ffffff040d8ac180, 3, 1, 0, ffffff042180a698, 0)
fop_close+0x71(ffffff040d8ac180, 3, 1, 0, ffffff042180a698, 0)
closef+0x5d(ffffff0414fc7b28)
closeandsetf+0x507(c, 0)
close+0x18(c)
dtrace_systrace_syscall32+0x11a(c, 0, 0, 1c3, 1, 0)
_sys_sysenter_post_swapgs+0x149()

Stack trace makes it very apparent that vcp->iod_thr has been set to NULL in race with tsignal() call.

Subsequent experience has shown that these issues exhibits in more recent illumos-gate shipped by OmniTI.

Actions

Also available in: Atom PDF