Bug #875
Closed
missing lwp_exit() in kcfpool_svc() induces panic in prchoose()
100%
Description
We have recently seen a spate of panics in prchoose():
> $c
prchoose+0x72(ffffff0d3761e008)
prgetpsinfo32+0x2b(ffffff0d3761e008, ffffff006a372b00)
pr_read_psinfo_32+0x4e(ffffff0d420b8640, ffffff006a372e20)
prread+0x5c(ffffff0d420b4c80, ffffff006a372e20, 0, ffffff0d52489480, 0)
fop_read+0xc9(ffffff0d420b4c80, ffffff006a372e20, 0, ffffff0d52489480, 0)
read+0x2b8(4, 8047af0, 150)
read32+0x22(4, 8047af0, 150)
_sys_sysenter_post_swapgs+0x149()
We seem to be dying on a stale p_tlist, which should generally be impossible. Interestingly, the proc_t in question is always kcfpoold:
> ffffff0d3761e008::ps
S PID PPID PGID SID UID FLAGS ADDR NAME
R 4 0 0 0 0 0x00020001 ffffff0d3761e008 kcfpoold
And indeed, kcfpool and the kcfpoold proc_t have wildly divergent ideas of how many threads are associated with kcfpoold:
> *kcfpool::print kcf_pool_t kp_threads
kp_threads = 0x1
> ::pgrep kcfpoold | ::print proc_t p_lwpcnt
p_lwpcnt = 0x11
The problem appears to be in kcfpool_svc(), which is what the in-kernel (i.e., synthetic) kcfpoold sets its LWPs to run. This routine simply returns when the size of the thread pool exceeds the available work, but it in fact needs to grab its own p_lock and call lwp_exit(), lest the LWP state associated with the process become stale.
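To make the accounting mismatch concrete, here is a minimal userland sketch of the two retirement paths. It is a toy model, not the kernel code: model_proc_t, model_pool_t, model_lwp_exit(), worker_retire_buggy(), worker_retire_fixed() and stale_lwps() are hypothetical names invented for illustration, the lock is a plain flag, and the model assumes the pool decrements its own thread count before a worker retires.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; NOT the illumos definitions. */
typedef struct model_proc {
	int p_lock_held;	/* models proc_t p_lock as a simple flag */
	int p_lwpcnt;		/* LWPs the process believes it has */
} model_proc_t;

typedef struct model_pool {
	int kp_threads;		/* threads the pool believes it has */
} model_pool_t;

/*
 * Models lwp_exit(): the caller must hold p_lock. The per-process LWP
 * bookkeeping is torn down and the lock is released on the way out.
 */
static void
model_lwp_exit(model_proc_t *p)
{
	assert(p->p_lock_held);
	p->p_lwpcnt--;
	p->p_lock_held = 0;
}

/* Buggy path: the pool forgets the worker, but the process never does. */
static void
worker_retire_buggy(model_proc_t *p, model_pool_t *kp)
{
	(void) p;
	kp->kp_threads--;
	/* plain return: p_lwpcnt is left untouched and goes stale */
}

/* Fixed path: take p_lock and exit the LWP properly. */
static void
worker_retire_fixed(model_proc_t *p, model_pool_t *kp)
{
	kp->kp_threads--;
	p->p_lock_held = 1;	/* models taking the process's own p_lock */
	model_lwp_exit(p);
}

/*
 * Start nthreads workers, retire nretire of them, and report how many
 * LWPs the process still counts that the pool no longer knows about.
 */
int
stale_lwps(int nthreads, int nretire, int fixed)
{
	model_proc_t p = { 0, nthreads };
	model_pool_t kp = { nthreads };

	for (int i = 0; i < nretire; i++) {
		if (fixed)
			worker_retire_fixed(&p, &kp);
		else
			worker_retire_buggy(&p, &kp);
	}
	return (p.p_lwpcnt - kp.kp_threads);
}
```

With 17 workers of which 16 retire, stale_lwps(17, 16, 0) returns 16, which mirrors the dump above (p_lwpcnt = 0x11 against kp_threads = 0x1), while stale_lwps(17, 16, 1) returns 0.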
Updated by Bryan Cantrill over 12 years ago
- Assignee set to Garrett D'Amore
Updated by Garrett D'Amore over 12 years ago
- Project changed from site to illumos gate
Updated by Garrett D'Amore over 12 years ago
- File kcf-fix.patch added
- File cryptotest.c added
See the attachments for the fix and for a test program to exercise kcf. (The test program is really a kernel module: just compile it and run modload ./cryptotest. It will fire off a large number, 1000 in this case, of kernel jobs which submit crypto jobs for processing.)
Note that you have to be root to see kcf stats.
Updated by Garrett D'Amore over 12 years ago
- Category set to kernel
- Status changed from New to Resolved
- % Done changed from 0 to 100
- Estimated time set to 2.00 h
Resolved in:
changeset: 13317:bd2d2a5ed3e4
tag: tip
user: Garrett D'Amore <garrett@nexenta.com>
date: Sun Apr 03 07:44:01 2011 -0700
description:
875 missing lwp_exit() in kcfpool_svc() induces panic in prchoose()
Reviewed by: Bryan Cantrill <bryancantrill@gmail.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>