Bug #11909

THREAD_KPRI_RELEASE does nothing of the sort

Added by John Levon 20 days ago. Updated 20 days ago.

Status: New
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:

History

#1

Updated by John Levon 20 days ago

Work by Patrick Mooney originally at https://smartos.org/bugview/OS-7753 (and OS-7926)

Some testing notes from that work are below. In addition to the testing described below, I've run these bits for a while during illumos-gate builds.

On several occasions, most recently in SWSUP-1426, systems have been observed in dire or deadlocked states thanks to threads running with MINCLSYSPRI acquired via THREAD_KPRI_REQUEST. When such a thread blocks on a lock with nonzero t_kpri_req, FSS and TS will set a flag (FSSKPRI and TSKPRI, respectively) which will persist in the scheduler struct until it is cleared in fss_trapret when the thread returns to userspace. If the thread spends a long time in the kernel without returning to userspace, as is common for KVM, bhyve, or workloads like sendfile(), then FSSKPRI will remain asserted, causing that thread to bypass many scheduler decisions, including the expiration of its time quantum that would let other threads preempt it.
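To make the failure mode concrete, here is a minimal toy model of the behavior described above. It is a sketch under stated assumptions, not the illumos FSS/TS source: every toy_* name and TOY_KPRI are hypothetical stand-ins, and the model only captures the three events mentioned (blocking with a nonzero kpri request, the clock tick, and the return to userspace).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-class KPRI flag; not the real definition. */
#define TOY_KPRI 0x01

typedef struct toy_thread {
    int flags;    /* TOY_KPRI is latched here while "kernel priority" holds */
    int kpri_req; /* stands in for t_kpri_req */
    int timeleft; /* ticks remaining in the current time quantum */
} toy_thread_t;

/* The thread blocks in the kernel: a nonzero request latches the flag. */
static void toy_sleep(toy_thread_t *t)
{
    if (t->kpri_req != 0)
        t->flags |= TOY_KPRI;
}

/*
 * Clock tick. While the flag is set, quantum accounting is skipped entirely,
 * so the quantum never expires. Returns true when the quantum has expired
 * and the thread should give up the CPU.
 */
static bool toy_tick(toy_thread_t *t, int quantum)
{
    assert(quantum > 0);
    if (t->flags & TOY_KPRI)
        return false;
    if (--t->timeleft <= 0) {
        t->timeleft = quantum; /* start a fresh quantum */
        return true;
    }
    return false;
}

/* Return to userspace: the only place in this model that clears the flag. */
static void toy_trapret(toy_thread_t *t)
{
    t->flags &= ~TOY_KPRI;
}

/* Deliver nticks clock ticks and count how many times the quantum expired. */
static int toy_run(toy_thread_t *t, int nticks, int quantum)
{
    int expired = 0;
    for (int i = 0; i < nticks; i++) {
        if (toy_tick(t, quantum))
            expired++;
    }
    return expired;
}
```

In this model, a thread that calls toy_sleep() with kpri_req set and then stays in the kernel sees no quantum expirations from toy_run(), however many ticks are delivered. Expirations resume only after toy_trapret() clears the flag.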

This behavior has dubious origins, and in the world of fully preemptable kernel logic running on large multi-CPU systems, it seems ideal to rip it out.

I ran full illumos-joyent nightly builds on a stock platform image (PI) and on one carrying my change to rip out the KPRI logic, then compared the performance. Across repeated runs, the build duration was consistent within each PI, but the patched PI was markedly faster, taking roughly 24% less wall-clock time in the pair of runs shown here. Here are two such runs:

Stock:

==== Elapsed build time (non-DEBUG) ====

real 14:01.1
user 1:50:14.7
sys 55:59.8

Patched:

==== Elapsed build time (non-DEBUG) ====

real 10:37.6
user 1:52:30.6
sys 46:22.0

Between each of the builds, I clobbered much of the workspace (leaving the bootstrap tools intact) and then initiated the build after a reboot so ARC warmth would not be a factor.
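For anyone wanting to compare further runs, the arithmetic can be sketched as follows. This is an illustrative helper, not part of the build tooling: parse_elapsed and pct_change are hypothetical names, and the only assumption about the input is the "[H:]MM:SS.s" layout seen in the summaries above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert an elapsed-time string of the form "MM:SS.s" or "H:MM:SS.s"
 * into seconds. Returns -1.0 if the string does not match either form.
 */
static double parse_elapsed(const char *s)
{
    double a, b, c;
    char extra;
    int n;

    assert(s != NULL);
    /* The trailing %c only converts if unexpected text follows the time. */
    n = sscanf(s, "%lf:%lf:%lf%c", &a, &b, &c, &extra);
    if (n == 3)
        return a * 3600.0 + b * 60.0 + c; /* H:MM:SS.s */
    if (n == 2)
        return a * 60.0 + b;              /* MM:SS.s */
    return -1.0;
}

/* Relative change from before to after, in percent (negative means faster). */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the two runs above, real time drops from 841.1 s to 637.6 s (about 24% lower), sys time drops from 3359.8 s to 2782.0 s (about 17% lower), and user time rises from 6614.7 s to 6750.6 s (about 2% higher).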

#2

Updated by Gordon Ross 20 days ago

The design work for this really should consider its effect on priority inversion (https://en.wikipedia.org/wiki/Priority_inversion) and explain how the result will avoid that problem.
