Bug #11048

need a way to disable SMT

Added by Jerry Jelinek 3 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2019-05-17
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

This is for upstreaming work John Levon has already done in SmartOS. I'm including some notes from our internal bug report.

For various reasons, we need a manageable way to effectively disable hyper-threading in SmartOS and Triton. The exact details need thrashing out, but the basic idea is that during early(ish) boot, some configuration item (possibly a kernel cmdline one, possibly in the USB key's config.inc/) will trigger a call to the kernel to offline all the sibling CPUs. They will no longer be available for scheduling of processes, including HVM processes.
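To make "offline all the sibling CPUs" concrete, here is a minimal sketch of the selection step only. It is not the kernel implementation: the topology map, the function name, and the choice to keep the lowest-numbered CPU on each core are all assumptions made for illustration.

```python
from collections import defaultdict


def siblings_to_offline(cpu_to_core):
    """Return the CPU IDs to offline so one hardware thread per core stays online.

    cpu_to_core maps a logical CPU ID to a (chip_id, core_id) pair.
    Hypothetical helper: keeping the lowest-numbered CPU of each core is an
    assumption, not something the bug report specifies.
    """
    by_core = defaultdict(list)
    for cpu_id, core in cpu_to_core.items():
        by_core[core].append(cpu_id)

    offline = []
    for cpus in by_core.values():
        cpus.sort()
        offline.extend(cpus[1:])  # everything except the first thread on the core
    return sorted(offline)


# Two cores on one chip, two hardware threads each.
topology = {0: (0, 0), 1: (0, 1), 2: (0, 0), 3: (0, 1)}
print(siblings_to_offline(topology))  # [2, 3]
```

On a system without SMT every core has a single entry, so the function returns an empty list and nothing is offlined.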

A non-exhaustive list of things to think about here:

- we should not allow them to be onlined again (or perhaps only with a force option)
- earlier during boot we may have made allocations on the presumption of the siblings being online (as they are at that point). In particular, we may be allocating too many interrupts for the actual system execution
- verify that MCA/MCE/NMIs are still functional: can they arrive on the "bad" sibling? Is there an implication with removing the mca poll omni cyclic on the offline CPU?
- even an offline CPU (which is still CPU_READY) takes cross-calls, for example dtrace_xcall(). We need to evaluate all of these and figure out if such x-calls are a concern, and what to do about them if so. Perhaps a CPU_DISABLED CPU can opt out of certain x-call actions
- whether we should have a new processor_info() / psradm / psrinfo state for such siblings, beyond just P_OFFLINE, as visible to userspace. What compat implications are there with this? Or do we just report HT state separately?
- do we want a way to report whether we consider HT to be safe on this system (for future h/w)?

History

#1

Updated by Jerry Jelinek 3 months ago

Testing notes from John's internal testing at Joyent:

SmartOS defaults to SMT on
SmartOS can boot with SMT on or off if modified

sdc-usbkey set-variable smt_enabled fails for grub
sdc-usbkey get-variable fails for grub

CN: existing CN, post upgrade to these bits, will not get an smt_enabled property, SMT remains enabled
upgraded HN does not have SMT disabled

CN: a new CN (factory reset, new PI) will get it, defaulting to SAPI contents.
If smt disabled, CPUs will be disabled, sysinfo looks right
sdc-server set <uuid> smt_enabled false for CN
sdc-server set <uuid> smt_enabled true for CN
sdc-server get <uuid> smt_enabled with something set for CN

NB these only work once a server is set up
sdc-server get smt_enabled with nothing set in SAPI, should return true
sdc-server get smt_enabled with something set in SAPI
sdc-server get foo fails
sdc-server set foo bar fails
sdc-server set smt_enabled false
sdc-server set smt_enabled true
sdc-server set smt_enabled truefalse fails
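The default-level checks above can be summarised as a small model. This is a sketch of the behaviour the notes describe, not the sdc-server or SAPI code: the class, the exception, and the in-memory store are hypothetical stand-ins.

```python
class PropertyError(ValueError):
    """Raised for an unknown property or a malformed value (hypothetical)."""


KNOWN_PROPERTIES = {"smt_enabled"}


def parse_bool(text):
    """Accept exactly "true" or "false"; anything else (e.g. "truefalse") fails."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise PropertyError(f"invalid boolean value: {text!r}")


class DefaultStore:
    """In-memory stand-in for the default settings; not the real SAPI client."""

    def __init__(self):
        self._values = {}

    def get(self, name):
        if name not in KNOWN_PROPERTIES:
            raise PropertyError(f"unknown property: {name}")
        # With nothing set, smt_enabled reads back as true.
        return self._values.get(name, True)

    def set(self, name, text):
        if name not in KNOWN_PROPERTIES:
            raise PropertyError(f"unknown property: {name}")
        self._values[name] = parse_bool(text)
```

In this model `get("smt_enabled")` on a fresh store returns `True`, `set("smt_enabled", "false")` flips it, and both `get("foo")` and `set("smt_enabled", "truefalse")` raise `PropertyError`.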

sdc-server get <uuid> foo fails
sdc-server set <uuid> foo bar fails
sdc-server get <nosuchuuid> smt_enabled fails
sdc-server set <nosuchuuid> smt_enabled false fails

sdc-server get <uuid> smt_enabled with nothing set for CN, should return original SAPI value

sdc-server get <uuid> smt_enabled HN with nothing set in key, should return error
sdc-server get <uuid> smt_enabled with something set for HN
sdc-server set <uuid> smt_enabled false for HN
sdc-server set <uuid> smt_enabled true for HN

sdc-usbkey get-variable works for loader.conf.local
.local overrides .conf
missing .local file is OK
sdc-usbkey get-variable fails
sdc-usbkey get-variable nosuch works OK
sdc-usbkey get-variable smt_enabled works OK
sdc-usbkey set-variable foo bar works OK, for new and existing properties
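The override behaviour in the loader.conf notes can be sketched as follows. This is an illustration rather than the sdc-usbkey implementation: the function names are hypothetical, the parser handles only simple `name="value"` lines and whole-line `#` comments, and returning `None` for an unknown variable is one reading of "nosuch works OK".

```python
def parse_loader_conf(text):
    """Parse simple name="value" lines, skipping blanks and '#' comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignored in this sketch
        values[name.strip()] = value.strip().strip('"')
    return values


def read_conf(path):
    """Read one config file; a missing file is treated as empty."""
    try:
        with open(path) as f:
            return parse_loader_conf(f.read())
    except FileNotFoundError:
        return {}


def get_variable(conf_path, local_path, name):
    """Look up a variable, letting the .local file override the .conf file."""
    merged = read_conf(conf_path)
    merged.update(read_conf(local_path))
    return merged.get(name)  # None for an unknown variable, not an error
```

Merging the two dictionaries with the `.local` file applied last gives the override order for free, and the missing-file case needs no special handling in `get_variable`.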

verify adminUI looks OK
reflashed HN has SMT enabled
check dapi/cnapi with non-updated CN (old PI)
check adminUI with non-updated CN (old PI)
test of bhyve/kvm
test of lx
run cnapi test suite
run dapi test suite
run schemas precheck
test dapi capacity.js and check.js

cannot force online boot disabled CPU
can force online otherwise
If forced, smt_enabled is now true
verify psradm -Ha
verify psradm -Ha fails if emptying partition
sanity check cpustat
verify psrinfo -c, -S, -p, -t, -r
dtrace still works OK with firing on disabled CPUs
::cpuinfo -v in MDB shows disabled CPUs OK
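The three force-online notes at the top of this group can be read as a small state model. The sketch below is one interpretation, not the kernel logic: it assumes "boot disabled" means SMT was turned off by the boot-time setting, "otherwise" means it was turned off later at runtime, and that onlining a runtime-disabled sibling without force is refused. All names are hypothetical.

```python
from enum import Enum


class SmtState(Enum):
    ENABLED = "enabled"
    DISABLED_AT_BOOT = "disabled-at-boot"
    DISABLED_AT_RUNTIME = "disabled-at-runtime"


class OnlineError(Exception):
    """Raised when a disabled sibling may not be brought online (hypothetical)."""


def online_sibling(state, force=False):
    """Return the SMT state after trying to online a disabled sibling CPU."""
    if state is SmtState.ENABLED:
        return state  # nothing is disabled, so nothing changes
    if state is SmtState.DISABLED_AT_BOOT:
        # Per the notes: a boot-disabled CPU cannot be forced online.
        raise OnlineError("CPU was disabled at boot; force is not honoured")
    if not force:
        # Assumption: a plain online request is refused for a disabled sibling.
        raise OnlineError("CPU is disabled; a forced online is required")
    # Per the notes: once forced online, smt_enabled reads as true again.
    return SmtState.ENABLED
```

Under this reading, only `online_sibling(SmtState.DISABLED_AT_RUNTIME, force=True)` changes the state, and the result is `SmtState.ENABLED`.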

#2

Updated by Electric Monk 3 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit c3377ee9a5b3bff76dbf51347a8de3d215eb6cca

commit  c3377ee9a5b3bff76dbf51347a8de3d215eb6cca
Author: John Levon <john.levon@joyent.com>
Date:   2019-05-22T14:57:24.000Z

    11048 need a way to disable SMT
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
    Reviewed by: Gergo Doma <domag02@gmail.com>
    Approved by: Gordon Ross <gwr@nexenta.com>
