Bug #11787

closed

Kernel needs to be built with retpolines

Added by John Levon over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

http://smartos.org/bugview/OS-7598

To mitigate the impact of Spectre v2, we need to build the operating system with retpolines. We also need to make sure that we correctly change them and fall back when enhanced IBRS (eIBRS) is present, and that we correctly enable the AMD variant.


Related issues

Related to illumos gate - Bug #11788: Kernel needs to generally use RSB stuffing (Closed, John Levon)

#1

Updated by John Levon over 3 years ago

Work originally by Robert Mustacchi.

Note that as well as enabling the compiler-generated retpoline thunks, we need to fix up all of our hand-written assembly so that it calls through the retpolines where needed.

We haven't done this for KMDB as it's hard to see an attack vector, and it's not really worth it.
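The assembly fix-up works as follows. With GCC's `-mindirect-branch=thunk-extern`, the compiler turns a register-indirect branch such as `call *%rax` into a direct `call __x86_indirect_thunk_rax`. Hand-written assembly needs the equivalent change made by hand.

The C sketch below shows that rewrite on single AT&T-syntax instructions. It is only an illustration: `rewrite_indirect` is a hypothetical helper, not a tool from this change. It assumes GCC's `__x86_indirect_thunk_<reg>` naming.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only (hypothetical helper): rewrite a register-indirect
 * branch in AT&T syntax, e.g. "call *%rax", into a direct branch to the
 * per-register retpoline thunk, using GCC's __x86_indirect_thunk_<reg>
 * naming. Returns 0 on success, -1 if the instruction is not of that form.
 */
static int
rewrite_indirect(const char *insn, char *out, size_t outlen)
{
	const char *mnem;
	const char *reg;
	int n;

	assert(insn != NULL && out != NULL);
	if (strncmp(insn, "call *%", 7) == 0) {
		mnem = "call";
		reg = insn + 7;
	} else if (strncmp(insn, "jmp *%", 6) == 0) {
		mnem = "jmp";
		reg = insn + 6;
	} else {
		/*
		 * Memory operands such as "call *0x8(%rbx)" must first load
		 * the target into a register (as GCC's
		 * -mindirect-branch-register does); leave those to a human.
		 */
		return (-1);
	}

	if (*reg == '\0')
		return (-1);

	/* Emit e.g. "call __x86_indirect_thunk_rax". */
	n = snprintf(out, outlen, "%s __x86_indirect_thunk_%s", mnem, reg);
	return ((n < 0 || (size_t)n >= outlen) ? -1 : 0);
}
```

For example, `rewrite_indirect("jmp *%r11", buf, sizeof (buf))` leaves `jmp __x86_indirect_thunk_r11` in `buf`. By contrast, `call *0x8(%rbx)` returns -1 and is left for manual attention.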

#2

Updated by Joshua M. Clulow over 3 years ago

  • Related to Bug #11788: Kernel needs to generally use RSB stuffing added

#3

Updated by John Levon over 3 years ago

Aside from the SmartOS testing, I boot-tested these bits successfully and confirmed via dis(1) that no unexpected indirect calls or jumps remain that bypass the retpolines.

#4

Updated by Joshua M. Clulow over 3 years ago

Testing Notes (from Robert in SmartOS issue)

I've tested this in a number of different ways:

OS Unit Tests

First and foremost I ran through the following OS unit tests. Notes on them are below:

  • bhyvetest (100% pass)
  • crypto-test (100% pass)
  • elf-tests (100% pass)
  • libc-tests (100% pass)
  • net-tests (100% pass)
  • os-tests
    • badseg killed
    • acquire-compare is flaky (OS-7926)
    • secflags_core (fails, as it usually does)
    • secflags_elfdump (fails, as it usually does)
    • secflags_zonecfg (fails, as it usually does)
  • vndtest (100% pass)
  • util-tests (100% pass)
  • awk requires OS-7927
  • grep_test fails if run as ./test-runner/bin/run -c ./util-tests/runfiles/default.run,
    but works if run directly as /opt/util-tests/tests/grep_test
  • zfs-tests
    • hit the same issues running the tests that we see on other platforms; however, a large number of them passed, and older runs were clean.
  • dtest
    • usual failures; run on both debug and non-debug builds, as the safety tests sometimes time out on debug

Basic zone usage

As part of doing other testing, I ran the following different types of VMs. I made sure they all booted and were able to do different amounts of activity:

  • KVM
  • BHYVE (AMD and Intel)
  • SmartOS
  • LX

Triton

I ran through basic headnode setup and made sure that all services were online and that no cores were generated. In addition, John Levon did some basic testing of this. We've also had QA run through an earlier version of this change.

Retpoline Variants

I went through and verified that we had the correct versions of retpolines enabled in the following different scenarios:

  • An Intel system without EIBRS used the general form, both on bare metal and inside a VMware VM
  • An Intel system with EIBRS (Cascade Lake) enabled EIBRS and used the EIBRS-specific form
  • An AMD physical system used the AMD optimized form
  • A bhyve VM running SmartOS on an AMD EPYC system used the general form because it could not toggle the AMD optimized MSR
  • A Linux KVM VM on AMD running SmartOS used the AMD optimized form because Linux KVM virtualized the requisite MSR

disasm test

At John Levon's suggestion, I did a disasm test. From the proto area of a SmartOS image, I ran the following:

$ for f in $(find kernel/ usr/kernel/ platform/i86* -type f); do if dis $f 2>/dev/null | grep \*; then echo $f; fi; done  > dis.out

This listed every place in the disassembly that looked like an indirect call. It turned up a missed case in modstubs, and I then audited the remaining locations, which fell into a few categories:

  • closed bins
  • kmdb
  • illumos-extra (tun/tap)
  • kvm
  • A handful of crypto modules with disasm errors

Manually going through a number of these led to the conclusion that they were fine, or as good as we could get them. We will follow up on kvm and illumos-extra in subsequent work (OS-7945 and OS-7946).
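The `grep \*` in the pipeline above flags any disassembly line containing an asterisk. A narrower filter could look only for `call`/`jmp` mnemonics whose operand starts with `*`, which is AT&T syntax for an indirect branch. Calls that already go through a thunk, such as `call __x86_indirect_thunk_rax`, are direct and would not be flagged.

The C sketch below illustrates that idea. It uses a hypothetical `is_indirect_branch` and assumes `dis` prints AT&T syntax. It is not the check that was actually run.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Sketch only: return true if one line of AT&T-syntax disassembly looks
 * like an indirect call or jump, e.g. "call *%rax" or "jmp *0x10(%rbx)".
 * Calls through a retpoline thunk ("call __x86_indirect_thunk_rax") are
 * direct calls and are therefore not flagged.
 */
static bool
is_indirect_branch(const char *line)
{
	static const char *const mnems[] = { "call", "jmp" };

	assert(line != NULL);
	for (size_t i = 0; i < sizeof (mnems) / sizeof (mnems[0]); i++) {
		size_t len = strlen(mnems[i]);

		/* Check every occurrence, since symbol names may contain "call". */
		for (const char *p = strstr(line, mnems[i]); p != NULL;
		    p = strstr(p + 1, mnems[i])) {
			const char *q = p + len;

			/* Tolerate an operand-size suffix such as "callq". */
			if (*q == 'q' || *q == 'l')
				q++;
			/* Skip the whitespace between mnemonic and operand. */
			while (*q == ' ' || *q == '\t')
				q++;
			if (*q == '*')
				return (true);
		}
	}
	return (false);
}
```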

Boot spot-testing on various platforms

I did basic boot regression testing on the following platforms:

  • haswell (1s Haswell E3)
  • ivy (1s Ivy Bridge E3)
  • estel (Kaby Lake NUC)
  • vilya (Skylake NUC)
  • nirnaeth (Coffee Lake desktop)
  • odyssey (AMD EPYC Naples 2s)
  • coal (Ivy Bridge laptop)

#5

Updated by Electric Monk over 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 65f204200cf9a50fd6bad4093ee0b07bc35105ac

commit  65f204200cf9a50fd6bad4093ee0b07bc35105ac
Author: Robert Mustacchi <rm@joyent.com>
Date:   2019-10-22T09:03:00.000Z

    11787 Kernel needs to be built with retpolines
    11788 Kernel needs to generally use RSB stuffing
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: John Levon <john.levon@joyent.com>
    Reviewed by: Paul Winder <paul@winders.demon.co.uk>
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>
