Feature #12967

default to apix over pcplusmp

Added by Robert Mustacchi 24 days ago. Updated 24 days ago.

Status: New
Priority: Normal
Category: driver - device drivers
Start date:
Due date:
% Done: 50%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

On the x86 side of the house, illumos has traditionally had three different PSMs (platform-specific modules). These are:

  • apix
  • pcplusmp
  • uppc

Today, most systems use either the apix or the pcplusmp module. The uppc module is a holdover from uniprocessor systems and doesn't assume that a local APIC exists. There are two major differences between the apix and pcplusmp modules. The first relates to how they communicate with the APIC itself. The local APIC was originally accessed through a specific range of memory-mapped I/O (MMIO). Eventually Intel introduced the x2APIC, which notably changed from MMIO to MSRs and increased the number of ID bits from 8 to 32. x2APIC mode is required if you have more than 256 processors and you'd like to be able to address them all.
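To make the difference between the two access modes concrete, here is a minimal sketch of the ID space and register addressing in each mode. The constants follow the Intel SDM; the function names are made up for illustration and are not taken from the illumos source.

```python
# Sketch of the two local APIC access modes described above.
# Function and constant names are illustrative only, not illumos code.

XAPIC_ID_BITS = 8         # xAPIC (MMIO) mode
X2APIC_ID_BITS = 32       # x2APIC (MSR) mode
X2APIC_MSR_BASE = 0x800   # first MSR in the x2APIC register range


def max_apic_ids(id_bits: int) -> int:
    """Number of distinct APIC ID values representable in id_bits bits."""
    return 1 << id_bits


def mmio_offset_to_msr(offset: int) -> int:
    """Map an xAPIC MMIO register offset to its x2APIC MSR address.

    xAPIC registers sit on 16-byte boundaries, so a register at a given
    offset maps to MSR 0x800 + (offset / 16).
    """
    if offset % 16 != 0:
        raise ValueError("xAPIC registers are 16-byte aligned")
    return X2APIC_MSR_BASE + (offset >> 4)


if __name__ == "__main__":
    print(max_apic_ids(XAPIC_ID_BITS))        # size of the 8-bit ID space
    print(hex(mmio_offset_to_msr(0x80)))      # task priority register
```

The mapping is not perfectly uniform: for example, the two 32-bit halves of the interrupt command register in MMIO mode become a single 64-bit MSR in x2APIC mode, so treat this as a rule of thumb rather than a complete register map.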

The apix module can handle both the MSR and the MMIO modes of the APIC. The pcplusmp module is older and only handles the MMIO mode, on the assumption that the apix module will handle the MSR mode, even though both modules primarily use the same code to talk to the APIC.

There is another major difference between the PSMs: they are currently used to drive interrupt policy on i86pc systems. The pcplusmp module does not support per-CPU interrupts, which is a rather limiting factor. The apix module does support per-CPU interrupts, and while its policies around interrupt quantities could be better, it is a much better choice for most users than pcplusmp's fixed 2 interrupts per driver. The pcplusmp module also has the challenge that on some platforms you can easily run out of interrupts with sufficient devices (a problem that plagued some fishworks appliances back in the day and required some specific rejiggering).

The x2APIC was introduced sometime in the Nehalem to Sandy Bridge time frame. At some point in the intervening generations, x86 BIOSes switched to enabling it by default and exposing its existence in cpuid. However, AMD only introduced the x2APIC with their Rome series of processors (and perhaps in the Ryzen equivalent). Even so, many BIOSes still don't enable it by default (at least using this ASRock Rack Rome board as an example). This means that all AMD Zen systems are stuck in a world with a small number of interrupts, unlike their Intel counterparts.

As a result, I believe we should default to the apix module, as this gives us a number of benefits:
  • It enables more interrupts on systems, which should generally help. Importantly, it doesn't map all interrupts to all CPUs.
  • If x2APIC mode is available and apix opts to use it, it can result in better virtualization performance when APIC virtualization isn't present, as MSR exits don't require instruction decoding and, in theory, IPIs are easier to deal with.

The big challenge here is testing. We do have good experience running the apix module on systems with smaller CPU counts, because it commonly shows up when virtualized on different platforms under QEMU/KVM. We will coordinate with the broader community to test this. For each of the systems that we look at, there are a few things to consider and evaluate:

1. Whether they're using apix or pcplusmp. The easiest way to check is to run mdb -ke 'setspl/p'. The output will be apix_setspl if they are already using apix.
2. If they're using the apix module, a secondary question for future work is whether they're using x2APIC mode or local APIC (MMIO) mode, which can be determined by running mdb -ke 'apic_reg_ops::print'.
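For anyone collecting results across several machines, the first check can be scripted. The following is a rough sketch that only looks for the apix_setspl symbol named above in the mdb output. The helper names are hypothetical, the exact output layout of mdb is not assumed, and running mdb -k generally needs sufficient privileges.

```python
import subprocess


def psm_from_setspl(mdb_output: str) -> str:
    """Classify the active PSM from the output of mdb -ke 'setspl/p'.

    Returns "apix" if the apix_setspl symbol appears, and "other"
    (pcplusmp or uppc) for anything else. No other symbol names are
    assumed here.
    """
    return "apix" if "apix_setspl" in mdb_output else "other"


def check_local_system() -> str:
    """Run the mdb command from step 1 on this machine and classify it.

    Hypothetical helper: requires an illumos system with mdb available.
    """
    result = subprocess.run(
        ["mdb", "-ke", "setspl/p"],
        capture_output=True, text=True, check=True,
    )
    return psm_from_setspl(result.stdout)
```

On a test machine, calling check_local_system() and recording the result alongside the hardware model would be enough to feed the running list in this ticket.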

While the linked CR has a patch that ensures it's always enabled, we can also easily test this today on systems without recompiling by adding set apix:apix_hw_chk_enable=0 to /etc/system. As test data and results come in, we'll update a running list in this ticket.

History

#1

Updated by Electric Monk 24 days ago

  • Gerrit CR set to 801
