default to apix over pcplusmp
On the x86 side of the house, illumos has traditionally had three different PSMs (platform-specific modules): uppc, pcplusmp, and apix.
Today, most systems use either the apix or pcplusmp module. The uppc module is a holdover from uni-processor systems and doesn't assume that a local APIC exists!
There are two major differences between the apix and pcplusmp PSM modules. The first relates to how they communicate with the APIC itself. The local APIC was originally talked to using a specific range of memory-mapped I/O (MMIO). Eventually Intel introduced the x2apic, which notably changed from MMIO to using MSRs and increased the number of ID bits from 8 to 32. This is required if you have more than 256 processors and you'd like to be able to address them all.
The apix module is configured to be able to handle both the MSR and the MMIO modes of the apic. The pcplusmp module is an older module and it only handles the MMIO mode of the apic, assuming that the apix module will handle the other mode, despite actually primarily using the same code to talk to it.
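To make the MMIO versus MSR distinction concrete, here is a minimal sketch of how the same local APIC register is addressed in each mode. It is not taken from the illumos source. The constants are the architectural defaults (MMIO base 0xFEE00000, x2apic MSR range starting at 0x800), the function names are hypothetical, and a few registers (for example the interrupt command register) do not follow the simple mapping.

```python
# Sketch only: addressing of a local APIC register in each mode.
XAPIC_DEFAULT_BASE = 0xFEE00000  # architectural default MMIO base of the local APIC
X2APIC_MSR_BASE = 0x800          # first MSR index of the x2apic register range


def xapic_mmio_address(offset, base=XAPIC_DEFAULT_BASE):
    """Physical address of a register when the APIC is in MMIO mode."""
    return base + offset


def x2apic_msr(offset):
    """MSR index of the same register in x2apic mode.

    Each 16-byte MMIO slot corresponds to one MSR, so the offset is
    divided by 16 and added to the MSR base.
    """
    if offset % 0x10 != 0:
        raise ValueError("APIC registers sit on 16-byte boundaries")
    return X2APIC_MSR_BASE + (offset >> 4)


def addressable_ids(id_bits):
    """Number of distinct APIC IDs that fit in an ID field of this width."""
    return 1 << id_bits
```

For example, the task priority register at MMIO offset 0x80 maps to MSR 0x808, and `addressable_ids(8)` versus `addressable_ids(32)` shows the jump from 256 IDs to 2^32.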
There is another major difference between the PSM modules. They are currently used to drive interrupt policy on i86pc systems. The pcplusmp module does not support per-CPU interrupts, which is a rather limiting factor. The apix module does support per-CPU interrupts, and while its policies around interrupt quantities could be better, it is a much better choice for most users than the fixed 2 interrupts per driver. The pcplusmp system also has the challenge that on some platforms you easily run out of interrupts with sufficient devices (a problem that plagued some fishworks appliances back in the day and required some specific rejiggering).
The x2apic was introduced sometime in the Nehalem-Sandy Bridge time frame, and at some point in the intervening generations x86 BIOSes switched to enabling it by default and exposing its existence in cpuid. However, AMD only introduced the x2apic with their Rome series of processors (and perhaps in the Ryzen equivalent). But even though they did, many BIOSes still don't actually enable it by default (at least using this AsRock Rack Rome board as an example). This means that all AMD Zen systems are stuck in a world with a small number of interrupts, unlike their Intel counterparts.
As a result of this, I believe we should default to the apix module, as this gives us a number of benefits:
- It enables more interrupts on systems, which should generally help. Importantly, it also doesn't map all interrupts to all CPUs.
- If x2apic mode is available and it opts to use it, it can result in better virtualization performance when APIC virtualization isn't present, as MSR exits don't require instruction decoding and in theory it's easier to deal with IPIs.
The big challenge here is testing. We do have good experience of running with the apix module on smaller CPU systems, because this commonly shows up when virtualized on different platforms under QEMU/KVM. We will coordinate testing with the broader community. For each of the systems that we're going to look at there are a few things for us to consider and evaluate:
1. Whether or not they're using apix or pcplusmp. The easiest way is to run mdb -ke 'setspl/p'. It will be apix_setspl if they are already.
2. If they're using the apix module, then a secondary question for future work is whether they're using the x2apic mode or the local apic mode, which can be determined by running mdb -ke 'apic_reg_ops::print'.
While the linked CR has a patch that ensures it's always enabled, we can also easily test this today on systems without recompiling by adding set apix:apix_hw_chk_enable=0 to /etc/system. As we get test data and results in, we'll update a running list in this ticket.
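For anyone collecting results from several machines, the first check above can be scripted. This is a rough sketch: the helper names are hypothetical, and the only thing it relies on is the statement above that the output names apix_setspl when apix is in use. It makes no assumption about what the other modules print.

```python
import subprocess


def read_setspl():
    """Run the mdb command from step 1 and return its raw output.

    Needs an illumos system and sufficient privileges for `mdb -k`.
    """
    result = subprocess.run(
        ["mdb", "-ke", "setspl/p"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def uses_apix(setspl_output):
    """Return True if the setspl pointer resolves to apix_setspl."""
    return "apix_setspl" in setspl_output
```

On a test machine, `uses_apix(read_setspl())` gives a yes/no answer that can be recorded alongside the CPU and motherboard.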
Updated by Robert Mustacchi over 1 year ago
To test this I've worked with the broader community and done a few different classes of testing. The first set is on my own AMD Rome based system, where I have been running with the actual proposed change for some time. I've been using the system for illumos builds and general development. I also compared approximate build time with and without the change and didn't see any appreciable difference.
The next phase of this was to disable the hardware check and ask folks in the community to test, see what happened to their systems, and make sure that everything came up as we expected. The following table indicates the CPU, the motherboard, and whether anything changed. For a number of Intel based systems, we didn't expect any change here because they have had the x2apic for some time.
|CPU||Motherboard||Changed|
|Ryzen 3700X||AsRock Rack X470D4U||yes|
|Ryzen 3 200G||Asus Prime B350M-A||yes|
|i3-9100||Asus Prime B360M-A||no|
|Xeon W3550||HP Z400||yes|
|E5-2650 v3||Dell R730||no|
|E5410||Dell PowerEdge 1950||yes|
|Turion II Neo||N54L||yes|
|E5-2676 v3||Xen||There was no change from apic here, because this platform does not support the apix.|
Based on the above and a couple of additional reports about additional systems, the results are roughly what we'd expect. Starting with Sandy Bridge based Intel systems, we're pretty much always on apix already and we see no change. Therefore changing this default doesn't cause much risk there. With older Intel systems we see that we've been able to have this take effect and haven't observed anecdotal problems. The major beneficiaries are various AMD systems. Here we've done a range of testing on older generation parts and various Zen parts. In addition, the Zen 2 parts have received additional testing and regular use for over a month from me.
The apix force disable is still valid (setting apix_enable to zero), so any users who had already set that in /etc/system will not be forced on to it. All in all, this should result in a relatively acceptable and hopefully low risk for most users.
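Before upgrading, it can be handy to check whether a machine already carries one of the /etc/system settings mentioned in this ticket. The following is a small sketch under stated assumptions: the function name is hypothetical, the parser handles only simple `set [module:]name=value` lines, and it treats lines starting with `*` as comments.

```python
import re

# Matches "set name=value" and "set module:name=value".
_SET_RE = re.compile(r"^\s*set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")

# Tunables named in this ticket.
_APIX_TUNABLES = ("apix_enable", "apix_hw_chk_enable")


def apix_tunables(system_text):
    """Return {tunable: value} for apix-related settings in /etc/system text."""
    found = {}
    for line in system_text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # comment line
        match = _SET_RE.match(line)
        if match and match.group(2) in _APIX_TUNABLES:
            # Base 0 accepts both decimal and 0x-prefixed values.
            found[match.group(2)] = int(match.group(3), 0)
    return found
```

Calling `apix_tunables(open("/etc/system").read())` returns an empty dictionary when neither setting is present, which means the new default applies.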
Updated by Electric Monk over 1 year ago
- Status changed from New to Closed
- % Done changed from 50 to 100
commit eee178623516e461188f137dc6e5e38875967e09
Author: Robert Mustacchi <email@example.com>
Date: 2020-09-14T14:46:55.000Z
12967 default to apix over pcplusmp
Reviewed by: Patrick Mooney <firstname.lastname@example.org>
Reviewed by: Paul Winder <email@example.com>
Approved by: Joshua M. Clulow <firstname.lastname@example.org>
Updated by Robert Bailey over 1 year ago
This commit is causing my system to lock up. Sometimes it locks up in a couple of minutes and sometimes it takes a couple of hours. If I boot to a snapshot before this commit the system is completely stable.
Is there a configuration that will disable this?
Here is the output before it was changed.
root@openindiana:/var/adm# mdb -ke 'setspl/p'
Running Openindiana updated within the last few days.
System Configuration: Dell Inc. Precision WorkStation T5500
BIOS Configuration: Dell Inc. A18 10/15/2018
==== Processor Sockets ====================================
Version Location Tag
Intel Xeon E5530 CPU1
Intel Xeon E5530 CPU2
Updated by Robert Mustacchi over 1 year ago
Sorry to hear about the trouble it's causing on that system. You should be able to disable this by setting apix_enable to zero in /etc/system. If you're able to get a dump from the system when it's hung or there's more information you can provide, we can try to understand what's going on.
Updated by Robert Bailey over 1 year ago
So after most of the day with no system lockups I re-enabled apix so that I could get a crash dump. The system locked up and I have a crash dump. Here is some output from mdb.
root@openindiana:/var/crash/openindiana# mdb 0
mdb: warning: dump is from SunOS 5.11 illumos-e41ba543c9; dcmds and macros may not match kernel implementation
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs ip hook neti sockfs arp usba uhci smbios fctl stmf stmf_sbd mm lofs idm random sd nfs sppp crypto smbsrv ipc ptm cpc fcip fcp ufs logindmux nsmb kvm ]
debugging crash dump vmcore.0 (64-bit) from openindiana
operating system: 5.11 illumos-e41ba543c9 (i86pc)
image uuid: 1ef23b18-8c2f-e1d2-9720-be2da6d45eb3
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0047589530 addr=24 occurred in module "nvidia" due to a NULL pointer dereference
dump content: kernel pages only
3ff4f000 [no mapping for address]
So it may be a problem with apix and nvidia?