Feature #12967

closed

default to apix over pcplusmp

Added by Robert Mustacchi 10 months ago. Updated 7 months ago.

Status: Closed
Priority: Normal
Category: driver - device drivers
% Done: 100%
Difficulty: Medium

Description

On the x86 side of the house, illumos has traditionally had three different PSMs (platform-specific modules). These are:

  • apix
  • pcplusmp
  • uppc

Today, most systems use either the apix or pcplusmp module. The uppc module is a holdover from uniprocessor systems and doesn't assume that a local APIC exists! There are two major differences between the apix and pcplusmp PSM modules. The first relates to how they communicate with the APIC itself. The local APIC was originally accessed through a specific range of memory-mapped I/O (MMIO). Eventually Intel introduced the x2apic, which notably changed from MMIO to MSRs and increased the number of ID bits from 8 to 32. The wider ID is required if you have more than 256 processors and you'd like to be able to address them all.
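To make the MMIO versus MSR distinction concrete, here is a minimal sketch of how one local APIC register is located in each mode. It is not taken from the apix or pcplusmp sources. It assumes the architectural defaults (MMIO base 0xFEE00000, x2apic MSR range starting at 0x800, MSR number derived by shifting the MMIO offset right by four), and all function names are made up for illustration. A few registers do not follow the simple mapping (for example, the interrupt command register is a single 64-bit MSR in x2apic mode), which this sketch ignores.

```python
# Illustrative only: register addressing in xAPIC (MMIO) vs x2apic (MSR) mode.
# Constants are architectural defaults; the helper names are hypothetical.

XAPIC_DEFAULT_BASE = 0xFEE00000  # default physical base of the local APIC MMIO page
X2APIC_MSR_BASE = 0x800          # first MSR of the x2apic register range


def xapic_mmio_address(offset, base=XAPIC_DEFAULT_BASE):
    """Physical address of a local APIC register in MMIO mode."""
    if offset % 0x10 != 0:
        raise ValueError("local APIC registers sit on 16-byte boundaries")
    return base + offset


def x2apic_msr(offset):
    """MSR number of the same register in x2apic mode."""
    if offset % 0x10 != 0:
        raise ValueError("local APIC registers sit on 16-byte boundaries")
    return X2APIC_MSR_BASE + (offset >> 4)


def distinct_ids(id_bits):
    """Number of distinct APIC ID values that fit in id_bits bits."""
    return 1 << id_bits


TPR_OFFSET = 0x80  # task priority register

if __name__ == "__main__":
    print(hex(xapic_mmio_address(TPR_OFFSET)))  # MMIO address of the TPR
    print(hex(x2apic_msr(TPR_OFFSET)))          # MSR number of the TPR
    print(distinct_ids(8), distinct_ids(32))    # ID space: 8-bit vs 32-bit
```

The ID-space helper shows why the 8-bit ID becomes the limiting factor on large systems: only 256 distinct values fit, versus 2^32 in x2apic mode.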

The apix module can handle both the MSR and the MMIO modes of the APIC. The pcplusmp module is older and only handles the MMIO mode, on the assumption that the apix module will handle the other mode, even though the two modules primarily use the same code to talk to the APIC.

There is another major difference between the PSM modules: they currently drive interrupt policy on i86pc systems. The pcplusmp module does not support per-CPU interrupts, which is a rather limiting factor. The apix module does support per-CPU interrupts, and while its policies around interrupt quantities could be better, it is a much better choice for most users than the fixed 2 interrupts per driver. The pcplusmp module also has the problem that on some platforms you can easily run out of interrupts with sufficient devices (a problem that plagued some fishworks appliances back in the day and required some specific rejiggering).

The x2apic was introduced sometime in the Nehalem-Sandy Bridge time frame, and at some point in the intervening generations x86 BIOSes switched to enabling it by default and exposing its existence in cpuid. AMD, however, only introduced the x2apic with their Rome series of processors (and perhaps in the Ryzen equivalent). Even so, many BIOSes still don't enable it by default (at least using this AsRock Rack Rome board as an example). This means that all AMD Zen systems are stuck in a world with a small number of interrupts, unlike their Intel counterparts.

As a result of this I believe we should default to the apix module, as this gives us a number of benefits:
  • It enables more interrupts on systems, which should generally help. Importantly, it also doesn't map all interrupts to all CPUs.
  • If x2apic mode is available and the module opts to use it, it can result in better virtualization performance when APIC virtualization isn't present, as MSR exits don't require instruction decoding and in theory it's easier to deal with IPIs.

The big challenge here is testing. We do have good experience running with the apix module on systems with smaller CPU counts, because it commonly shows up when virtualized on different platforms under QEMU/KVM. We will coordinate testing with the broader community. For each of the systems that we're going to look at, there are a few things for us to consider and evaluate:

1. Whether they're using apix or pcplusmp. The easiest way to check is to run mdb -ke 'setspl/p'. If the system is already using apix, the result will be apix_setspl.
2. If they're using the apix module, a secondary question for future work is whether they're using the x2apic mode or the local APIC (MMIO) mode, which can be determined by running mdb -ke 'apic_reg_ops::print'.
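For anyone collecting results from several machines, the first check can be scripted. The sketch below only classifies text that has already been captured from mdb -ke 'setspl/p'; it does not run mdb itself. The function name is hypothetical. The mapping of apic_setspl to pcplusmp is an assumption based on the output seen on a pcplusmp system, and the handling of a module`symbol prefix is a guess about how mdb might print the name on some systems.

```python
# Illustrative only: classify captured output of `mdb -ke 'setspl/p'`.
# The symbol-to-module mapping is an assumption, not an exhaustive list.

KNOWN_SETSPL = {
    "apix_setspl": "apix",
    "apic_setspl": "pcplusmp",  # assumed: seen on a system running pcplusmp
}


def psm_from_setspl(mdb_output):
    """Return 'apix', 'pcplusmp', or 'unknown' for the given mdb output."""
    for line in mdb_output.splitlines():
        name, _, value = line.partition(":")
        if name.strip() != "setspl" or not value.strip():
            continue  # skip the bare "setspl:" header line and unrelated lines
        symbol = value.split()[0]
        symbol = symbol.split("`")[-1]  # drop a module` prefix if present
        return KNOWN_SETSPL.get(symbol, "unknown")
    return "unknown"


if __name__ == "__main__":
    sample = "setspl:\nsetspl:         apic_setspl\n"
    print(psm_from_setspl(sample))
```

Anything the sketch does not recognize is reported as unknown rather than guessed at, so an unexpected symbol gets looked at by hand.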

While the linked CR has a patch that ensures it's always enabled, we can also easily test this today on existing systems without recompiling by adding set apix:apix_hw_chk_enable=0 to /etc/system. As we get test data and results in, we'll update a running list in this ticket.
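When gathering reports, it can help to confirm which tunables a tester actually has set. The following is a rough sketch that scans /etc/system-style text for set lines. The function names are hypothetical, and the parsing rules are assumptions (lines starting with * or # are treated as comments, and only the simple set name=value form is recognized).

```python
# Illustrative only: find `set [module:]name=value` lines in /etc/system text.
# The parsing rules are simplified assumptions, not the kernel's own parser.

def system_tunables(text):
    """Return a dict of tunable name -> value string from /etc/system text."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "*#":
            continue  # blank line or comment
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] != "set":
            continue  # not a `set` directive
        name, sep, value = parts[1].partition("=")
        if sep:
            tunables[name.strip()] = value.strip()
    return tunables


def hw_check_disabled(text):
    """True if apix:apix_hw_chk_enable is explicitly set to zero."""
    value = system_tunables(text).get("apix:apix_hw_chk_enable")
    if value is None:
        return False
    try:
        return int(value, 0) == 0
    except ValueError:
        return False


if __name__ == "__main__":
    sample = "* testing 12967\nset apix:apix_hw_chk_enable=0\n"
    print(system_tunables(sample))
    print(hw_check_disabled(sample))
```

The sketch reads text rather than opening /etc/system directly, so it can be run against a file a tester pastes into a report.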

Actions #1

Updated by Electric Monk 10 months ago

  • Gerrit CR set to 801
Actions #2

Updated by Robert Mustacchi 8 months ago

To test this I've worked with the broader community and done a few different classes of testing. The first set is on my own AMD Rome based system, which I have been running with the actual proposed change for some time. I've been using the system for illumos builds and general development. I also compared approximate build time with and without the change and didn't see any appreciable difference.
The next phase was to disable the hardware check and ask folks in the community to test, see what happened to their systems, and make sure that everything came up as we expected. The following table lists the CPU, the motherboard, and whether anything changed. For a number of Intel based systems, we didn't expect any change here because they have had the x2apic for some time.

CPU             Motherboard            Change
E3-1246v3       X10SAE                 no
i9-10900x       VMware                 no
i5-9400F        -                      no
Ryzen 3700X     AsRock Rack X470D4U    yes
Ryzen 3 200G    Asus Prime B350M-A     yes
i3-9100         Asus Prime B360M-A     no
Xeon W3550      HP Z400                yes
Xeon D-1541     X10SDV-TLN4F           no
Atom C2750      A1SAi-2750F            yes
E5-2650 v3      Dell R730              no
E31260L         Dell R210              no
E5410           Dell PowerEdge 1950    yes
E3-1240v2       HP Z220                no
Turion II Neo   N54L                   yes
i5-3350P        ASUSTeK                no
X3450           X8SIL                  yes
i7-4790         Optiplex 9020          no
E5-2676 v3      Xen                    There was no change from apic here, because this platform does not support the apix.

Based on the above and a couple of reports about additional systems, the results are roughly what we'd expect. Starting with Sandy Bridge based Intel systems, we're pretty much always on apix already and we see no change, so changing this default doesn't carry much risk there. With older Intel systems we see that the change takes effect, and we haven't observed any problems in anecdotal use. The major beneficiaries are various AMD systems. Here we've done a range of testing on older generation parts and various Zen parts. In addition, the Zen 2 parts have received additional testing and regular use for over a month from me.

The apix force disable is still valid (setting apix_enable to zero), so any users who had already set that in /etc/system will not be forced on to it. All in all, this should be an acceptable and hopefully low risk for most users.

Actions #3

Updated by Electric Monk 8 months ago

  • Status changed from New to Closed
  • % Done changed from 50 to 100

git commit eee178623516e461188f137dc6e5e38875967e09

commit  eee178623516e461188f137dc6e5e38875967e09
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2020-09-14T14:46:55.000Z

    12967 default to apix over pcplusmp
    Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
    Reviewed by: Paul Winder <paul@winder.uk.net>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>

Actions #4

Updated by Robert Bailey 7 months ago

This commit is causing my system to lock up. Sometimes it locks up in a couple of minutes and sometimes it takes a couple of hours. If I boot to a snapshot before this commit the system is completely stable.
Is there a configuration that will disable this?

Here is the output before it was changed.
root@openindiana:/var/adm# mdb -ke 'setspl/p'
setspl:
setspl: apic_setspl

Running Openindiana updated within the last few days.

root@openindiana:/var/adm# prtdiag
System Configuration: Dell Inc. Precision WorkStation T5500
BIOS Configuration: Dell Inc. A18 10/15/2018

==== Processor Sockets ====================================

Version Location Tag
-------------------------------- --------------------------
Intel Xeon E5530 CPU1
Intel Xeon E5530 CPU2

Thank You

Actions #5

Updated by Robert Mustacchi 7 months ago

Sorry to hear about the trouble it's causing on that system. You should be able to disable this by setting apix_enable to zero in /etc/system. If you're able to get a dump from the system when it's hung or there's more information you can provide, we can try to understand what's going on.

Actions #6

Updated by Robert Bailey 7 months ago

After making the configuration change in /etc/system ("set apix_enable=0") I am not getting any more system lockups. Before the change, a lockup usually happened within a few minutes.
How would I be able to get a crash dump for you? I would like to help.

Thank You

Actions #7

Updated by Robert Bailey 7 months ago

So after most of the day with no system lockups I re-enabled apix so that I could get a crash dump. The system locked up and I have a crash dump. Here is some output from mdb.
root@openindiana:/var/crash/openindiana# mdb 0
mdb: warning: dump is from SunOS 5.11 illumos-e41ba543c9; dcmds and macros may not match kernel implementation
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs ip hook neti sockfs arp usba uhci smbios fctl stmf stmf_sbd mm lofs idm random sd nfs sppp crypto smbsrv ipc ptm cpc fcip fcp ufs logindmux nsmb kvm ]

::status

debugging crash dump vmcore.0 (64-bit) from openindiana
operating system: 5.11 illumos-e41ba543c9 (i86pc)
image uuid: 1ef23b18-8c2f-e1d2-9720-be2da6d45eb3
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0047589530 addr=24 occurred in module "nvidia" due to a NULL pointer dereference
dump content: kernel pages only

$C

ffffff0eaadedd08 _nv002170rm+0x66()
3ff4f000 [no mapping for address]

So it may be a problem with apix and nvidia?
