Bug #12336
Closed
Boot hang caused by x2apic probe using incorrect local apic id
100%
Description
During apix initialisation, the local apic id of the boot CPU is assigned to CPU 0.
Initialisation reads the local apic id register and extracts the local apic id from it, then parses the MADT tables, assigning ids to CPUs starting from CPU 1. When it finds the id which matches the local id of the boot CPU this is assigned as CPU 0.
The reading of the initial local apic id assumes the id is in the high-order byte of the local id register. This is only true for apic (where the id is only 8 bits long); for x2apic the id is the full 32 bits of the register.
So, in x2apic, if the local apic id of the boot cpu is not zero, the system will hang.
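The difference between the two modes can be sketched as follows. This is only an illustration of the register layout described above, not the code from the fix; the function name `local_apic_id` and the macro `XAPIC_ID_SHIFT` are hypothetical and are not the identifiers used in apix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: in apic mode the 8-bit id sits in bits 24-31. */
#define	XAPIC_ID_SHIFT	24

/*
 * Extract the local apic id from the raw value of the local id register.
 * Hypothetical helper for illustration only.
 */
uint32_t
local_apic_id(uint32_t lid_reg, bool x2apic)
{
	if (x2apic)
		return (lid_reg);		/* x2apic: all 32 bits are the id */
	return (lid_reg >> XAPIC_ID_SHIFT);	/* apic: high-order byte only */
}
```

Applying the apic extraction to an x2apic register value discards the low 24 bits, so any boot CPU id below 2^24 comes out as 0.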
Updated by Paul Winder over 2 years ago
The issue was found with these "Gold" CPUs
==== Processor Sockets ====================================
Version                                   Location Tag
----------------------------------------- -----------------
Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz  Proc 1
Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz  Proc 2
Updated by Paul Winder over 2 years ago
Review is: https://code.illumos.org/c/illumos-gate/+/387
Tested by booting with the x2APIC enabled and disabled in the BIOS. Before the fix the boot would always hang.
After boot, used mdb to verify the apic_mode variable is correctly set.
Extra debug in the failure case (x2APIC):
APIC_LID_REG 0x10
apix: x2apic 0 4 0x4
apix: x2apic 1 8 0x8
apix: x2apic 2 16 0x10
apix: x2apic 3 22 0x16
apix: x2apic 4 34 0x22
.
.
The APIC_LID_REG line shows that the local apic id of the boot CPU is 0x10 (as a 32 bit value), which corresponds to entry 2 (counting from 0) in the MADT table. This is the value which needs to be associated with CPU 0. Before the change, the code attempted to match bits 24-31 (value 0), which never succeeded.
When x2APIC was disabled, the APIC_LID_REG was 0x10000000.
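The matching failure can be reproduced with the values from the debug output above. The sketch below is an assumption-laden illustration, not the apix code: `find_boot_cpu` and `example_madt_ids` are hypothetical names, and the array holds only the first five MADT ids shown in the debug output.

```c
#include <stddef.h>
#include <stdint.h>

/* First five ids from the debug output; the real table has more entries. */
const uint32_t example_madt_ids[] = { 0x4, 0x8, 0x10, 0x16, 0x22 };

/*
 * Hypothetical helper: return the index of the MADT entry whose id equals
 * the boot CPU's id, or -1 when no entry matches.
 */
int
find_boot_cpu(const uint32_t *madt_ids, size_t n, uint32_t boot_id)
{
	for (size_t i = 0; i < n; i++) {
		if (madt_ids[i] == boot_id)
			return ((int)i);
	}
	return (-1);
}
```

With x2APIC enabled, the register value 0x10 used directly matches entry 2, whereas 0x10 shifted right by 24 gives 0, which matches none of the listed entries. With x2APIC disabled, 0x10000000 shifted right by 24 gives 0x10 and again matches entry 2.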
Updated by Electric Monk over 2 years ago
- Status changed from In Progress to Closed
- % Done changed from 90 to 100
git commit 2d990ab13b25283780fb663abff2a2bc92888ed0
commit 2d990ab13b25283780fb663abff2a2bc92888ed0
Author: Paul Winder <paul@winders.demon.co.uk>
Date: 2020-02-28T16:35:07.000Z

    12336 Boot hang caused by x2apic probe using incorrect local apic id
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>