Bug #12336

Boot hang caused by x2apic probe using incorrect local apic id

Added by Paul Winder about 1 month ago. Updated 30 days ago.

Status: Closed
Priority: Normal
Category: kernel
% Done: 100%
Difficulty: Medium

Description

During apix initialisation, the local APIC ID of the boot CPU is assigned to CPU 0.

Initialisation reads the local APIC ID register and extracts the boot CPU's local APIC ID from it. It then parses the MADT (the ACPI Multiple APIC Description Table), assigning IDs to CPUs starting from CPU 1. When it finds the entry whose ID matches the boot CPU's local APIC ID, that ID is assigned to CPU 0 instead.
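
The numbering scheme described above can be sketched in C as follows. This is a minimal illustration, not the apix code: the function name `assign_cpu_ids`, its signature and the -1 failure return are hypothetical, and the MADT is reduced to a plain array of local APIC IDs in table order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch (not the apix implementation): map the local APIC
 * IDs found in the MADT, in table order, to logical CPU numbers.
 * The boot CPU's ID is placed in slot 0; every other entry is numbered
 * from CPU 1 upwards. cpu_to_apic must have room for n entries.
 * Returns 0 on success, or -1 if boot_apic_id never appears in the MADT,
 * which is the mismatch that, per this bug, led to the boot hang.
 */
int
assign_cpu_ids(uint32_t boot_apic_id, const uint32_t *madt_ids, size_t n,
    uint32_t *cpu_to_apic)
{
	size_t next_cpu = 1;	/* CPU 0 is reserved for the boot CPU */
	bool found_boot = false;

	for (size_t i = 0; i < n; i++) {
		if (!found_boot && madt_ids[i] == boot_apic_id) {
			cpu_to_apic[0] = madt_ids[i];
			found_boot = true;
			continue;
		}
		/* All n - 1 non-boot slots are used: the boot CPU is missing. */
		if (next_cpu >= n)
			return (-1);
		cpu_to_apic[next_cpu++] = madt_ids[i];
	}
	return (found_boot ? 0 : -1);
}
```

Reserving slot 0 up front keeps the boot CPU's number fixed regardless of where its entry appears in the table.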

The code that reads the initial local APIC ID assumes the ID is in the high-order byte of the local APIC ID register. This is only true for xAPIC, where the ID is only 8 bits long; for x2APIC, the ID is the full 32 bits of the register.
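
The difference between the two register layouts can be shown with a pair of hypothetical C helpers. They take the raw register value (what the debug output in the history calls APIC_LID_REG) and return the ID; the function names and the `x2apic_mode` flag are illustrative assumptions, not the actual illumos functions or variables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Faulty (hypothetical): always treats the register as xAPIC, bits 24-31. */
uint32_t
local_apic_id_buggy(uint32_t lid_reg)
{
	return ((lid_reg >> 24) & 0xff);
}

/* Corrected (hypothetical): in x2APIC mode the whole 32-bit value is the ID. */
uint32_t
local_apic_id_fixed(uint32_t lid_reg, bool x2apic_mode)
{
	if (x2apic_mode)
		return (lid_reg);
	/* xAPIC: the 8-bit ID lives in the high-order byte, bits 24-31. */
	return ((lid_reg >> 24) & 0xff);
}
```

For example, a register value of 0x10 read in x2APIC mode gives 0 from `local_apic_id_buggy` but 0x10 from `local_apic_id_fixed`.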

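So, in x2APIC mode, if the local APIC ID of the boot CPU is not zero, the system will hang: bits 24-31 of such an ID are normally zero, so the extracted ID is 0 and the boot CPU's MADT entry is never found.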

History

#1

Updated by Paul Winder about 1 month ago

  • Description updated

#2

Updated by Paul Winder about 1 month ago

The issue was found on a system with these Xeon "Gold" CPUs:

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz Proc 1
Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz Proc 2

#3

Updated by Paul Winder about 1 month ago

The review is at: https://code.illumos.org/c/illumos-gate/+/387

Tested by booting with x2APIC both enabled and disabled in the BIOS. Before the fix, the boot would always hang.

After booting, mdb was used to verify that the apic_mode variable was set correctly.

Extra debug output in the failure case (x2APIC):

APIC_LID_REG 0x10
apix: x2apic 0 4 0x4
apix: x2apic 1 8 0x8
apix: x2apic 2 16 0x10
apix: x2apic 3 22 0x16
apix: x2apic 4 34 0x22
.
.

APIC_LID_REG shows that the local APIC ID of the boot CPU is 0x10 (as a 32-bit value), which corresponds to entry 2 (counting from 0) in the MADT. This is the ID that must be associated with CPU 0. Before the change, the code tried to match bits 24-31 of the register (value 0), which never succeeded.

When x2APIC was disabled, APIC_LID_REG was 0x10000000: the same ID, 0x10, but in bits 24-31, where the original code expected it.
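
The numbers in this log can be replayed with a small, self-contained C sketch. The helper name `find_boot_entry` and its `use_full_register` flag are hypothetical, and the ID table holds only the five MADT entries printed above (4, 8, 16, 22 and 34), not the elided remainder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local APIC IDs of the first five MADT entries shown in the debug log. */
static const uint32_t logged_madt_ids[] = { 0x4, 0x8, 0x10, 0x16, 0x22 };

/*
 * Hypothetical helper: return the index of the MADT entry whose ID matches
 * the boot CPU's ID, or -1 if none does. When use_full_register is false it
 * mimics the pre-fix code by keeping only bits 24-31 of the register.
 */
int
find_boot_entry(uint32_t lid_reg, bool use_full_register)
{
	uint32_t id = use_full_register ? lid_reg : ((lid_reg >> 24) & 0xff);
	size_t n = sizeof (logged_madt_ids) / sizeof (logged_madt_ids[0]);

	for (size_t i = 0; i < n; i++) {
		if (logged_madt_ids[i] == id)
			return ((int)i);
	}
	return (-1);
}
```

With the x2APIC value 0x10, the pre-fix extraction yields ID 0 and finds no entry, while using the full register finds entry 2; the xAPIC value 0x10000000 also lands on entry 2 through bits 24-31.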

#4

Updated by Electric Monk 30 days ago

  • Status changed from In Progress to Closed
  • % Done changed from 90 to 100

git commit 2d990ab13b25283780fb663abff2a2bc92888ed0

commit  2d990ab13b25283780fb663abff2a2bc92888ed0
Author: Paul Winder <paul@winders.demon.co.uk>
Date:   2020-02-28T16:35:07.000Z

    12336 Boot hang caused by x2apic probe using incorrect local apic id
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>
