Bug #6679
Closed
Useless assert in acpidev_cpu_query_MAT()
100%
Description
A Broadwell-DE (Xeon D-1520) Supermicro box panics in acpidev with an assertion failure when the "X2APIC" option is enabled in the BIOS. Turning X2APIC off allows an uneventful and otherwise normal boot.
NOTICE: acpica: attaching SCI 9
NOTICE: acpica: ECDT not found
WARNING: acpica:ec_probe_ns: PNP0C09 not found
Completing Region/Field/Buffer/Package initialization:..........................
...............................................................................
................................................................................
...............................................................................
................................................................................
...............................................................................
................................................................................
...............................................................................
................................................................................
...............................................................................
..............................
Initialized 81/83 Regions 20/20 Fields 46/46 Buffers 624/624 Packages (5509 nodes)
Initializing Device/Processor/Thermal objects by executing _INI methods:.................
Executed 17 _INI methods requiring 2 _STA executions (examined 381 objects)
evgpeblk-0541 [4294967292] EvInitializeGpeBlock : Enabled 3 GPEs in this block

panic[cpu0]/thread=fffffffffbc3f320: assertion failed: mpa->Id != 255, file: ../../i86pc/io/acpi/acpidev/acpidev_cpu.c, line: 255

Warning - stack not written to the dump buffer
fffffffffbc812a0 genunix:process_type+17e160 ()
fffffffffbc812d0 acpidev:acpidev_cpu_query_MAT+128 ()
fffffffffbc81360 acpidev:acpidev_walk_apic+ce ()
mpa->Id info at the time of the panic:
[0]> 0xffffff11e49b0260::print -d ACPI_MADT_LOCAL_APIC
{
    Header = {
        Type = 0
        Length = 8
    }
    ProcessorId = 0t255
    Id = 0t255
    LapicFlags = 0
}
and the _MAT table:
[0]> 0xffffff11e49b3e88::print -t ACPI_DEVICE_INFO
ACPI_DEVICE_INFO {
    UINT32 InfoSize = 0x73
    UINT32 Name = 0x30315043
    ACPI_OBJECT_TYPE Type = 0xc
    UINT8 ParamCount = 0
    UINT8 Valid = 0xd
    UINT8 Flags = 0
    UINT8 [4] HighestDstates = [ 0xff, 0xff, 0xff, 0xff ]
    UINT8 [5] LowestDstates = [ 0xff, 0xff, 0xff, 0xff, 0xff ]
    UINT32 CurrentStatus = 0
    UINT64 Address = 0
    ACPI_DEVICE_ID HardwareId = {
        UINT32 Length = 0x9
        char *String = 0xffffff11e49b3ed8 "ACPI0007"
    }
    ACPI_DEVICE_ID UniqueId = {
        UINT32 Length = 0xa
        char *String = 0xffffff11e49b3ee1 "SCK0-CP10"
    }
    ACPI_DEVICE_ID_LIST CompatibleIdList = {
        UINT32 Count = 0
        UINT32 ListSize = 0
        ACPI_DEVICE_ID [1] Ids = [
            ACPI_DEVICE_ID {
                UINT32 Length = 0x49504341
                char *String = 0x50432d304b435300
            },
        ]
    }
}
Updated by Dan McDonald about 7 years ago
Post #6807, the line number has changed, but I'm still seeing it panic with a DEBUG kernel regardless of X2APIC value.
Updated by Robert Mustacchi about 7 years ago
- Subject changed from ASSERT panic in acpidev with X2APIC enabled to Useless assert in acpidev_cpu_query_MAT
- Assignee set to Robert Mustacchi
- % Done changed from 0 to 100
- Tags deleted (needs-triage)
Updated by Robert Mustacchi about 7 years ago
We saw this on one of the systems we use at Joyent. Here's the analysis from our bug report:
The Shrimp MK III always panics on debug bits (regardless of the ACPI version) here:
[0]> :c
SunOS Release 5.11 Version joyent_20160721T220241Z 64-bit
Copyright (c) 2010-2016, Joyent Inc. All rights reserved.
DEBUG enabled

panic[cpu0]/thread=fffffffffbc3fd40: assertion failed: mpa->Id != 255, file: ../../i86pc/io/acpi/acpidev/acpidev_cpu.c, line: 259

Warning - stack not written to the dump buffer
fffffffffbc81c40 genunix:process_type+1683e0 ()
fffffffffbc81c70 acpidev:acpidev_cpu_query_MAT+128 ()
fffffffffbc81d00 acpidev:acpidev_walk_apic+ce ()
fffffffffbc81d70 acpidev:acpidev_cpu_get_procid+b0 ()
fffffffffbc81e10 acpidev:acpidev_cpu_filter_func+94 ()
fffffffffbc81e90 acpidev:acpidev_filter_device+62 ()
fffffffffbc81ee0 acpidev:acpidev_cpu_filter+5c ()
fffffffffbc81f70 acpidev:acpidev_process_object+1db ()
fffffffffbc81fa0 acpidev:acpidev_cpu_probe+f4 ()
fffffffffbc82010 acpidev:acpidev_probe_child+350 ()
fffffffffbc820a0 acpidev:acpidev_process_object+58e ()
fffffffffbc820d0 acpidev:acpidev_container_probe+a0 ()
fffffffffbc82140 acpidev:acpidev_probe_child+350 ()
fffffffffbc821d0 acpidev:acpidev_process_object+58e ()
fffffffffbc82200 acpidev:acpidev_device_probe+a5 ()
fffffffffbc82270 acpidev:acpidev_probe_child+350 ()
fffffffffbc822a0 acpidev:acpidev_boot_probe_device+6b ()
fffffffffbc822d0 acpidev:acpidev_boot_probe+95 ()
fffffffffbc822f0 unix:impl_bus_initialprobe+65 ()
fffffffffbc82320 unix:impl_setup_ddi+dc ()
fffffffffbc82340 genunix:create_devinfo_tree+cc ()
fffffffffbc82350 genunix:setup_ddi+13 ()
fffffffffbc823e0 unix:startup_modules+f6 ()
fffffffffbc823f0 unix:startup+55 ()
fffffffffbc82430 genunix:main+9b ()
fffffffffbc82440 unix:_locore_start+90 ()

panic: entering debugger (no dump device, continue to reboot)
Loaded modules: [ mac specfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq %rax,%rdi
From digging into this, I was able to find the following table entry:
kmdb: stop at acpidev`acpidev_cpu_query_MAT
kmdb: target stopped at:
acpidev`acpidev_cpu_query_MAT: pushq %rbp
[0]> <rdi::print ACPI_MADT_LOCAL_APIC
{
    Header = {
        Type = 0
        Length = 0x8
    }
    ProcessorId = 0xff
    Id = 0xff
    LapicFlags = 0
}
Note that the LapicFlags member is zero. Per the ACPI specification, this indicates that the CPU is not enabled. In fact, if you read the ACPI specification in more detail, you'll find that v6.0, section 5.2.12.12 (Processor Local x2APIC Structure), notes that logical processors here shouldn't, in theory, have an APIC ID of 255 and that they should be using an X2APIC entry instead.
I believe that this assertion was basically trying to rule out the case where we had a CPU with an APIC ID of 255. However, in this case, the CPU is not marked as enabled. We've seen this on other test systems before, and I suspect that the way the ACPI tables are being constructed leaves them with these kinds of garbage disabled entries used as padding, as we see a number of them show up here.
Based on that, I'd propose matching the other callers and only doing this when the processor is enabled. To help the non-debug case, we should warn about it. I'd suggest we move that down to the acpi_probe() function which is used by the PSM drivers.
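As a rough illustration of the proposal above (not the code from the actual commit), the check could be sketched in user-space C as follows. The type local_apic_entry_t, the macro MADT_ENABLED, and the function classify_local_apic() are hypothetical stand-ins invented for this sketch; they only mirror the fields shown in the ACPI_MADT_LOCAL_APIC dump. The fprintf() call stands in for whatever warning mechanism the kernel code would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: bit 0 of the local APIC flags means "processor enabled". */
#define MADT_ENABLED 0x1u

/* Hypothetical mirror of the fields printed from ACPI_MADT_LOCAL_APIC. */
typedef struct {
    uint8_t  type;
    uint8_t  length;
    uint8_t  processor_id;
    uint8_t  id;
    uint32_t lapic_flags;
} local_apic_entry_t;

typedef enum {
    LAPIC_SKIP,     /* entry is disabled: ignore it, whatever its ID */
    LAPIC_SUSPECT,  /* entry is enabled but carries APIC ID 255 */
    LAPIC_OK        /* entry is enabled with a plausible APIC ID */
} lapic_verdict_t;

/*
 * Only complain about an APIC ID of 255 when the entry is marked
 * enabled. Disabled entries (such as the padding entries seen in the
 * dump) are skipped before the ID is ever examined.
 */
static lapic_verdict_t
classify_local_apic(const local_apic_entry_t *e)
{
    assert(e != NULL);

    if ((e->lapic_flags & MADT_ENABLED) == 0)
        return (LAPIC_SKIP);

    if (e->id == 255) {
        /* Warn instead of asserting, so non-debug builds see it too. */
        (void) fprintf(stderr,
            "warning: enabled local APIC entry has APIC ID 255\n");
        return (LAPIC_SUSPECT);
    }

    return (LAPIC_OK);
}
```

With this shape, the entry from the panic (ProcessorId = 255, Id = 255, LapicFlags = 0) is classified as LAPIC_SKIP and never reaches the ID check, while an enabled entry with ID 255 produces a warning rather than a panic.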
Updated by Robert Mustacchi about 7 years ago
- Subject changed from Useless assert in acpidev_cpu_query_MAT to Useless assert in acpidev_cpu_query_MAT()
Updated by Electric Monk about 7 years ago
- Status changed from New to Closed
git commit e2af1b53c69207cf8b36451b3a8c8f800ee9855e
commit e2af1b53c69207cf8b36451b3a8c8f800ee9855e
Author: Robert Mustacchi <rm@joyent.com>
Date: 2016-09-09T21:10:52.000Z

    6679 Useless assert in acpidev_cpu_query_MAT()
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Joshua M. Clulow <jmc@joyent.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>