Kernel attempts to apply old erratum to AMD Threadripper 1950X
On my AMD desktop I tried to run a DEBUG kernel in my KVM dev VM, and it tripped an ASSERT:
Loading kmdb...
Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
> :c
WARNING: Couldn't read ACPI SRAT table from BIOS. lgrp support will be limited to one group.

panic[cpu0]/thread=fffffffffbc590a0: assertion failed: family == 0xf || family == 0x10 || family == 0x11, file: ../../i86pc/os/mp_startup.c, line: 749

Warning - stack not written to the dump buffer
fffffffffbc99040 genunix:max_hres_adj+15fba8 ()
fffffffffbc99050 unix:opteron_get_nnodes+68 ()
fffffffffbc99110 unix:workaround_errata+40d ()
fffffffffbc99150 unix:mlsetup+62d ()
fffffffffbc99160 unix:_locore_start+8b ()

panic: entering debugger (no dump device, continue to reboot)
kmdb: target stopped at:
kmdb_enter+0xb: movq %rax,%rdi
> ::regs
%rax = 0x0000000000000002                 %r9  = 0x00000000000186a0
%rbx = 0xfffffffffb98fc00                 %r10 = 0x0000000000000000
%rcx = 0x0000000000000020                 %r11 = 0x0000000000000000
%rdx = 0x00000000000003f8                 %r12 = 0xfffffffffbc9d328 panicbuf
%rsi = 0x000000000000000a                 %r13 = 0x0000000000000000
%rdi = 0xfffffffffbfcb9c0 nullwrapper     %r14 = 0x0000000000000000
%r8  = 0xfffffffffbca0fc0 panic_stack+0x19f0 %r15 = 0x0000000000000000

%rip = 0xfffffffffb88a36b kmdb_enter+0xb
%rbp = 0xfffffffffbca13c0
%rsp = 0xfffffffffbca13c0
%rflags = 0x00000002
  id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0
  status=<of,df,if,tf,sf,zf,af,pf,cf>

%cs = 0x0030 %ds = 0x0000 %es = 0x0000
%fs = 0x0000 %gs = 0x0000
%gsbase = 0xfffffffffbc5b000 %kgsbase = 0x200000000
%trapno = 0x14 %err = 0x0
%cr2 = 0x0 %cr3 = 0x1e400000
The tripped ASSERT seems to be the following:
736 	/*
737 	 * This routine uses a PCI config space based mechanism
738 	 * for retrieving the number of nodes in the system.
739 	 * Device 24, function 0, offset 0x60 as used here is not
740 	 * AMD processor architectural, and may not work on processor
741 	 * families other than those listed below.
742 	 *
743 	 * Callers of this routine must ensure that we're running on
744 	 * a processor which supports this mechanism.
745 	 * The assertion below is meant to catch calls on unsupported
746 	 * processors.
747 	 */
748 	family = cpuid_getfamily(CPU);
749 	ASSERT(family == 0xf || family == 0x10 || family == 0x11);

> cpu::print cpu_t cpu_m.mcpu_cpi->cpi_family
kmdb: failed to read 4 bytes at fffffffd0000001d: no mapping for address
> cpu::print -at cpu_t cpu_m.mcpu_cpi
fffffffffbc07a80 struct cpuid_info *cpu_m.mcpu_cpi = 0xfffffffd00000001
On a non-DEBUG kernel I see the following with mdb:
> cpu::print -at cpu_t cpu_m ! grep mcpu_cpi
fffffffffbc07a80 struct cpuid_info *cpu_m.mcpu_cpi = 1
KVM passes in the CPU as
The physical processor has 1 virtual processor (15)
  x86 (AuthenticAMD 800F12 family 23 model 1 step 2 clock 3393 MHz)
        AMD EPYC Processor (with IBPB)
Linux reports the CPU as
processor : 31
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen Threadripper 1950X 16-Core Processor
stepping : 1
microcode : 0x8001137
cpu MHz : 3316.490
cache size : 512 KB
physical id : 0
siblings : 32
core id : 15
cpu cores : 16
apicid : 31
initial apicid : 31
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 6788.03
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro
It also looks like 0x17 (family 23) is not among the families listed in the ASSERT statement. Perhaps this needs to be updated for newer CPUs?
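To make the family mismatch concrete, here is a minimal user-space sketch that decodes the CPUID signature shown above (800F12) and checks the result against the families named in the ASSERT. The function names are made up for illustration and are not the kernel's; the bit layout is the standard CPUID Fn0000_0001 EAX encoding, and the model rule shown is the AMD one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not illumos functions.
 * Effective family: base family is bits 11:8; when it is 0xf, the
 * extended family (bits 27:20) is added to it.
 */
unsigned
sig_family(uint32_t sig)
{
	unsigned base = (sig >> 8) & 0xf;
	unsigned ext = (sig >> 20) & 0xff;

	return (base == 0xf ? base + ext : base);
}

/*
 * Effective model (AMD rule): base model is bits 7:4; when the base
 * family is 0xf, the extended model (bits 19:16) supplies the high nibble.
 */
unsigned
sig_model(uint32_t sig)
{
	unsigned model = (sig >> 4) & 0xf;

	if (((sig >> 8) & 0xf) == 0xf)
		model |= ((sig >> 16) & 0xf) << 4;
	return (model);
}

/* Stepping is bits 3:0. */
unsigned
sig_stepping(uint32_t sig)
{
	return (sig & 0xf);
}

/* The same condition as the ASSERT at mp_startup.c line 749. */
bool
family_on_assert_list(unsigned family)
{
	return (family == 0xf || family == 0x10 || family == 0x11);
}

/*
 * User-space analogue of the kernel ASSERT: aborts when assertions are
 * enabled and compiles to nothing under NDEBUG, much as ASSERT only
 * fires on DEBUG kernels.
 */
void
require_listed_family(unsigned family)
{
	assert(family_on_assert_list(family));
}
```

Feeding 0x800F12 through these gives family 0x17, model 1, stepping 2, which agrees with the "family 23 model 1 step 2" line above, and 0x17 fails the list check.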
So, fundamentally, we shouldn't be calling into opteron_get_nnodes() at all. In fact, it's very suspicious that we've detected any errata at all, since the errata that we're trying to work around mostly apply to nothing newer than family 0x10. I think the reason that family 0x11 was added was because we also try to work around not having a constant TSC there, though it's hard to say. Looking at the programming manual for recent processors, it's pretty clear that at least on families since 15h, the technique used there is not going to work.
So that leads us to ask: why are we calling this at all? With some help from Michael Zeller, we were able to figure out that we were calling into this because of what we call erratum '6336786'. This isn't actually an erratum at all! Instead, it is an attempt to work around the fact that certain deep C states cause the TSC to be non-constant. Unfortunately, nothing seems to virtualize this today. Even worse, the things that we want to somewhat arbitrarily write don't seem to exist on all platforms.
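As a rough illustration of the reasoning above, the sketch below shows one way such a workaround could be gated. This is not the change that was committed for this issue; the function names and the policy itself are assumptions made for the example. The one hardware fact it relies on is that AMD reports an invariant TSC in CPUID Fn8000_0007 EDX bit 8.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_0007 EDX bit 8: TSC runs at a constant rate in all P/C states. */
#define	APM_EDX_TSC_INVARIANT	(1u << 8)

/* Hypothetical helper: true when the CPU advertises an invariant TSC. */
bool
tsc_is_invariant(uint32_t apm_edx)
{
	return ((apm_edx & APM_EDX_TSC_INVARIANT) != 0);
}

/*
 * Hypothetical policy, assumed for illustration only: consider the
 * C-state/TSC workaround only on bare metal, only on the families the
 * existing ASSERT lists, and only when the TSC is not already invariant.
 */
bool
want_tsc_cstate_workaround(unsigned family, bool virtualized,
    uint32_t apm_edx)
{
	if (virtualized)
		return (false);
	if (family != 0xf && family != 0x10 && family != 0x11)
		return (false);
	return (!tsc_is_invariant(apm_edx));
}
```

Under a policy like this, a family 0x17 guest never reaches opteron_get_nnodes(), so the ASSERT would not be hit whether or not the hypervisor emulates the underlying mechanism.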
1. Booted on the impacted Threadripper box, both virtualized (which triggered this) and on bare metal (which should not trigger this).
2. Tested on a physical AMD EPYC box to make sure there were no problems.
Updated by Electric Monk about 1 year ago
- Status changed from New to Closed
commit bf9b145b69c7f9e2ab076ee1ab9461cb6d527e64
Author: Robert Mustacchi <firstname.lastname@example.org>
Date: 2019-01-29T21:04:52.000Z

    10275 Kernel attempts to apply old erratum to AMD Threadripper 1950X
    Reviewed by: Hans Rosenfeld <email@example.com>
    Reviewed by: Jerry Jelinek <firstname.lastname@example.org>
    Reviewed by: Mike Zeller <email@example.com>
    Reviewed by: Andy Stormont <firstname.lastname@example.org>
    Reviewed by: Toomas Soome <email@example.com>
    Reviewed by: Gergő Doma <firstname.lastname@example.org>
    Approved by: Richard Lowe <email@example.com>