Feature #11976

Want CPU temperature sensors for Zen 2, Raven Ridge

Added by Robert Mustacchi 5 months ago. Updated 5 days ago.

Status: Closed
Priority: Normal
Category: driver - device drivers
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

We should enable the amdf17nbdf driver to know about Zen 2 (both Matisse and Rome) and Raven Ridge processors. This consists of updates to the amdf17nbdf driver.

History

#2

Updated by Robert Mustacchi 3 months ago

Gary Mills has verified this on a Zen+ APU. On that platform we see:

<root@ryzen># /usr/lib/fm/fmd/fmtopo -V '*sensor=temp*'
TIME                 UUID
Dec 29 17:06:26 c424cc3e-b9a0-ca24-b907-a53333468535

    hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=0?sensor=temp
      group: protocol                       version: 1   stability: Private/Private
        resource          fmri      hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=0?sensor=temp
      group: authority                      version: 1   stability: Private/Private
        product-id        string    System-Product-Name
        chassis-id        string    System-Serial-Number
        server-id         string    ryzen
      group: facility                       version: 1   stability: Private/Private
        sensor-class      string    threshold
        type              uint32    0x1 (TEMP)
        units             uint32    0x1 (DEGREES_C)
        reading           double    29.375000

    hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=1?sensor=temp
      group: protocol                       version: 1   stability: Private/Private
        resource          fmri      hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=1?sensor=temp
      group: authority                      version: 1   stability: Private/Private
        product-id        string    System-Product-Name
        chassis-id        string    System-Serial-Number
        server-id         string    ryzen
      group: facility                       version: 1   stability: Private/Private
        sensor-class      string    threshold
        type              uint32    0x1 (TEMP)
        units             uint32    0x1 (DEGREES_C)
        reading           double    29.375000

    hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=8?sensor=temp
      group: protocol                       version: 1   stability: Private/Private
        resource          fmri      hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=8?sensor=temp
      group: authority                      version: 1   stability: Private/Private
        product-id        string    System-Product-Name
        chassis-id        string    System-Serial-Number
        server-id         string    ryzen
      group: facility                       version: 1   stability: Private/Private
        sensor-class      string    threshold
        type              uint32    0x1 (TEMP)
        units             uint32    0x1 (DEGREES_C)
        reading           double    29.375000

    hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=9?sensor=temp
      group: protocol                       version: 1   stability: Private/Private
        resource          fmri      hc://:product-id=System-Product-Name:server-id=ryzen:chassis-id=System-Serial-Number/motherboard=0/chip=0/core=9?sensor=temp
      group: authority                      version: 1   stability: Private/Private
        product-id        string    System-Product-Name
        chassis-id        string    System-Serial-Number
        server-id         string    ryzen
      group: facility                       version: 1   stability: Private/Private
        sensor-class      string    threshold
        type              uint32    0x1 (TEMP)
        units             uint32    0x1 (DEGREES_C)
        reading           double    29.375000
#3

Updated by Harmen R 3 months ago

I've tested this on a Ryzen 3700X / X470 system.

It's reporting temperatures for the CPU, but I'm not sure the temperature is correct: it's consistently 10 degrees Celsius higher than what the IPMI console is reporting for the CPU. I remember reading something about an offset like that, but unfortunately I can't find it anymore.

I've also read the temperature on the latest Linux kernel; on Linux the readings agree with the IPMI values.

[root@ryzen-server ~]# /usr/lib/fm/fmd/fmtopo -V '*sensor=temp*'
TIME                 UUID
Jan 02 19:06:56 3f2de7d0-f9f7-4672-8297-a90ebff26d7a

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=0?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=0?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=1?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=1?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=2?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=2?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=3?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=3?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=4?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=4?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=5?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=5?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=6?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=6?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000

hc://:server-id=ryzen-server/motherboard=0/chip=0/core=7?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=ryzen-server/motherboard=0/chip=0/core=7?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    ryzen-server
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    49.750000
#4

Updated by Robert Mustacchi 3 months ago

Thanks for testing this, Harmen. The fact that it's 10 degrees off is definitely a bit weird. While older Ryzen models did have an offset that the platform is required to account for, there definitely isn't one for the Matisse family, which includes your 3700X. I'd like to see what the raw value we read was. While on illumos with these changes, could you try running the following in mdb -k?

*amdf17nbdf::print amdf17nbdf_t amd_nbdf_nbs | ::walk list | ::print amdf17nb_t

I'm pretty sure that's the right syntax. If you could also run ipmitool sensor and include the output here (running it on the local host right around the same time as you run that mdb command), I'd appreciate it.

#5

Updated by Robert Mustacchi 3 months ago

One other question, can you confirm for me how you're viewing the temperature on Linux?

#6

Updated by Harmen R 3 months ago

No problem. On Linux I used the k10temp driver with lm-sensors. k10temp has supported this CPU since kernel 5.4.

I ran the commands you asked for back to back, with the box idle (all zones stopped).

> *amdf17nbdf::print amdf17nbdf_t amd_nbdf_nbs | ::walk list | ::print amdf17nb_t
{
    amd_nb_link = {
        list_next = 0xfffffe5942f1d7b8
        list_prev = 0xfffffe5942f1d7b8
    }
    amd_nb_dip = 0xfffffe592a744558
    amd_nb_cfgspace = 0xfffffe5936580140
    amd_nb_bus = 0
    amd_nb_dev = 0
    amd_nb_func = 0
    amd_nb_df = 0xfffffe5942931220
    amd_nb_procnodeid = 0
    amd_nb_temp_minor = 0x1
    amd_nb_temp_last_read = 0x16d57107328
    amd_nb_temp_off = 0
    amd_nb_temp_reg = 0x2f500fef
    amd_nb_temp = 0x17a
}

3VSB             | 3.360      | Volts      | ok    | 2.880     | 3.040     | na        | na        | 3.700     | 3.880     
5VSB             | 5.040      | Volts      | ok    | 4.260     | 4.500     | na        | na        | 5.490     | 5.730     
VCPU             | 0.930      | Volts      | ok    | na        | na        | na        | na        | 1.650     | 1.730     
VSOC             | 1.020      | Volts      | ok    | 0.340     | 0.360     | na        | na        | 1.540     | 1.610     
VCCM             | 1.190      | Volts      | ok    | 1.020     | 1.080     | na        | na        | 1.320     | 1.380     
APU_VDDP         | 0.950      | Volts      | ok    | 0.770     | 0.810     | na        | na        | 1.160     | 1.210     
3V               | 3.360      | Volts      | ok    | 2.800     | 2.980     | na        | na        | 3.620     | 3.780     
5V               | 5.040      | Volts      | ok    | 4.260     | 4.500     | na        | na        | 5.490     | 5.730     
12V              | 12.000     | Volts      | ok    | 10.200    | 10.800    | na        | na        | 13.200    | 13.800    
MB Temp          | 24.000     | degrees C  | ok    | na        | na        | na        | 55.000    | na        | na        
Card side Temp   | 26.000     | degrees C  | ok    | na        | na        | na        | 68.000    | na        | na        
CPU Temp         | 37.000     | degrees C  | ok    | na        | na        | na        | 93.000    | 94.000    | na        
DDR4_A2_Temp     | na         | degrees C  | na    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_A1_Temp     | 23.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_B2_Temp     | na         | degrees C  | na    | na        | na        | na        | 84.000    | 85.000    | na        
DDR4_B1_Temp     | 23.000     | degrees C  | ok    | na        | na        | na        | 84.000    | 85.000    | na        
FAN1             | 1200.000   | RPM        | ok    | na        | na        | 100.000   | na        | na        | na        
FAN2             | 5600.000   | RPM        | ok    | na        | na        | 100.000   | na        | na        | na        
FAN3             | 5200.000   | RPM        | ok    | na        | na        | 100.000   | na        | na        | na        
FAN4             | na         | RPM        | na    | na        | na        | 100.000   | na        | na        | na        
FAN5             | 5400.000   | RPM        | ok    | na        | na        | 100.000   | na        | na        | na        
FAN6             | 5000.000   | RPM        | ok    | na        | na        | 100.000   | na        | na        | na        
ChassisIntr      | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
CPU_PROCHOT      | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
CPU_THERMTRIP    | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
PSU1 Status      | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
PSU1 AC lost     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PSU2 Status      | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
PSU2 AC lost     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
1.05V_PROM_S5    | 1.060      | Volts      | ok    | 0.890     | 0.950     | na        | na        | 1.160     | 1.210     
2.5V_PROM        | 2.580      | Volts      | ok    | 2.120     | 2.260     | na        | na        | 2.740     | 2.880     
1.05V_PROM_RUN   | 1.010      | Volts      | ok    | 0.890     | 0.950     | na        | na        | 1.160     | 1.210     
BAT              | 3.100      | Volts      | ok    | 2.000     | 2.700     | na        | na        | 3.400     | 3.560     
PSU2 PIN         | na         | Watts      | na    | na        | na        | na        | na        | na        | na        
PSU2 POUT        | na         | Watts      | na    | na        | na        | na        | na        | na        | na        
PSU1 PIN         | na         | Watts      | na    | na        | na        | na        | na        | na        | na        
PSU1 VIN         | na         | Volts      | na    | na        | na        | na        | na        | na        | na        
PSU2 VIN         | na         | Volts      | na    | na        | na        | na        | na        | na        | na        
PSU2 IOUT        | na         | Amps       | na    | na        | na        | na        | na        | na        | na        
PSU1 IOUT        | na         | Amps       | na    | na        | na        | na        | na        | na        | na        
PSU1 POUT        | na         | Watts      | na    | na        | na        | na        | na        | na        | na 
#7

Updated by Robert Mustacchi 3 months ago

Thank you for including that, Harmen. So based on the math, we are reporting the Tctl correctly given what we've read, and Linux should be using the same calculation. So I'm not sure why we got a reading 10 degrees higher than Linux did. I could see power state or some other condition changing the measurement such that it'd differ between the two, but it's weird that IPMI reads something different from us, especially when it matches Linux. Given that this board can usually run both Zen 1 and Zen 2 chips, I could see firmware accidentally including the Ryzen 7 2700X 10 degree offset, but then I wouldn't expect Linux to match that.
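
For reference, the math can be reproduced with a minimal sketch. It assumes the Family 17h register layout used by the Linux k10temp driver (Tctl in bits 31:21 in 0.125 degree steps, with bit 19 selecting a range biased by 49 degrees); that layout is not stated anywhere in this ticket. The constant and function names are illustrative and are not taken from the amdf17nbdf source.

```python
# Assumed Family 17h layout, borrowed from the Linux k10temp driver and
# not confirmed by this ticket.
CUR_TEMP_SHIFT = 21      # Tctl field occupies bits 31:21
CUR_TEMP_MASK = 0x7FF    # 11-bit field
RANGE_SEL = 1 << 19      # when set, the field is biased by 49 degrees C
STEP_C = 0.125           # one count is 1/8 of a degree C
RANGE_BIAS_C = 49.0


def decode_tctl(reg, offset_c=0.0):
    """Return Tctl in degrees C for a raw 32-bit register value.

    offset_c is a hypothetical per-model adjustment; it is 0 here,
    matching amd_nb_temp_off = 0 in the mdb output.
    """
    raw = (reg >> CUR_TEMP_SHIFT) & CUR_TEMP_MASK
    temp = raw * STEP_C
    if reg & RANGE_SEL:
        temp -= RANGE_BIAS_C
    return temp - offset_c


reg = 0x2F500FEF                                 # amd_nb_temp_reg
raw = (reg >> CUR_TEMP_SHIFT) & CUR_TEMP_MASK    # should equal amd_nb_temp
tctl = decode_tctl(reg)
ipmi_cpu_temp = 37.0                             # "CPU Temp" row from ipmitool

print(hex(raw), tctl, tctl - ipmi_cpu_temp)
```

Under those assumptions the raw field is 0x17a, which matches amd_nb_temp, and it decodes to 47.25 degrees C, roughly 10 degrees above the 37 degrees C that the BMC reported at about the same time.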

I've confirmed that the 5.4 kernel doesn't have a temperature adjustment offset for the Ryzen 3000 (Zen 2) line. Can you confirm what model of motherboard this is? I assume the IPMI firmware is relatively up to date. I'll see if we can get some other numbers from different Zen 2 systems to see if we can make more sense of what's going on.

#8

Updated by Harmen R 3 months ago

I have an X470D4U from ASRock Rack (https://www.asrockrack.com/general/productdetail.asp?Model=X470D4U). My BIOS is the latest version, but I see there's a BMC firmware update available. I'll try to apply that.

I will also double-check my Linux results, just in case.

#9

Updated by Harmen R 3 months ago

OK, I've updated the BMC to the latest firmware. No changes.

On Linux the k10temp driver reports the same temperatures as ipmitool; on SmartOS the reported temperature is 10 degrees Celsius higher.

#10

Updated by Robert Mustacchi 6 days ago

So, another data point here. I have a Rome system that I built around the ASRock Rack EPYCD8. Testing on it shows that all of the temperature data is in line between the OS and the BMC. Here's what I see there for comparison:

rm@beowulf:~$ pfexec /usr/lib/fm/fmd/fmtopo -V *sensor=temp
TIME                 UUID
Apr 02 17:35:20 2e847e24-0ded-e491-b920-a0c79ec0764c

hc://:server-id=beowulf/motherboard=0/chip=0/core=0?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=0?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=1?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=1?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=2?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=2?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=3?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=3?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=4?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=4?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=5?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=5?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=6?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=6?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=7?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=7?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=8?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=8?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=9?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=9?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=10?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=10?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=11?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=11?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=12?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=12?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=13?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=13?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=14?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=14?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

hc://:server-id=beowulf/motherboard=0/chip=0/core=15?sensor=temp
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:server-id=beowulf/motherboard=0/chip=0/core=15?sensor=temp
  group: authority                      version: 1   stability: Private/Private
    server-id         string    beowulf
  group: facility                       version: 1   stability: Private/Private
    sensor-class      string    threshold
    type              uint32    0x1 (TEMP)
    units             uint32    0x1 (DEGREES_C)
    reading           double    27.375000

and the relevant temperature sensor from the BMC:

CPU Temp         | 27.000     | degrees C  | ok    | na        | na        | na        | 95.000    | na        | na        

So at least it aligns here. Since I can't find anything indicating that we're reading it incorrectly from the CPU, I think it may be worth moving forward with this. I'm sure we'll continue to see some challenges with the OS temperature sensors, but if the temperature we're reading is the one the CPU will generate alerts from, I think it's probably worth it.
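
The comparison above can be scripted when checking further systems. This is only a rough sketch: it assumes the pipe-delimited row format shown in the ipmitool output in this ticket, and the helper name is made up for illustration.

```python
def ipmi_reading(row):
    """Return (name, value) from one pipe-delimited `ipmitool sensor` row.

    The value is None when the row has no numeric reading ("na", or a
    discrete sensor value such as 0x0).
    """
    fields = [field.strip() for field in row.split("|")]
    name, value = fields[0], fields[1]
    try:
        return name, float(value)
    except ValueError:
        return name, None


# BMC row and fmtopo reading copied from the beowulf output above.
bmc_row = ("CPU Temp         | 27.000     | degrees C  | ok    | na        "
           "| na        | na        | 95.000    | na        | na        ")
os_reading = 27.375

name, bmc_reading = ipmi_reading(bmc_row)
delta = os_reading - bmc_reading
print(name, bmc_reading, delta)
```

On the Rome system the OS and BMC readings differ by 0.375 degrees, which is consistent with the BMC reporting whole degrees.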

#11

Updated by Electric Monk 5 days ago

  • Status changed from New to Closed
  • % Done changed from 90 to 100

git commit 0e6adfea4a40da04a1864bdeed7e17450ce04df5

commit  0e6adfea4a40da04a1864bdeed7e17450ce04df5
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2020-04-03T05:16:46.000Z

    11976 Want CPU temperature sensors for Zen 2, Raven Ridge
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>
