Feature #9459

Implement topo module to enumerate dimms from smbios

Added by Rob Johnston over 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

On many newer Intel platforms and all recent AMD platforms, the dimms are not represented at all in the hc-scheme topo snapshot.

In an ideal world we would implement memory controller drivers for all the recent Intel and AMD chip generations so that we could fully enumerate the memory topology (controllers, channels, ranks, dimms). This would allow us to associate what's in topology with MCA events, and we could then write Eversholt rules to diagnose the error telemetry.

We may still do this for new Intel platforms, but trying to provide such support for all Intel and AMD platforms is likely not practical.

As a first step, it would be useful simply to get the DIMMs into topology so that we could generate a more complete physical inventory of the system from the topo snapshot. It would also provide a place to attach the DIMM-related sensors.

Information on the DIMM slots and installed DIMMs is available in SMBIOS. This CR is to develop a topo module that will enumerate slot and dimm nodes, as children of the motherboard node, from the SMB_TYPE_MEMDEVICE records. The module will first check for the existence of a functional memory controller driver (e.g. intel_snb). If one is found, the module will bow out gracefully. The module will also step aside on SPARC platforms. Otherwise, it will perform DIMM enumeration from SMBIOS.
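The decision described above can be modelled in a few lines. This is only an illustrative Python sketch, not the module's actual C code: `MemDevice`, `enumerate_dimms`, and the `arch` / `mc_driver_present` parameters are hypothetical names, and treating a zero-size record as an empty slot (a slot node with no dimm child) is an assumption based on the SMBIOS convention that a size of 0 means no device is installed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemDevice:
    """Hypothetical subset of one SMB_TYPE_MEMDEVICE record."""
    locator: str  # e.g. "ChannelA-DIMM1"
    size: int     # bytes; 0 is assumed to mean "slot is empty"


def enumerate_dimms(arch: str, mc_driver_present: bool,
                    records: List[MemDevice]) -> List[str]:
    """Return hc-style node paths, or nothing if the module steps aside."""
    # Step aside on SPARC, or when a memory controller driver
    # is already available to enumerate the memory topology.
    if arch == "sparc" or mc_driver_present:
        return []

    nodes = []
    for slot, rec in enumerate(records):
        # One slot node per memory device record ...
        nodes.append(f"motherboard=0/slot={slot}")
        # ... and a dimm child only when something is installed (assumption).
        if rec.size > 0:
            nodes.append(f"motherboard=0/slot={slot}/dimm=0")
    return nodes


if __name__ == "__main__":
    records = [MemDevice("ChannelA-DIMM1", 8589934592),
               MemDevice("ChannelA-DIMM2", 0)]
    for path in enumerate_dimms("i386", False, records):
        print(path)
```

Returning an empty list in the step-aside cases mirrors the intent that the SMBIOS-based module never competes with a driver-backed enumerator.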

Note that this change has already been integrated into illumos-joyent as part of the large commit below:

https://github.com/joyent/illumos-joyent/commit/4a99ae161887bed6eed6dcb1699f188f023921a2

History

#1

Updated by Rob Johnston about 1 year ago

Testing notes are in the original illumos-joyent bug report:

https://smartos.org/bugview/OS-6490

Additionally, I onu'd an openindiana workstation with these changes and verified that the dimms were correctly enumerated in the topo snapshot - see output below:

root@openindiana:~# uname -a
SunOS openindiana 5.11 master-0-g51b0315b1b i86pc i386 i86pc
root@openindiana:~# /usr/lib/fm/fmd/fmtopo -V "*dimm*" 
TIME                 UUID
Sep 04 18:45:02 daa1cc58-37dd-6c4e-f152-859c2185c829

hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=CMU16GX4M2A2400C16/motherboard=0/slot=0/dimm=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=CMU16GX4M2A2400C16/motherboard=0/slot=0/dimm=0
    FRU               fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=CMU16GX4M2A2400C16/motherboard=0/slot=0/dimm=0
    label             string    ChannelA-DIMM1
  group: authority                      version: 1   stability: Private/Private
    product-id        string    System-Product-Name
    chassis-id        string    System-Serial-Number
    server-id         string    openindiana
  group: dimm-properties                version: 1   stability: Private/Private
    size              uint64    0x200000000
    type              string    DDR4
    rank              uint32    0x1
    configured-speed  uint32    0x855
    maximum-speed     uint32    0x855
    configured-voltage double    1.000000
    manufacturer      string    Corsair
    asset-tag         string    9876543210
    location          string    ChannelA-DIMM1

hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=3400-C16-Series/motherboard=0/slot=1/dimm=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=3400-C16-Series/motherboard=0/slot=1/dimm=0
    FRU               fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=3400-C16-Series/motherboard=0/slot=1/dimm=0
    label             string    ChannelA-DIMM2
  group: authority                      version: 1   stability: Private/Private
    product-id        string    System-Product-Name
    chassis-id        string    System-Serial-Number
    server-id         string    openindiana
  group: dimm-properties                version: 1   stability: Private/Private
    size              uint64    0x200000000
    type              string    DDR4
    rank              uint32    0x1
    configured-speed  uint32    0x855
    maximum-speed     uint32    0x855
    configured-voltage double    1.000000
    manufacturer      string    Patriot
    asset-tag         string    9876543210
    location          string    ChannelA-DIMM2

hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=CMU16GX4M2A2400C16/motherboard=0/slot=2/dimm=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=CMU16GX4M2A2400C16/motherboard=0/slot=2/dimm=0
    FRU               fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=CMU16GX4M2A2400C16/motherboard=0/slot=2/dimm=0
    label             string    ChannelB-DIMM1
  group: authority                      version: 1   stability: Private/Private
    product-id        string    System-Product-Name
    chassis-id        string    System-Serial-Number
    server-id         string    openindiana
  group: dimm-properties                version: 1   stability: Private/Private
    size              uint64    0x200000000
    type              string    DDR4
    rank              uint32    0x1
    configured-speed  uint32    0x855
    maximum-speed     uint32    0x855
    configured-voltage double    1.000000
    manufacturer      string    Corsair
    asset-tag         string    9876543210
    location          string    ChannelB-DIMM1

hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=3400-C16-Series/motherboard=0/slot=3/dimm=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=3400-C16-Series/motherboard=0/slot=3/dimm=0
    FRU               fmri      hc://:product-id=System-Product-Name:server-id=openindiana:chassis-id=System-Serial-Number:serial=00000000:part=3400-C16-Series/motherboard=0/slot=3/dimm=0
    label             string    ChannelB-DIMM2
  group: authority                      version: 1   stability: Private/Private
    product-id        string    System-Product-Name
    chassis-id        string    System-Serial-Number
    server-id         string    openindiana
  group: dimm-properties                version: 1   stability: Private/Private
    size              uint64    0x200000000
    type              string    DDR4
    rank              uint32    0x1
    configured-speed  uint32    0x855
    maximum-speed     uint32    0x855
    configured-voltage double    1.000000
    manufacturer      string    Patriot
    asset-tag         string    9876543210
    location          string    ChannelB-DIMM2

root@openindiana:~# smbios -t SMB_TYPE_MEMDEVICE
ID    SIZE TYPE
67    110  SMB_TYPE_MEMDEVICE (type 17) (memory device)

  Manufacturer: Corsair
  Serial Number: 00000000
  Asset Tag: 9876543210
  Location Tag: ChannelA-DIMM1
  Part Number: CMU16GX4M2A2400C16  

  Physical Memory Array: 66
  Memory Error Data: Not Supported
  Total Width: 64 bits
  Data Width: 64 bits
  Size: 8589934592 bytes
  Form Factor: 9 (DIMM)
  Set: None
  Rank: 1 (single)
  Memory Type: 26 (DDR4)
  Flags: 0x4080
        SMB_MDF_SYNC (synchronous)
        SMB_MDF_UNREG (Unregistered (Unbuffered))
  Speed: 2133 MT/s
  Configured Speed: 2133 MT/s
  Device Locator: ChannelA-DIMM1
  Bank Locator: BANK 0
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 1.20V

ID    SIZE TYPE
68    110  SMB_TYPE_MEMDEVICE (type 17) (memory device)

  Manufacturer: Patriot
  Serial Number: 00000000
  Asset Tag: 9876543210
  Location Tag: ChannelA-DIMM2
  Part Number: 3400 C16 Series     

  Physical Memory Array: 66
  Memory Error Data: Not Supported
  Total Width: 64 bits
  Data Width: 64 bits
  Size: 8589934592 bytes
  Form Factor: 9 (DIMM)
  Set: None
  Rank: 1 (single)
  Memory Type: 26 (DDR4)
  Flags: 0x4080
        SMB_MDF_SYNC (synchronous)
        SMB_MDF_UNREG (Unregistered (Unbuffered))
  Speed: 2133 MT/s
  Configured Speed: 2133 MT/s
  Device Locator: ChannelA-DIMM2
  Bank Locator: BANK 1
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 1.20V

ID    SIZE TYPE
69    110  SMB_TYPE_MEMDEVICE (type 17) (memory device)

  Manufacturer: Corsair
  Serial Number: 00000000
  Asset Tag: 9876543210
  Location Tag: ChannelB-DIMM1
  Part Number: CMU16GX4M2A2400C16  

  Physical Memory Array: 66
  Memory Error Data: Not Supported
  Total Width: 64 bits
  Data Width: 64 bits
  Size: 8589934592 bytes
  Form Factor: 9 (DIMM)
  Set: None
  Rank: 1 (single)
  Memory Type: 26 (DDR4)
  Flags: 0x4080
        SMB_MDF_SYNC (synchronous)
        SMB_MDF_UNREG (Unregistered (Unbuffered))
  Speed: 2133 MT/s
  Configured Speed: 2133 MT/s
  Device Locator: ChannelB-DIMM1
  Bank Locator: BANK 2
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 1.20V

ID    SIZE TYPE
70    110  SMB_TYPE_MEMDEVICE (type 17) (memory device)

  Manufacturer: Patriot
  Serial Number: 00000000
  Asset Tag: 9876543210
  Location Tag: ChannelB-DIMM2
  Part Number: 3400 C16 Series     

  Physical Memory Array: 66
  Memory Error Data: Not Supported
  Total Width: 64 bits
  Data Width: 64 bits
  Size: 8589934592 bytes
  Form Factor: 9 (DIMM)
  Set: None
  Rank: 1 (single)
  Memory Type: 26 (DDR4)
  Flags: 0x4080
        SMB_MDF_SYNC (synchronous)
        SMB_MDF_UNREG (Unregistered (Unbuffered))
  Speed: 2133 MT/s
  Configured Speed: 2133 MT/s
  Device Locator: ChannelB-DIMM2
  Bank Locator: BANK 3
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 1.20V
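
fmtopo prints the integer dimm-properties in hex while smbios prints decimal, so the two listings above are easier to compare after decoding. The following Python sketch does that for the first DIMM. The mapping of the smbios "Speed" field to maximum-speed, and the `fmri_part` normalization (trim, then replace whitespace with hyphens), are assumptions inferred only from this transcript, not taken from the module source. configured-voltage is left out because it is a double, not a hex integer.

```python
# Values copied from the first DIMM in the transcript above.
fmtopo_props = {
    "size": "0x200000000",
    "rank": "0x1",
    "configured-speed": "0x855",
    "maximum-speed": "0x855",
}
smbios_fields = {
    "size": 8589934592,        # Size, in bytes
    "rank": 1,                 # Rank
    "configured-speed": 2133,  # Configured Speed, MT/s
    "maximum-speed": 2133,     # Speed, MT/s (assumed source of maximum-speed)
}


def decode(props):
    """Convert fmtopo's hex strings to integers."""
    return {name: int(value, 16) for name, value in props.items()}


def fmri_part(part_number):
    """Hypothetical helper: normalize a part number the way the FMRIs
    in the transcript appear to (padding trimmed, spaces become '-')."""
    return "-".join(part_number.split())


decoded = decode(fmtopo_props)
mismatches = sorted(k for k in decoded if decoded[k] != smbios_fields[k])
size_gib = decoded["size"] // 2**30

if __name__ == "__main__":
    print("mismatches:", mismatches)
    print("size in GiB:", size_gib)
    print(fmri_part("3400 C16 Series     "))
```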

#2

Updated by Rob Johnston about 1 year ago

Note that this port from illumos-joyent to illumos-gate also incorporates the following follow-up push to illumos-joyent, which corrects a build failure that Igor saw on dilos:

https://github.com/joyent/illumos-joyent/commit/899e8e86192afce363b05b30f79590eef12b9724

#3

Updated by Electric Monk about 1 year ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 6d65bee7bcc62b2d9bdfde6610561ce76c92a908

commit  6d65bee7bcc62b2d9bdfde6610561ce76c92a908
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   2018-09-06T16:52:36.000Z

    9459 Implement topo module to enumerate dimms from smbios
    Reviewed by: Yuri Pankov <yuripv@yuripv.net>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>
