Bug #8977

closed

ipmi enumerator doesn't always enumerate nested entities

Added by Rob Johnston over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2018-01-19
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

The ipmi topo enumerator module assumes that if a PSU or FAN entity is nested, it will be nested under either a POWER_DOMAIN or COOLING_DOMAIN entity, which does seem logical.

Unfortunately, we've seen at least one case (Dell R730) where the PSU entities are nested under a MOTHERBOARD entity (go figure). I also recall seeing something similar on an Oracle server platform years ago (in that case the PSU entities were nested under the SYSTEM_CHASSIS entity).

So clearly, the ipmi enumerator is making assumptions that don't always hold in practice. As a result, it can fail to enumerate the FAN and/or PSU topologies in the topo snapshot. This CR makes the logic more flexible so that it always looks for nested PSU/FAN entities, regardless of the entity type of the parent entity.
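
To illustrate the difference, here is a minimal sketch in Python. It is a model of the lookup logic only, not the actual enumerator code (which is C and uses libipmi). The `Entity` class and both helper functions are hypothetical names made up for this example; the entity ID values are the codes defined in the IPMI specification.

```python
from dataclasses import dataclass, field

# Entity ID codes as defined in the IPMI specification.
MOTHERBOARD = 0x07
POWER_SUPPLY = 0x0A
POWER_DOMAIN = 0x13
SYSTEM_CHASSIS = 0x17
FAN = 0x1D
COOLING_DOMAIN = 0x1E


@dataclass
class Entity:
    """Hypothetical stand-in for an IPMI entity and its nested children."""
    entity_id: int
    instance: int
    children: list = field(default_factory=list)


def find_nested_strict(entities, wanted, expected_parent):
    """Old assumption: only look under one specific parent entity type."""
    return [
        child
        for parent in entities
        if parent.entity_id == expected_parent
        for child in parent.children
        if child.entity_id == wanted
    ]


def find_nested_any_parent(entities, wanted):
    """Relaxed lookup: accept a nested entity under any parent type."""
    return [
        child
        for parent in entities
        for child in parent.children
        if child.entity_id == wanted
    ]


# Layout resembling the Dell R730 case: PSUs nested under MOTHERBOARD.
r730_like = [
    Entity(MOTHERBOARD, 1, [Entity(POWER_SUPPLY, 1), Entity(POWER_SUPPLY, 2)]),
]

strict = find_nested_strict(r730_like, POWER_SUPPLY, POWER_DOMAIN)
relaxed = find_nested_any_parent(r730_like, POWER_SUPPLY)
print(len(strict), len(relaxed))  # prints "0 2"
```

With the strict lookup, the two PSUs under MOTHERBOARD are never found; the relaxed lookup finds both, and still finds them on platforms that do nest PSUs under POWER_DOMAIN.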

Actions #1

Updated by Rob Johnston over 3 years ago

Testing

I installed a platform image that included the changes for this CR on a Dell R730 (aka J3302) in the SF lab and verified that with the changes, the power supplies (PSUs) were now being enumerated in the topo snapshot:

[root@volcano ~]# /usr/lib/fm/fmd/fmtopo "*psu=*" 
TIME                 UUID
Dec 12 19:43:37 d74800e5-6b32-445c-e150-e6e356ef4a76

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Presence

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Current 1

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Voltage 1

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Status

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VCL:part=05RHVVA00/chassis=0/psu=1

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VCL:part=05RHVVA00/chassis=0/psu=1?sensor=Presence

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VCL:part=05RHVVA00/chassis=0/psu=1?sensor=Current 2

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VCL:part=05RHVVA00/chassis=0/psu=1?sensor=Voltage 2

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VCL:part=05RHVVA00/chassis=0/psu=1?sensor=Status
[root@volcano ~]# /usr/lib/fm/fmd/fmtopo -V "*psu=0*" 
TIME                 UUID
Dec 12 19:44:35 3a171cdd-a3c6-edd1-d3c2-c16ceb515838

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0
    label             string    PSU 0
    FRU               fmri      hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0
  group: authority                      version: 1   stability: Private/Private
    product-id        string    Joyent-Compute-Platform-3302
    chassis-id        string    FHL5TD2
    server-id         string    volcano
  group: ipmi                           version: 1   stability: Private/Private
    entity-id         uint32    0xa
    entity-instance   uint32    0x1

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Presence
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Presence
  group: authority                      version: 1   stability: Private/Private
    product-id        string    Joyent-Compute-Platform-3302
    chassis-id        string    FHL5TD2
    server-id         string    volcano
  group: facility                       version: 1   stability: Private/Private
    entity_ref        string[]  [ "Presence" ]
    sensor-class      string    discrete
    type              uint32    0x25 (PRESENCE)
    state             uint32    0x8002 (ABSENT)

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Current 1
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Current 1
  group: authority                      version: 1   stability: Private/Private
    product-id        string    Joyent-Compute-Platform-3302
    chassis-id        string    FHL5TD2
    server-id         string    volcano
  group: facility                       version: 1   stability: Private/Private
    entity_ref        string[]  [ "Current 1" ]
    sensor-class      string    threshold
    type              uint32    0x101 (THRESHOLD_STATE)
    state             uint32    0xc0 (0xc0)
    reading           double    2.000000
    units             uint32    0x5 (AMPS)

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Voltage 1
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Voltage 1
  group: authority                      version: 1   stability: Private/Private
    product-id        string    Joyent-Compute-Platform-3302
    chassis-id        string    FHL5TD2
    server-id         string    volcano
  group: facility                       version: 1   stability: Private/Private
    entity_ref        string[]  [ "Voltage 1" ]
    sensor-class      string    threshold
    type              uint32    0x101 (THRESHOLD_STATE)
    state             uint32    0xc0 (0xc0)
    reading           double    112.000000
    units             uint32    0x4 (VOLTS)

hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Status
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=Joyent-Compute-Platform-3302:server-id=volcano:chassis-id=FHL5TD2:serial=CN1797268M9VDG:part=05RHVVA00/chassis=0/psu=0?sensor=Status
  group: authority                      version: 1   stability: Private/Private
    product-id        string    Joyent-Compute-Platform-3302
    chassis-id        string    FHL5TD2
    server-id         string    volcano
  group: facility                       version: 1   stability: Private/Private
    entity_ref        string[]  [ "Status" ]
    sensor-class      string    discrete
    type              uint32    0x8 (POWER_SUPPLY)
    state             uint32    0x8080 (0x80)

I also ran fmtopo in a mode where it dumps core on exit, and checked that these changes introduced no memory leaks.

Actions #2

Updated by Electric Monk over 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

commit  8f022dd6c1ebe3edc269726bf537617e665df32f
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   2018-01-23T21:33:23.000Z

    8967 libipmi: add support for GET_CHASSIS_STATUS command
    8974 fac_prov_ipmi should support binding by entity id and instance
    8975 ipmi topo plugin should automatically enumerate sensors on nodes it enumerates
    8976 ipmi enumerator should include FRU identity information in FMRI authority
    8977 ipmi enumerator doesn't always enumerate nested entities
    8978 Add topo facility method for controlling chassis ident indicator
    Reviewed by: Yuri Pankov <yuripv@icloud.com>
    Reviewed by: Ben Sims <bensims@gmail.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
