Feature #9587

Add test mechanism to sensor-transport module for spoofing sensor states

Added by Rob Johnston over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Start date: 2018-06-07
% Done: 100%
Difficulty: Medium


Description

To make it easier to test and verify the code for detecting and handling sensor failures, there should be a way to spoof the state for any sensor in topo. The most practical way to implement this would be to add support for an optional configuration parameter in sensor-transport.conf where one could specify one or more sensors and the state value(s) that sensor-transport should pretend to have read from them.

Note that this change has already been integrated into illumos-joyent via the commit below:

commit 3843bb9b187919e79faf125f8ef4d7979a130486
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   Thu Mar 1 00:49:05 2018 +0000

    OS-6513 Add platform-specific topo maps for the Joyent J330x Compute Platform
    OS-6657 Add test mechanism to sensor-transport module for spoofing sensor states
    OS-6710 need to handle SP's that present multiple sensors with the same entity name
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Approved by: Joshua M. Clulow <jmc@joyent.com>

History

#1

Updated by Rob Johnston over 2 years ago

Testing

I created a platform image with this change and exercised it by loopback-mounting various versions of sensor-transport.conf over the installed copy, verifying that the new "spoof_sensor_state" property was parsed correctly and had the intended effect.

Scenario 1 - spoof the state of a single sensor:

# mount -F lofs /var/tmp/sensor-transport.conf /usr/lib/fm/fmd/plugins/sensor-transport.conf 
# cat /usr/lib/fm/fmd/plugins/sensor-transport.conf 
setprop spoof_sensor_state "*psu=1:PS2 Status:0x2" 
# fmadm reset sensor-transport
fmadm: sensor-transport module has been reset
[wait at least 60 seconds...]
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr 03 22:40:26 ada74c04-053e-6739-f113-fc14bfaee142  SENSOR-8000-6G Major     

Host        : magma
Platform    : Joyent-Storage-Platform-7001    Chassis_id  : S247158X6A07720
Product_sn  : 

Fault class : fault.psu.failed-int
Problem in  : "PSU 1" (hc://:product-id=Joyent-Storage-Platform-7001:server-id=magma:chassis-id=S247158X6A07720/chassis=0/psu=1)
                  faulted but still in service
FRU         : "PSU 1" (hc://:product-id=Joyent-Storage-Platform-7001:server-id=magma:chassis-id=S247158X6A07720/chassis=0/psu=1)
                  faulty

Description : A sensor indicates that this power supply has failed.  Refer to
              http://illumos.org/msg/SENSOR-8000-6G for more information.

Response    : None.

Impact      : The enclosure may be getting inadequate power.  Subsequent loss
              of power supplies may force the enclosure to shutdown.

Action      : Replace the indicated power supply

Scenario 2 - spoof the state of multiple sensors:

# mount -F lofs /var/tmp/sensor-transport.conf /usr/lib/fm/fmd/plugins/sensor-transport.conf
# cat /usr/lib/fm/fmd/plugins/sensor-transport.conf
setprop spoof_sensor_state "*psu=1:PS2 Status:0x2;*psu=0:PS1 Status:0x2"
# fmadm reset sensor-transport
fmadm: sensor-transport module has been reset
[wait at least 60 seconds...]
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr 03 23:37:59 004e77a1-1d04-6e24-aa1c-a8ed6323c9a9  SENSOR-8000-6G Major     

Host        : magma
Platform    : Joyent-Storage-Platform-7001      Chassis_id  : S247158X6A07720
Product_sn  : 

Fault class : fault.psu.failed-int
Problem in  : "PSU 0" (hc://:product-id=Joyent-Storage-Platform-7001:server-id=magma:chassis-id=S247158X6A07720/chassis=0/psu=0)
                  faulted but still in service
FRU         : "PSU 0" (hc://:product-id=Joyent-Storage-Platform-7001:server-id=magma:chassis-id=S247158X6A07720/chassis=0/psu=0)
                  faulty

Description : A sensor indicates that this power supply has failed.  Refer to
              http://illumos.org/msg/SENSOR-8000-6G for more information.

Response    : None.

Impact      : The enclosure may be getting inadequate power.  Subsequent loss
              of power supplies may force the enclosure to shutdown.

Action      : Replace the indicated power supply

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr 03 23:37:59 81068e9e-9bca-6d82-ac86-c0dfc8c4a0cd  SENSOR-8000-6G Major     

Host        : magma
Platform    : Joyent-Storage-Platform-7001      Chassis_id  : S247158X6A07720
Product_sn  : 

Fault class : fault.psu.failed-int
Problem in  : "PSU 1" (hc://:product-id=Joyent-Storage-Platform-7001:server-id=magma:chassis-id=S247158X6A07720/chassis=0/psu=1)
                  faulted but still in service
FRU         : "PSU 1" (hc://:product-id=Joyent-Storage-Platform-7001:server-id=magma:chassis-id=S247158X6A07720/chassis=0/psu=1)
                  faulty

Description : A sensor indicates that this power supply has failed.  Refer to
              http://illumos.org/msg/SENSOR-8000-6G for more information.

Response    : None.

Impact      : The enclosure may be getting inadequate power.  Subsequent loss
              of power supplies may force the enclosure to shutdown.

Action      : Replace the indicated power supply

I also verified that if the value of spoof_sensor_state is malformed, an ereport is generated and a defect is diagnosed against the sensor-transport fmd module.

#2

Updated by Electric Monk over 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

commit  ea30102ce458697473b0435bcdc7647dce2551f4
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   2018-06-28T22:44:13.000Z

    9586 need to handle SP's that present multiple sensors with the same entity name
    9587 Add test mechanism to sensor-transport module for spoofing sensor states
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>
