Bug #8946

closed

fmdump(1m) should be more resilient in the face of missing message content

Added by Rob Johnston almost 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2018-01-02
Due date:
% Done: 100%
Estimated time:
Difficulty: Bite-size
Tags:
Gerrit CR:
External Bug:

Description

fmdump(1m) implements a "-m" option that dumps the localized message content associated with each event in the fault logs (/var/fm/fmd/fltlog*).

Currently, if fmdump fails to look up the message content for an event, it exits immediately with an error. This is undesirable for a couple of reasons:

1) The current error message identifies neither the affected event nor the diagcode whose message content could not be looked up.

2) In such a case, it would be far more useful to dump as much content as it can rather than bailing out on the first error.

Because fmd supports proxying events from other fault managers, potentially on other machines (or OS instances), it is quite plausible to end up with an FM event for which there is no associated message content on the local machine, so fmdump should handle that case more gracefully.

This CR changes fmdump so that, if it fails to look up the message content for an event, it does not exit. Instead, it prints an error message stating that it could not look up the content for that event (identifying the event by its uuid) and then carries on with the remaining events.

Note: This change was integrated into illumos-joyent via the following changeset, and there have been no issues with it.

commit 58853b2326b7e3a605e4e558d0af3e028c87f434
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   Mon Nov 20 18:10:21 2017 +0000

    OS-6448 fmdump(1m) should be more resilient in the face of missing message content
    Reviewed by: Robert Mustacchi <robert.mustacchi@joyent.com>
    Approved by: Robert Mustacchi <robert.mustacchi@joyent.com>

For more details, see the corresponding Joyent bug:

https://smartos.org/bugview/OS-6448

Actions #1

Updated by Rob Johnston over 5 years ago

See the illumos-joyent bug report for how the fix was originally tested.

After porting the fix from illumos-joyent upstream to illumos, I re-tested the change on my OpenIndiana workstation. That output is captured below:

#
# Here's an fmd fltlog with a dozen or so entries
#
root@openindiana:/# fmdump
TIME                 UUID                                 SUNW-MSG-ID EVENT
Nov 10 15:17:32.2011 8fc075a4-4ae1-c5c8-9bd5-d6e3540bca98 SUNOS-8000-J0 Diagnosed
Nov 10 16:14:37.0111 ba3254a3-e01b-4a89-e5b3-a30c6ab7852b SMF-8000-YX Diagnosed
Nov 10 16:16:29.1287 ba3254a3-e01b-4a89-e5b3-a30c6ab7852b FMD-8000-4M Repaired
Nov 10 16:16:29.1513 ba3254a3-e01b-4a89-e5b3-a30c6ab7852b FMD-8000-6U Resolved
Nov 13 06:54:41.2248 a7ea3e63-58c8-6f53-8bf6-8915308e9d62 SMF-8000-YX Diagnosed
Nov 18 13:44:37.8118 a7ea3e63-58c8-6f53-8bf6-8915308e9d62 FMD-8000-4M Repaired
Nov 18 13:44:37.8312 a7ea3e63-58c8-6f53-8bf6-8915308e9d62 FMD-8000-6U Resolved
Nov 18 13:44:39.0666 c7933e94-a212-42aa-ccd2-ad143b51bbe2 SMF-8000-YX Diagnosed
Nov 22 00:13:28.6937 c7933e94-a212-42aa-ccd2-ad143b51bbe2 FMD-8000-4M Repaired
Nov 22 00:13:28.7013 c7933e94-a212-42aa-ccd2-ad143b51bbe2 FMD-8000-6U Resolved
Dec 07 06:56:01.1395 db05f2d8-9802-e581-85ff-a48a034c004a SMF-8000-YX Diagnosed
Dec 07 07:01:09.4506 db05f2d8-9802-e581-85ff-a48a034c004a FMD-8000-4M Repaired
Dec 07 07:01:09.4521 db05f2d8-9802-e581-85ff-a48a034c004a FMD-8000-6U Resolved
Dec 07 07:02:09.4651 fedfc2e2-336b-64b8-8ec1-dceffbd4ca8e SMF-8000-YX Diagnosed
Dec 07 07:05:59.6805 fedfc2e2-336b-64b8-8ec1-dceffbd4ca8e FMD-8000-4M Repaired
Dec 07 07:05:59.6904 fedfc2e2-336b-64b8-8ec1-dceffbd4ca8e FMD-8000-6U Resolved

#
# Move a couple of the message files aside so that some of the lookups will fail
#
root@openindiana:/# mv /usr/lib/locale/C/LC_MESSAGES/SMF.mo /var/tmp
root@openindiana:/# mv /usr/lib/locale/C/LC_MESSAGES/SUNOS.mo /var/tmp

#
# First here's the unhelpful error you get from the stock fmdump binary:
#
root@openindiana:/# fmdump -m
fmdump: failed to format message: No such file or directory
fmdump: warning: failed to dump /var/fm/fmd/fltlog: Error 0
root@openindiana:/# echo $?
3

#
# And here's the output from running the fixed fmdump binary out of the proto area of my workspace
#
root@openindiana:/# ~rejohnst/ws/illumos-upstream1/proto/root_i386/usr/sbin/fmdump -m
/export/home/rejohnst/ws/illumos-upstream1/proto/root_i386/usr/sbin/fmdump: failed to format message for diagcode SUNOS-8000-J0, event 8fc075a4-4ae1-c5c8-9bd5-d6e3540bca98: No such file or directory

/export/home/rejohnst/ws/illumos-upstream1/proto/root_i386/usr/sbin/fmdump: failed to format message for diagcode SMF-8000-YX, event ba3254a3-e01b-4a89-e5b3-a30c6ab7852b: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Fri Nov 10 16:14:36 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: ba3254a3-e01b-4a89-e5b3-a30c6ab7852b
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Fri Nov 10 16:14:36 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: ba3254a3-e01b-4a89-e5b3-a30c6ab7852b
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

/export/home/rejohnst/ws/illumos-upstream1/proto/root_i386/usr/sbin/fmdump: failed to format message for diagcode SMF-8000-YX, event a7ea3e63-58c8-6f53-8bf6-8915308e9d62: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Mon Nov 13 06:54:41 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: a7ea3e63-58c8-6f53-8bf6-8915308e9d62
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Mon Nov 13 06:54:41 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: a7ea3e63-58c8-6f53-8bf6-8915308e9d62
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

/export/home/rejohnst/ws/illumos-upstream1/proto/root_i386/usr/sbin/fmdump: failed to format message for diagcode SMF-8000-YX, event c7933e94-a212-42aa-ccd2-ad143b51bbe2: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Sat Nov 18 13:44:39 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: c7933e94-a212-42aa-ccd2-ad143b51bbe2
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Sat Nov 18 13:44:39 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: c7933e94-a212-42aa-ccd2-ad143b51bbe2
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

/export/home/rejohnst/ws/illumos-upstream1/proto/root_i386/usr/sbin/fmdump: failed to format message for diagcode SMF-8000-YX, event db05f2d8-9802-e581-85ff-a48a034c004a: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Thu Dec  7 06:56:01 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: db05f2d8-9802-e581-85ff-a48a034c004a
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Thu Dec  7 06:56:01 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: fmd, REV: 1.2
EVENT-ID: db05f2d8-9802-e581-85ff-a48a034c004a
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

/export/home/rejohnst/ws/illumos-upstream1/proto/root_i386/usr/sbin/fmdump: failed to format message for diagcode SMF-8000-YX, event fedfc2e2-336b-64b8-8ec1-dceffbd4ca8e: No such file or directory

SUNW-MSG-ID: FMD-8000-4M, TYPE: Repair, VER: 1, SEVERITY: Minor
EVENT-TIME: Thu Dec  7 07:02:09 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: fedfc2e2-336b-64b8-8ec1-dceffbd4ca8e
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-4M for more information.
AUTO-RESPONSE: Some system components offlined because of the original fault may have been brought back online.
IMPACT: Performance degradation of the system due to the original fault may have been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved, VER: 1, SEVERITY: Minor
EVENT-TIME: Thu Dec  7 07:02:09 PST 2017
PLATFORM: System-Product-Name, CSN: System-Serial-Number, HOSTNAME: openindiana
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: fedfc2e2-336b-64b8-8ec1-dceffbd4ca8e
DESC: All faults associated with an event id have been addressed.
  Refer to http://illumos.org/msg/FMD-8000-6U for more information.
AUTO-RESPONSE: All system components offlined because of the original fault have been brought back online.
IMPACT: Performance degradation of the system due to the original fault has been recovered.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to identify the repaired components.

root@openindiana:/# echo $?
3

Actions #2

Updated by Electric Monk over 5 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 8c331166625387ef510c183eb42ba2dec1af7a0d

commit  8c331166625387ef510c183eb42ba2dec1af7a0d
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   2018-01-05T00:27:42.000Z

    8946 fmdump(1m) should be more resilient in the face of missing message content
    Reviewed by: Robert Mustacchi <robert.mustacchi@joyent.com>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Gordon Ross <gwr@nexenta.com>
