Bug #2609

FMA's "fmd" gets events with a NULL pointer in 'em

Added by Dan McDonald over 8 years ago. Updated over 8 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2012-04-10
Due date:
% Done:

0%

Estimated time:
Difficulty:
Expert
Tags:
needs-triage
Gerrit CR:

Description

Pardon the lengthy explanation. I'm recalling this from memory and the flaky hardware in question is not attached right now.

    - SOMETHING (a 10GigE board in my case) can't attach to its driver, or even gets
      "retired".

    - An event is generated, and it eventually arrives at fmd's
          fme_undiagnosable().

    - The "struct fme" contains a list of "observations", each of which is
          a "struct event".  Each observation is *SUPPOSED* to carry an
          fmd_event_t pointer, "ffep".  The problem is that these
          observations have their ffep set to NULL.

    - This eventually leads to a NULL pointer dereference deeper down the
          stack:  fme_undiagnosable -> fmd_case_add_ereport ->
          fmd_case_insert_event -> fmd_event_hold -> BOOM.

    - One other place in fme.c checks for a NULL ffep.  I tried
          replicating that check in fme_undiagnosable(), but then the panic
          just happens later, in an area where the data structure is so
          corrupt that I can only assume checking for a NULL ffep wasn't a
          good idea.

    - I know little to nothing about FMA and fmd.  But a very flaky
          system (it's possible a BIOS bug causes the 10GigE board not to
          be recognized, though prtconf showed a 10GigE board) may generate
          events that, instead of causing faults, cause FMA crashes.  That
          can't be good.
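
For anyone triaging, the kind of guard described above (skip an observation whose ffep is NULL before it reaches the hold call) can be sketched as below. This is a standalone model, not fmd source: the struct layouts, the "next" linkage, and the names hold_observations() and model_event_hold() are all assumptions made up for illustration. As noted above, adding such a check in fme_undiagnosable() did not stop the crash on the affected system, so treat this as a picture of the failure mode rather than a fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fmd_event_t: only a reference count. */
typedef struct fmd_event {
	int ev_refs;
} fmd_event_t;

/* Hypothetical stand-in for "struct event" (one observation). */
struct event {
	struct event *next;	/* assumed list linkage, not the real field */
	fmd_event_t *ffep;	/* may be NULL, as reported in this bug */
};

/* Hypothetical stand-in for "struct fme". */
struct fme {
	struct event *observations;
};

/* Models a hold routine that dereferences its argument unconditionally. */
static void
model_event_hold(fmd_event_t *ep)
{
	ep->ev_refs++;
}

/*
 * Walk the observation list, hold every event that has an ffep, and
 * count the ones that do not.  Returns the number of events held.
 */
int
hold_observations(struct fme *fmep, int *skipped)
{
	int held = 0;
	struct event *ep;

	assert(fmep != NULL && skipped != NULL);
	*skipped = 0;

	for (ep = fmep->observations; ep != NULL; ep = ep->next) {
		if (ep->ffep == NULL) {
			/* Without this check the hold below would crash. */
			(*skipped)++;
			continue;
		}
		model_event_hold(ep->ffep);
		held++;
	}
	return (held);
}
```

A nonzero skipped count would at least show how many observations arrived without an ffep, which is the part that still needs explaining.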
