Feature #7058

crash dump progress should be emitted more frequently

Added by Robert Mustacchi over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
2016-06-06
Due date:
% Done:
100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

If the system panics, a crash dump is (hopefully) written to disk. Systems with a large quantity of DRAM can take a considerable period of time to dump to disk, during which the operator cannot interact with the machine to determine its status.

If the system is configured to use the VGA console, when the operator views that console via a monitor or some remote KVM facility, the fact that the system is writing a crash dump is easy to see. A progress update is drawn to the screen each time one percent of progress is made toward completion.

For systems with a large quantity of memory, it can take many seconds or even minutes to write even one percent of the dump: one percent of 200GB is 2GB, and polled I/O is not especially fast. The operator would be forgiven for thinking that the passage of a minute or two without a progress update could signal that the system has hung while trying to dump.
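To make the arithmetic concrete, here is a rough sketch of how long one percent of a dump takes at a given write rate. The function name and the 20 MB/s rate used in the note below are illustrative assumptions, not measurements from any real system.

```c
#include <stdint.h>

/*
 * Seconds needed to write one percent of a dump, rounded up.
 * Hypothetical helper for illustration; bytes_per_sec must be non-zero.
 */
static uint64_t
secs_per_percent(uint64_t dump_bytes, uint64_t bytes_per_sec)
{
	uint64_t one_pct = dump_bytes / 100;

	/* Round up so a partial second still counts as a second. */
	return ((one_pct + bytes_per_sec - 1) / bytes_per_sec);
}
```

With a 200GB dump (200000000000 bytes) and an assumed rate of 20000000 bytes per second, one percent is 2GB and secs_per_percent() returns 100: more than a minute and a half between updates.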

Ironically, the use of the generally superior IPMI SOL support in the BMC to attach to the serial console of the machine actually further obscures dump activity. The operator can generally only see text emitted after attaching; infrequent updates mean that the operator may see nothing for minutes.

In order to alleviate these issues, we should take a two-pronged approach:

  • liveness will be signalled to the operator by drawing the progress display not just when the completion percentage changes, but also if a whole second has passed without an update
  • operators with the good fortune to be able to use the serial console (and the impeccable taste to choose to do so) will know that the system is dumping because we will prepend "dumping:" to each redraw of the progress report
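
The two prongs above can be sketched roughly as follows. This is a minimal illustration, not the code from the actual change: the structure, the function names, the one-second constant, and the exact output format are all assumptions made for the example, and the caller is assumed to supply a monotonic timestamp in nanoseconds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical redraw interval: one second, in nanoseconds. */
#define	DP_REDRAW_INTERVAL_NS	1000000000ULL

/* Hypothetical progress state; names are illustrative only. */
typedef struct dump_progress {
	int		dp_last_pct;		/* last percentage drawn, -1 if none */
	uint64_t	dp_last_draw_ns;	/* timestamp of the last redraw */
} dump_progress_t;

static void
dump_progress_init(dump_progress_t *dp)
{
	dp->dp_last_pct = -1;
	dp->dp_last_draw_ns = 0;
}

/*
 * Decide whether to redraw the progress report.  A redraw happens when the
 * completion percentage has changed, or when at least one second has passed
 * since the last redraw.  On redraw, the report (prefixed with "dumping:")
 * is formatted into buf and true is returned; otherwise buf is untouched.
 */
static bool
dump_progress_update(dump_progress_t *dp, uint64_t now_ns, uint64_t done,
    uint64_t total, char *buf, size_t buflen)
{
	int pct = (total == 0) ? 100 : (int)((done * 100) / total);
	bool pct_changed = (pct != dp->dp_last_pct);
	bool stale = (now_ns - dp->dp_last_draw_ns >= DP_REDRAW_INTERVAL_NS);

	if (!pct_changed && !stale)
		return (false);

	(void) snprintf(buf, buflen, "dumping: %3d%% done", pct);
	dp->dp_last_pct = pct;
	dp->dp_last_draw_ns = now_ns;
	return (true);
}
```

A caller would invoke dump_progress_update() after each chunk written and emit buf to the console whenever it returns true. Because the prefix is part of every redraw, an operator who attaches to the serial console mid-dump sees "dumping:" within about a second, even when the percentage has not moved.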
#1

Updated by Joshua M. Clulow over 4 years ago

  • Description updated (diff)
#2

Updated by Electric Monk over 4 years ago

  • Status changed from New to Closed

git commit 2fb0949cb8f76f6cc30b9fef03f032f1893bbc6c

commit  2fb0949cb8f76f6cc30b9fef03f032f1893bbc6c
Author: Joshua M. Clulow <jmc@joyent.com>
Date:   2016-06-09T22:32:41.000Z

    7058 crash dump progress should be emitted more frequently
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
    Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
