Bug #11584

::xcall would be useful

Added by John Levon 28 days ago. Updated 13 days ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

http://smartos.org/bugview/OS-7124

Many times recently, we've had a crash dump where all the CPUs get stuck. Often this is most easily visible as xcall state, where a master CPU is stuck in xc_serv() waiting for one or more slave CPUs to respond.

It's somewhat painful to observe xcall state, so to make state clearer, an ::xcall dcmd would be useful. This will collate all active xc_msg structs under the relevant master CPU, making it clear for each master what other CPUs it is waiting for and why.
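As a rough illustration of the collation described above, here is a minimal sketch in C. It is not the dcmd's implementation: `model_msg_t`, its fields, `MODEL_NCPU` and both functions are hypothetical stand-ins, and the real xc_msg layout is not assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

#define	MODEL_NCPU	8	/* hypothetical CPU count for the model */

/* Hypothetical, simplified stand-in for an active cross-call message. */
typedef struct model_msg {
	int mm_master;		/* CPU that initiated the cross call */
	int mm_slave;		/* CPU that has not yet responded */
	const char *mm_why;	/* illustrative reason/state label */
} model_msg_t;

/* Count the messages a given master is still waiting on. */
size_t
model_pending(const model_msg_t *msgs, size_t nmsgs, int master)
{
	size_t n = 0;

	assert(master >= 0 && master < MODEL_NCPU);
	for (size_t i = 0; i < nmsgs; i++) {
		if (msgs[i].mm_master == master)
			n++;
	}
	return (n);
}

/* Print every message grouped under the master that is waiting on it. */
void
model_xcall_print(const model_msg_t *msgs, size_t nmsgs)
{
	for (int cpu = 0; cpu < MODEL_NCPU; cpu++) {
		if (model_pending(msgs, nmsgs, cpu) == 0)
			continue;
		printf("master CPU %d is waiting for:\n", cpu);
		for (size_t i = 0; i < nmsgs; i++) {
			if (msgs[i].mm_master == cpu) {
				printf("    CPU %d (%s)\n",
				    msgs[i].mm_slave, msgs[i].mm_why);
			}
		}
	}
}
```

Calling `model_xcall_print()` on a flat array of messages lists each master once, followed by the slaves it is waiting for and the label attached to each message.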

Currently, a CPU that's processing a message takes it off the ->xc_msgbox queue, and it's hence not (easily) visible in the dump state. To make this clearer, when we start to process a message, we'll place it in an ->xc_curmsg holding cell, so it's easy for ::xcall to find it.
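The holding-cell idea can be sketched with a toy single-threaded model. This is only an assumption-laden illustration: the `model_*` types and functions are hypothetical, the field names merely echo xc_msgbox and xc_curmsg, and concurrency is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified message: just a link and an owner. */
typedef struct model_msg {
	struct model_msg *mm_next;
	int mm_master;
} model_msg_t;

/* Hypothetical per-CPU state, loosely modelled on the description. */
typedef struct model_cpu {
	model_msg_t *mc_msgbox;	/* queued, not yet being processed */
	model_msg_t *mc_curmsg;	/* holding cell: message in progress */
} model_cpu_t;

/* Queue a message for this CPU. */
void
model_post(model_cpu_t *cpu, model_msg_t *msg)
{
	msg->mm_next = cpu->mc_msgbox;
	cpu->mc_msgbox = msg;
}

/*
 * Start processing: the message leaves the queue, but stays reachable
 * through the holding cell so a debugger walking the CPU can find it.
 */
model_msg_t *
model_begin(model_cpu_t *cpu)
{
	model_msg_t *msg = cpu->mc_msgbox;

	assert(cpu->mc_curmsg == NULL);
	if (msg != NULL) {
		cpu->mc_msgbox = msg->mm_next;
		msg->mm_next = NULL;
		cpu->mc_curmsg = msg;
	}
	return (msg);
}

/* Finish processing: clear the holding cell. */
void
model_end(model_cpu_t *cpu)
{
	cpu->mc_curmsg = NULL;
}
```

Between `model_begin()` and `model_end()` the queue no longer holds the message, yet anything inspecting the CPU's state can still reach it through `mc_curmsg`.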

We also need a couple of mdb_ctf.c fixes. First, re-introduce the logic that allowed a consumer to optionally ignore missing members in mdb_ctf_vread(), so we can fall back if a dump is missing ->xc_curmsg. Second, don't use UM_GC in that same routine: in a loop, this can easily exhaust KMDB's available space.
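To illustrate the "ignore missing members" fallback in isolation, here is a toy model in C. It does not use or reproduce the mdb_ctf_vread() interface: `model_vread()`, `model_member_t` and the `MODEL_IGNORE_ABSENT` flag are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	MODEL_IGNORE_ABSENT	0x1	/* hypothetical flag name */

/* Hypothetical description of one struct member: name, offset, size. */
typedef struct model_member {
	const char *name;
	size_t off;
	size_t size;
} model_member_t;

static const model_member_t *
model_find(const model_member_t *tbl, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) == 0)
			return (&tbl[i]);
	}
	return (NULL);
}

/*
 * Copy each wanted member from src (laid out as described by "have")
 * into dst (laid out as described by "want"). A member absent from
 * "have" is an error, unless MODEL_IGNORE_ABSENT is set, in which case
 * it is zero-filled so the caller can fall back.
 * Returns 0 on success, -1 on a missing member.
 */
int
model_vread(void *dst, const model_member_t *want, size_t nwant,
    const void *src, const model_member_t *have, size_t nhave, int flags)
{
	for (size_t i = 0; i < nwant; i++) {
		const model_member_t *m =
		    model_find(have, nhave, want[i].name);

		if (m == NULL) {
			if (!(flags & MODEL_IGNORE_ABSENT))
				return (-1);
			memset((char *)dst + want[i].off, 0, want[i].size);
			continue;
		}
		assert(m->size == want[i].size);
		memcpy((char *)dst + want[i].off,
		    (const char *)src + m->off, m->size);
	}
	return (0);
}
```

With the flag set, a consumer reading an older layout that lacks a member gets zeroes for it instead of a failed read, which is the kind of fallback the description asks for when a dump has no ->xc_curmsg.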

History

#1

Updated by Electric Monk 13 days ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit a8ea0c9dd566453d9b69eab5f863930da9d0c4ae

commit  a8ea0c9dd566453d9b69eab5f863930da9d0c4ae
Author: John Levon <john.levon@joyent.com>
Date:   2019-09-04T09:22:58.000Z

    11584 ::xcall would be useful
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
