Bug #1369

disable multi-threaded dump

Added by Rich Lowe about 8 years ago. Updated almost 6 years ago.

Status: New
Priority: Normal
Assignee:
Category: kernel
Start date: 2011-08-12
Due date:
% Done: 10%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Joyent, EC, and several others have reported hangs while taking crash dumps on multi-core machines.
Experimentally, this appears to be due to a bug in the threaded dump support.

We should seriously consider disabling threaded dump until someone has a chance to fix it.

Workaround:

*
* Don't use multi-threaded fast crash dump or a high compression level
*
set dump_plat_mincpu=0
set dump_bzip2_level=1

From Joyent's overlay. I'm unsure whether the bzip2 limit is related.
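For anyone auditing machines for this workaround, here is a minimal sketch that checks whether the text of an /etc/system file carries the setting. The function names (parse_system_settings, mt_dump_disabled) are hypothetical, it handles only the simple "set [module:]symbol=value" form, and it assumes comment lines begin with "*" or "#". It inspects file contents only and says nothing about the value in the running kernel.

```python
import re

# Matches only the simple "set [module:]symbol = value" form.
# The "|" and "&" operators that /etc/system also accepts are not handled.
SET_RE = re.compile(r"^set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")


def parse_system_settings(text):
    """Return {symbol: value string} for simple 'set' lines, skipping comments."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: comment lines start with '*' or '#'.
        if not stripped or stripped[0] in "*#":
            continue
        match = SET_RE.match(stripped)
        if match:
            # A later line for the same symbol overrides an earlier one.
            settings[match.group(2)] = match.group(3)
    return settings


def mt_dump_disabled(text):
    """True if the text sets dump_plat_mincpu to 0, as in the workaround."""
    value = parse_system_settings(text).get("dump_plat_mincpu")
    if value is None:
        return False
    try:
        return int(value, 0) == 0  # base 0 accepts both "0" and "0x0"
    except ValueError:
        return False


SAMPLE = """\
*
* Don't use multi-threaded fast crash dump or a high compression level
*
set dump_plat_mincpu=0
set dump_bzip2_level=1
"""

if __name__ == "__main__":
    print(mt_dump_disabled(SAMPLE))
```

In practice you would pass in the contents of the real file, e.g. mt_dump_disabled(open("/etc/system").read()).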

I don't know of any progress on debugging this, due to the inherent difficulty in doing so.


Related issues

Related to illumos gate - Bug #1110: oi_148: panic dump not saved (Feedback, 2011-06-14)
Related to illumos gate - Bug #3314: disable multi-threaded dump until it can be removed (Resolved, 2012-10-26)

History

#1

Updated by Rich Lowe almost 8 years ago

Honestly, I'm much more inclined to actually remove multi-threaded dump (and likely bzip2 compression, since to be practical in time it requires multiple CPUs compressing). They're bad ideas, even if this specific bug is fixed.

#2

Updated by Garrett D'Amore almost 8 years ago

I think multithreading dump (especially since this is usually done at panic time) is a bad idea. When bad stuff happens, you want to get the data, and then get back into service reliably. You can't trust anything about the system during panic, and running MT seems like a bad idea with lots of potential for grief. (For the record, this is also why I'm opposed to using fast-reboot-on-panic... You want a real POST after a panic.)

#3

Updated by Piotr Jasiukajtis almost 8 years ago

I also disable fast reboot on panic, since there were times when the system just got stuck on boot after a panic.

#4

Updated by Yuri Pankov almost 8 years ago

Very nice, the workaround solved my issues with not being able to correctly save the dump. Can we get the real fix (i.e. removal of multithreaded dump) in?

#5

Updated by Albert Lee over 7 years ago

Changing dump_bzip2_level has no effect: we don't use bzip2 in serial mode at all, because it's so expensive.

#6

Updated by Josef Sipek over 7 years ago

FWIW, I am working on ripping out MT dump. It's taking a bit longer than I'd like it to, but it's getting there.

#7

Updated by Josef Sipek over 7 years ago

  • Assignee set to Josef Sipek

#8

Updated by Josef Sipek almost 6 years ago

  • % Done changed from 0 to 10
