Bug #1369
disable multi-threaded dump
10%
Description
Joyent, EC, and several others have reported hangs while taking crash dumps on multi-core machines.
Experimentally, this appears to be due to a bug in the threaded dump support.
We should seriously consider disabling threaded dump until someone has a chance to fix it.
Workaround:
*
* Don't use multi-threaded fast crash dump or a high compression level
*
set dump_plat_mincpu=0
set dump_bzip2_level=1
From Joyent's overlay. I'm unsure whether the bzip2 limit is related.
I don't know of any progress on debugging this, due to the inherent difficulty in doing so.
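For anyone who wants to confirm the workaround is actually present in their configuration, here is a minimal sketch of a checker. It assumes the settings live in an /etc/system-style file (comment lines starting with `*` or `#`, tunables as `set name=value`), and it only handles the plain `=` form. The function names `parse_tunables` and `workaround_applied` are made up for illustration; they are not part of any illumos tool.

```python
import re

# Matches the simple "set [module:]name=value" form.
# Assumption: only plain "=" assignments are handled here.
SET_RE = re.compile(r"^\s*set\s+([\w:]+)\s*=\s*(\S+)")


def parse_tunables(text):
    """Return {name: value} for 'set' lines, skipping '*' and '#' comments."""
    tunables = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "*#":
            continue  # blank line or comment
        m = SET_RE.match(stripped)
        if m:
            tunables[m.group(1)] = m.group(2)  # a later line overrides an earlier one
    return tunables


def workaround_applied(text):
    """True if dump_plat_mincpu is set to the literal value 0."""
    return parse_tunables(text).get("dump_plat_mincpu") == "0"


# The workaround fragment quoted in this issue.
SAMPLE = """\
*
* Don't use multi-threaded fast crash dump or a high compression level
*
set dump_plat_mincpu=0
set dump_bzip2_level=1
"""

if __name__ == "__main__":
    print(parse_tunables(SAMPLE))
    print(workaround_applied(SAMPLE))
```

Values are compared as strings, so a setting written as `0x0` would not be recognized by this sketch. In practice you would pass it the contents of your own configuration file instead of `SAMPLE`.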
Updated by Rich Lowe over 9 years ago
Honestly, I'm much more inclined to actually remove multi-threaded dump (and likely bzip2 compression, since it needs multiple CPUs compressing to finish in a practical amount of time). They're bad ideas, even if this specific bug is fixed.
Updated by Garrett D'Amore over 9 years ago
I think multithreading dump (especially since this is usually done at panic time) is a bad idea. When bad stuff happens, you want to get the data, and then get back into service reliably. You can't trust anything about the system during panic, and running MT seems like a bad idea with lots of potential for grief. (For the record, this is also why I'm opposed to using fast-reboot-on-panic... You want a real POST after a panic.)
Updated by Piotr Jasiukajtis over 9 years ago
I also disable fast reboot on panic, since there were times the system just got stuck on boot after a panic.
Updated by Yuri Pankov over 9 years ago
Very nice, the workaround solved my issues with not being able to correctly save the dump. Can we get the real fix (i.e. removal of multithreaded dump) in?
Updated by Albert Lee almost 9 years ago
Changing dump_bzip2_level has no effect because we don't use bzip2 in serial mode at all, because it's so expensive.
Updated by Josef Sipek almost 9 years ago
FWIW, I am working on ripping out MT dump. It's taking a bit longer than I'd like it to, but it's getting there.