zfs panic when sending replication stream
Category: zfs - Zettabyte File System
I've had a 'zfs send' problem for a long time (on OpenSolaris 2009.06), and last week I stupidly decided to scrub the pool, which resulted in an endless panic-and-reboot cycle.
I'm now trying to transfer the filesystems using oi_151a7, but it turns out the same thing happens with that kernel too.
I took a screenshot while the dump was being written. It shows part of the stack trace and is attached. Let me know what else I can do to help.
I'd prefer not to restore from backup, since I'll lose all snapshots, but the system has been offline for way too many days already.
#2 Updated by Martin Blom over 5 years ago
- File zfs-send-panic-2012-10-28.mov added
It's not easy to catch what's going on, and 'savecore' does not seem to work:
root@elizabeth:/home/lcs# savecore
savecore: bad magic number 0
I've attached a video ... If you look carefully, you can see that it seems to start with
panic[cpu3]/thread=ffffff000f80ac40: BAD TRAP: type=0 (#de Divide error) rp=ffff...
#3 Updated by Christopher Siden over 5 years ago
I can't get your video to show the stack trace in a readable way (it's blurry). The panic message itself is generic, so without a stack trace or a way to reproduce the problem it's pretty much impossible to fix. Don't you have a scrollback buffer you can use to copy and paste the full panic message and stack trace?
When you say savecore isn't working, do you mean the panic isn't generating a crash dump (i.e., there is nothing in /var/crash)?
#4 Updated by Martin Blom over 5 years ago
I understand that you need more information, and I would be happy to provide it, but I need some help in order to do so.
The console video is from a KVM that samples the VGA output, so copy and paste does not work, and the message and stack trace just flash by and disappear, as the video shows. I only have a few seconds while the kernel is dumping to disk before it reboots. I didn't think the Solaris console had scrollback, but I'll try Page Up/Page Down next time.
Would attaching something to the serial port help?
Yes, I don't even have /var/crash, and savecore just exits with that "bad magic number 0" message. I don't know what to do.
#5 Updated by Christopher Siden over 5 years ago
Maybe the core dump problem is related to:
You could try putting that workaround line in /etc/system, then running bootadm update-archive and rebooting. All it does is disable multi-threaded crash dumps, which have a bug in them.
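The /etc/system step can be sketched as a small shell helper. This is a minimal sketch, not the exact workaround: the thread does not reproduce the workaround line itself, so WORKAROUND_LINE below is a hypothetical placeholder to be filled in from the referenced bug, and the helper name add_system_line is made up for illustration. It assumes a POSIX grep that supports -q, -x and -F (on illumos, /usr/xpg4/bin/grep if the default grep lacks them).

```shell
#!/bin/sh
# Hypothetical helper: append a line to an /etc/system-style file only if an
# identical line is not already present, so running it twice is harmless.
# Assumes a POSIX grep that supports -q, -x and -F.
add_system_line() {
    file=$1
    line=$2
    # -x: match the whole line, -F: treat the text literally, -q: no output.
    # If the file does not exist yet, grep fails and the line is appended,
    # which creates the file.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example use as root (WORKAROUND_LINE is a placeholder for the line from
# the referenced bug report, which is not reproduced in this thread):
#   add_system_line /etc/system "$WORKAROUND_LINE"
#   bootadm update-archive
#   reboot
```

Rebuilding the boot archive matters because the system reads /etc/system from the boot archive at boot, so an edit that is not followed by bootadm update-archive may not take effect.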
Otherwise, you can add 'console=ttya' to your GRUB kernel line so that it looks something like this:
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=ttya
That will redirect the system console to a serial device. You will no longer see anything on your KVM after GRUB hands off to the kernel, but you will be able to view the console (including the panic message and stack trace) from your serial console, where presumably you can get a scrollback buffer.
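The GRUB edit can be sketched as a sed filter, assuming a legacy GRUB menu.lst in the OpenSolaris/OpenIndiana layout (for a ZFS root this is typically /rpool/boot/grub/menu.lst when the root pool is named rpool; bootadm list-menu reports the location). The function name add_serial_console is hypothetical. It only prints the modified menu to standard output, so you can compare it with the original before copying it into place. It assumes the -B property list is the last item on each kernel$ line, as in the example line above. If a console= setting is already present (OpenSolaris installs often use console=graphics for the splash screen), its value is replaced instead of a second console= being appended.

```shell
#!/bin/sh
# Hypothetical helper: print a GRUB menu.lst with every kernel$ line that has
# a -B property list pointed at the first serial port (ttya).
# An existing console=<value> is rewritten; otherwise ",console=ttya" is
# appended to the end of the line, which assumes the -B list comes last.
add_serial_console() {
    sed '/^[[:space:]]*kernel\$.* -B /{
s/console=[[:alnum:]]*/console=ttya/
t
s/$/,console=ttya/
}' "$1"
}

# Example: review the result before replacing the real menu.lst, e.g.
#   add_serial_console /rpool/boot/grub/menu.lst > /tmp/menu.lst.new
#   diff /rpool/boot/grub/menu.lst /tmp/menu.lst.new
```

Writing to a separate file and diffing it is deliberate: a broken menu.lst on a machine that is already stuck in a panic loop is much harder to recover from than a second edit.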