Bug #5208
closed
Edge case in mpt_sas causing panic
Description
Running illumos-gate as of 20140921: a misbehaving hard drive (may or may not be related; the drive was gone from the bus after the panic-induced reboot), followed by a panic.
Core will be available at http://skysrv.pha.jhu.edu/~rercola/cores/mptsas_20141003 (10GB compressed, 30GB uncompressed) once it finishes uploading.
Standard mdb inquiry output attached.
Updated by Jorge Schrauwen almost 9 years ago
I've also just been hit by this (early Monday morning).
hardware
----------------
LSI SAS 9201-16i
BIOS version : 7.35.00.00
Firmware version : 18.00.00.00
SuperMicro SC846TQ-R1200B (rev. M) chassis (SF-forgotthenumber to 4 sata fan outs)
(I know, not the best of things to do but it's only a home server)
uname -a
---------------
SunOS monolith.acheron.be 5.11 omnios-10b9c79 i86pc i386 i86pc
(that is OmniOS current stable)
fmdump
----------------
TIME UUID SUNW-MSG-ID
Jan 12 2015 02:32:26.370032000 a3f3d99e-aafb-6da0-8d01-cb5f86a067d1 SUNOS-8000-KL
TIME CLASS ENA
Jan 12 02:32:26.3407 ireport.os.sunos.panic.dump_available 0x0000000000000000
Jan 12 02:31:45.0462 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
nvlist version: 0
version = 0x0
class = list.suspect
uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
code = SUNOS-8000-KL
diag-time = 1421026346 342282
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
resource = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
savecore-succcess = 1
dump-dir = /var/crash/unknown
dump-files = vmdump.3
os-instance-uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff00f5a3d930 addr=ffffff21f2bcb036
panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | mpt_sas:mptsas_watchsubr+111 () | mpt_sas:mptsas_watch+96 () | genunix:callout_list_expire+98 () | genunix:callout_expire+3b () | genunix:callout_execute+20 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
crashtime = 1421025252
panic-time = Mon Jan 12 02:14:12 2015 CET
(end fault-list[0])
fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x54b3242a 0x160e3d80
--------
One disk is also missing. I'm nowhere near the box, so I'm not sure whether the disk is dead or not.
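As a cross-check on the fmdump output above, the crashtime value can be converted from epoch seconds to local time and compared with the panic-time field. The following is only a sketch: the helper name epoch_to_cet is hypothetical, CET is assumed to be a fixed UTC+1 offset (no DST in January), and the first field of diag-time is assumed to be epoch seconds as well.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET is treated as a fixed UTC+1 offset (no DST in January).
CET = timezone(timedelta(hours=1), "CET")


def epoch_to_cet(seconds):
    """Convert seconds since the Unix epoch to a CET wall-clock datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).astimezone(CET)


crashtime = 1421025252  # "crashtime" field from the fmdump output
diag_time = 1421026346  # first field of "diag-time" (assumed to be epoch seconds)

print(epoch_to_cet(crashtime).strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2015-01-12 02:14:12 CET
print(epoch_to_cet(diag_time).strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2015-01-12 02:32:26 CET
```

The first result agrees with panic-time (Mon Jan 12 02:14:12 2015 CET) and the second with the timestamp of the SUNOS-8000-KL entry, so the diagnosis was recorded roughly 18 minutes after the panic.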
Updated by Simon K over 8 years ago
I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...
Updated by Hans Rosenfeld over 8 years ago
Simon Klinkert wrote:
I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...
Are you running the latest illumos-gate bits? What does uname -v say?
Updated by Simon K over 8 years ago
Hans Rosenfeld wrote:
Simon Klinkert wrote:
I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...
Are you running the latest illumos-gate bits? What does uname -v say?
Nope, the illumos build on this machine is a bit older. It's from June 2014.
Updated by Marcel Telka almost 7 years ago
- Related to Bug #5538: Kernel panic on SAS disk failure added
Updated by Marcel Telka almost 7 years ago
- Category set to driver - device drivers
Updated by Marcel Telka almost 2 years ago
- Has duplicate Bug #5538: Kernel panic on SAS disk failure added
Updated by Marcel Telka almost 2 years ago
- Related to deleted (Bug #5538: Kernel panic on SAS disk failure)
Updated by Marcel Telka almost 2 years ago
- Status changed from New to Feedback
- Assignee set to Marcel Telka
I have 10+ crash dumps with this issue from 2014/2015 (illumos commit 511ef1d6e7 from June 2014), but zero similar crash dumps since we resynced with illumos commit 03b1c2971d (April 2015). So it looks like this issue was fixed a long time ago. Possible candidates that touched the mpt_sas sources between June 2014 and April 2015 are: #4546, #4960, #5015, #5016, #5017, #5080, #5081, #5297.
Has anyone seen a recent panic with this stack?
Updated by Predrag Zečević 9 months ago
Hi all,
does this look similar?
:; fmdump -Vp -u 0292fd6d-abd7-e2d3-adef-95d2427d6a58
TIME UUID SUNW-MSG-ID
Mar 02 2023 22:30:51.034649000 0292fd6d-abd7-e2d3-adef-95d2427d6a58 SUNOS-8000-KL
TIME CLASS ENA
Mar 02 22:30:51.0000 ireport.os.sunos.panic.dump_available 0x0000000000000000
Mar 02 22:30:45.0397 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
nvlist version: 0
version = 0x0
class = list.suspect
uuid = 0292fd6d-abd7-e2d3-adef-95d2427d6a58
code = SUNOS-8000-KL
diag-time = 1677792651 11754
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/cores/.0292fd6d-abd7-e2d3-adef-95d2427d6a58
resource = sw:///:path=/var/cores/.0292fd6d-abd7-e2d3-adef-95d2427d6a58
savecore-succcess = 1
dump-dir = /var/cores
dump-files = vmdump.1
os-instance-uuid = 0292fd6d-abd7-e2d3-adef-95d2427d6a58
panicstr = BAD TRAP: type=d (#gp General protection) rp=fffffe00212d8720 addr=fffffe1705474380
panicstack = unix:real_mode_stop_cpu_stage2_end+bb3d () | unix:trap+bf2 () | unix:cmntrap+e9 () | genunix:list_insert_tail+e () | zfs:dbuf_dirty+4a4 () | zfs:dmu_buf_will_dirty_impl+cf () | zfs:dmu_buf_will_dirty+13 () | zfs:sa_attr_op+2e7 () | zfs:sa_bulk_update_impl+6d () | zfs:sa_bulk_update+4d () | zfs:zfs_link_destroy+20c () | zfs:zfs_remove+582 () | genunix:fop_remove+5d () | genunix:vn_removeat+372 () | genunix:unlinkat+56 () | unix:brand_sys_syscall+1fe () |
crashtime = 1677792593
panic-time = March 2, 2023 at 10:29:53 PM CET CET
(end fault-list[0])
fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x6401158b 0x210b3a8
PC:
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: P8H77-V LE
Version: Rev X.0x
:; prtdiag
System Configuration: System manufacturer System Product Name
BIOS Configuration: American Megatrends Inc. 1301 08/20/2013

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz LGA1155

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
unknown     empty  0   ChannelA-DIMM0      BANK 0
unknown     empty  0   ChannelA-DIMM1      BANK 1
DDR3        in use 0   ChannelB-DIMM0      BANK 2
DDR3        in use 0   ChannelB-DIMM1      BANK 3

==== On-Board Devices =====================================

Onboard Ethernet

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   available PCI Express      PCIEX16_1
2   in use    PCI Express      PCIEX16_2
3   available PCI Express      PCIEX1_1
4   available PCI Express      PCIEX1_2
5   available PCI              PCI1
6   available PCI              PCI2
7   available PCI              PCI3
Version
:; uname -rosv
SunOS 5.11 illumos-43a6dad6f6 illumos
almost latest...
Do you need more info?
Updated by Marcel Telka 7 months ago
- Status changed from Feedback to Closed
Predrag Zečević wrote in #note-10:
does this look similar?
Sorry, no. The stack looks completely different. There is no mpt_sas there at all.
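For anyone comparing their own panic against this bug, the check described here (whether any mpt_sas frame appears in the stack) can be sketched in a few lines of Python. This is only an illustration: the helper name stack_modules is hypothetical, and it assumes the "module:function+offset () |" layout of the panicstack field as it appears in the two fmdump outputs in this thread.

```python
def stack_modules(panicstack):
    """Return the kernel modules named in a panicstack string, in first-seen order."""
    modules = []
    for frame in panicstack.split("|"):
        frame = frame.strip()
        if not frame:
            continue  # skip the empty piece after the trailing "|"
        module = frame.split(":", 1)[0]
        if module not in modules:
            modules.append(module)
    return modules


# panicstack from the Jan 12 2015 report in this thread
PANIC_2015 = (
    "unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | "
    "unix:cmntrap+e6 () | mpt_sas:mptsas_watchsubr+111 () | "
    "mpt_sas:mptsas_watch+96 () | genunix:callout_list_expire+98 () | "
    "genunix:callout_expire+3b () | genunix:callout_execute+20 () | "
    "genunix:taskq_thread+2d0 () | unix:thread_start+8 () |"
)

# panicstack from the Mar 02 2023 report in this thread
PANIC_2023 = (
    "unix:real_mode_stop_cpu_stage2_end+bb3d () | unix:trap+bf2 () | "
    "unix:cmntrap+e9 () | genunix:list_insert_tail+e () | "
    "zfs:dbuf_dirty+4a4 () | zfs:dmu_buf_will_dirty_impl+cf () | "
    "zfs:dmu_buf_will_dirty+13 () | zfs:sa_attr_op+2e7 () | "
    "zfs:sa_bulk_update_impl+6d () | zfs:sa_bulk_update+4d () | "
    "zfs:zfs_link_destroy+20c () | zfs:zfs_remove+582 () | "
    "genunix:fop_remove+5d () | genunix:vn_removeat+372 () | "
    "genunix:unlinkat+56 () | unix:brand_sys_syscall+1fe () |"
)

print(stack_modules(PANIC_2015))  # ['unix', 'mpt_sas', 'genunix']
print(stack_modules(PANIC_2023))  # ['unix', 'genunix', 'zfs']
```

A stack without mpt_sas in this list is unlikely to be the same issue, though a module match alone does not prove it is.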