Bug #5208

Edge case in mpt_sas causing panic

Added by Rich Ercolani about 5 years ago. Updated almost 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
driver - device drivers
Start date:
2014-10-03
Due date:
% Done:
0%
Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

Running illumos-gate 20140921. A hard drive was misbehaving (may or may not be related; the drive was gone from the bus after the panic-induced reboot), and the system panicked.

Core will be available at http://skysrv.pha.jhu.edu/~rercola/cores/mptsas_20141003 (10GB compressed, 30GB uncompressed) once it finishes uploading.

Standard mdb inquiry output attached.


Files

kmastat.txt (37.9 KB): ::kmastat output. Rich Ercolani, 2014-10-03 06:28 PM
mptsas.txt (210 Bytes): ::mptsas output. Rich Ercolani, 2014-10-03 06:28 PM
msgbuf.txt (11.1 KB): ::msgbuf output. Rich Ercolani, 2014-10-03 06:28 PM
panicinfo.txt (1.43 KB): ::panicinfo output. Rich Ercolani, 2014-10-03 06:28 PM
stack.txt (254 Bytes): ::stack output. Rich Ercolani, 2014-10-03 06:28 PM
stacks.txt (37 KB): ::stacks output. Rich Ercolani, 2014-10-03 06:28 PM

Related issues

Related to illumos gate - Bug #5538: Kernel panic on SAS disk failure (New, 2015-01-14)

History

#1

Updated by Jorge Schrauwen almost 5 years ago

I was also hit by this just now (early Monday morning).

hardware
----------------
LSI SAS 9201-16i
BIOS version : 7.35.00.00
Firmware version : 18.00.00.00
SuperMicro SC846TQ-R1200B (rev. M) chassis (SF-forgotthenumber to 4 SATA fan-outs)
(I know, not the best thing to do, but it's only a home server)

uname -a
---------------

SunOS monolith.acheron.be 5.11 omnios-10b9c79 i86pc i386 i86pc

(that is OmniOS current stable)

fmdump
----------------
TIME UUID SUNW-MSG-ID
Jan 12 2015 02:32:26.370032000 a3f3d99e-aafb-6da0-8d01-cb5f86a067d1 SUNOS-8000-KL

TIME CLASS ENA
Jan 12 02:32:26.3407 ireport.os.sunos.panic.dump_available 0x0000000000000000
Jan 12 02:31:45.0462 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
version = 0x0
class = list.suspect
uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
code = SUNOS-8000-KL
diag-time = 1421026346 342282
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list0)
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
resource = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
savecore-succcess = 1
dump-dir = /var/crash/unknown
dump-files = vmdump.3
os-instance-uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff00f5a3d930 addr=ffffff21f2bcb036
panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | mpt_sas:mptsas_watchsubr+111 () | mpt_sas:mptsas_watch+96 () | genunix:callout_list_expire+98 () | genunix:callout_expire+3b () | genunix:callout_execute+20 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
crashtime = 1421025252
panic-time = Mon Jan 12 02:14:12 2015 CET
(end fault-list0)

fault-status = 0x1
severity = Major
_ttl = 0x1
_tod = 0x54b3242a 0x160e3d80
--------
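
As a cross-check on the fmdump output above, the raw epoch fields can be decoded to confirm that they agree with the human-readable times. The following is only a sketch: the helper name `decode_epoch` is made up for illustration, and it assumes that `crashtime` and `diag-time` are Unix epoch seconds and that `_tod` is a (seconds, nanoseconds) pair, which is consistent with the values shown but is not stated anywhere in this report.

```python
from datetime import datetime, timedelta, timezone

# CET is UTC+1 in January (no daylight saving); a fixed offset is assumed here.
CET = timezone(timedelta(hours=1), "CET")


def decode_epoch(seconds, nanoseconds=0):
    """Hypothetical helper: epoch seconds (+ nanoseconds) to an aware CET datetime."""
    base = datetime.fromtimestamp(seconds, tz=CET)
    # datetime only has microsecond resolution, so nanoseconds are truncated.
    return base + timedelta(microseconds=nanoseconds // 1000)


# Values copied from the fmdump output above.
crash = decode_epoch(1421025252)            # crashtime
diag = decode_epoch(1421026346)             # diag-time (seconds part)
tod = decode_epoch(0x54B3242A, 0x160E3D80)  # _tod

print("crashtime:", crash.isoformat())
print("diag-time:", diag.isoformat())
print("_tod:     ", tod.isoformat())
print("panic to diagnosis:", diag - crash)
```

Under those assumptions, `crashtime` decodes to the same instant as the `panic-time` line, and `_tod` decodes to the timestamp on the SUNOS-8000-KL line.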

One disk is also missing. I'm nowhere near the box, so I'm not sure whether the disk is dead.
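
For anyone comparing this panic against other reports, the `panicstack` field in the fmdump output above can be split into frames mechanically. This is a minimal sketch, not a tool that exists in illumos: the names `parse_panicstack` and `first_frame_in` are hypothetical, and the regular expression assumes every frame has the `module:function+hexoffset ()` shape seen in this one report.

```python
import re

# panicstack string copied verbatim from the fmdump output above.
PANICSTACK = (
    "unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | "
    "unix:cmntrap+e6 () | mpt_sas:mptsas_watchsubr+111 () | "
    "mpt_sas:mptsas_watch+96 () | genunix:callout_list_expire+98 () | "
    "genunix:callout_expire+3b () | genunix:callout_execute+20 () | "
    "genunix:taskq_thread+2d0 () | unix:thread_start+8 () | "
)

# Assumed frame shape: module:function+hexoffset ()
FRAME_RE = re.compile(
    r"^(?P<module>[^:]+):(?P<function>[^+]+)\+(?P<offset>[0-9a-f]+) \(\)$"
)


def parse_panicstack(text):
    """Split a panicstack string into (module, function, offset) tuples."""
    frames = []
    for chunk in text.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # the string ends with a trailing separator
        match = FRAME_RE.match(chunk)
        if match is None:
            raise ValueError(f"unrecognised frame: {chunk!r}")
        frames.append(
            (match["module"], match["function"], int(match["offset"], 16))
        )
    return frames


def first_frame_in(frames, module):
    """Return the topmost frame belonging to the given module, or None."""
    return next((frame for frame in frames if frame[0] == module), None)


frames = parse_panicstack(PANICSTACK)
for module, function, offset in frames:
    print(f"{module:8s} {function}+0x{offset:x}")
print("first mpt_sas frame:", first_frame_in(frames, "mpt_sas"))
```

The frames above `mptsas_watchsubr` all belong to `unix` and appear to be trap handling, so the first `mpt_sas` frame is probably the most useful key when matching this panic against other reports such as Bug #5538.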

#2

Updated by Simon Klinkert almost 5 years ago

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

#3

Updated by Hans Rosenfeld almost 5 years ago

Simon Klinkert wrote:

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Are you running the latest illumos-gate bits? What does uname -v say?

#4

Updated by Simon Klinkert almost 5 years ago

Hans Rosenfeld wrote:

Simon Klinkert wrote:

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Are you running the latest illumos-gate bits? What does uname -v say?

Nope, the illumos build on this machine is a bit older. It's from June 2014.

#5

Updated by Marcel Telka almost 3 years ago

  • Related to Bug #5538: Kernel panic on SAS disk failure added

#6

Updated by Marcel Telka almost 3 years ago

  • Category set to driver - device drivers
