Bug #5208

open

Edge case in mpt_sas causing panic

Added by Rich Ercolani over 7 years ago. Updated 4 months ago.

Status: Feedback
Priority: Normal
Assignee:
Category: driver - device drivers
Start date: 2014-10-03
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Running il-gate 20140921 with a misbehaving hard drive (may or may not be related; the drive was gone from the bus after the panic-induced reboot), the system panicked.

Core will be available at http://skysrv.pha.jhu.edu/~rercola/cores/mptsas_20141003 (10GB compressed, 30GB uncompressed) once it finishes uploading.

Standard mdb inquiry output attached.


Files

kmastat.txt (37.9 KB) ::kmastat, Rich Ercolani, 2014-10-03 06:28 PM
mptsas.txt (210 Bytes) ::mptsas, Rich Ercolani, 2014-10-03 06:28 PM
msgbuf.txt (11.1 KB) ::msgbuf, Rich Ercolani, 2014-10-03 06:28 PM
panicinfo.txt (1.43 KB) ::panicinfo, Rich Ercolani, 2014-10-03 06:28 PM
stack.txt (254 Bytes) ::stack, Rich Ercolani, 2014-10-03 06:28 PM
stacks.txt (37 KB) ::stacks, Rich Ercolani, 2014-10-03 06:28 PM

Related issues

Has duplicate: illumos gate - Bug #5538: Kernel panic on SAS disk failure (Duplicate, 2015-01-14)

Actions #1

Updated by Jorge Schrauwen over 7 years ago

I've also just been hit by this (early Monday morning).

hardware
----------------
LSI SAS 9201-16i
BIOS version : 7.35.00.00
Firmware version : 18.00.00.00
SuperMicro SC846TQ-R1200B (rev. M) chassis (SF-forgotthenumber to 4 sata fan outs)
(I know, not the best of things to do but it's only a home server)

uname -a
---------------

SunOS monolith.acheron.be 5.11 omnios-10b9c79 i86pc i386 i86pc

(that is OmniOS current stable)

fmdump
----------------
TIME UUID SUNW-MSG-ID
Jan 12 2015 02:32:26.370032000 a3f3d99e-aafb-6da0-8d01-cb5f86a067d1 SUNOS-8000-KL

TIME CLASS ENA
Jan 12 02:32:26.3407 ireport.os.sunos.panic.dump_available 0x0000000000000000
Jan 12 02:31:45.0462 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
version = 0x0
class = list.suspect
uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
code = SUNOS-8000-KL
diag-time = 1421026346 342282
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list0)
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
resource = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
savecore-succcess = 1
dump-dir = /var/crash/unknown
dump-files = vmdump.3
os-instance-uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff00f5a3d930 addr=ffffff21f2bcb036
panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | mpt_sas:mptsas_watchsubr+111 () | mpt_sas:mptsas_watch+96 () | genunix:callout_list_expire+98 () | genunix:callout_expire+3b () | genunix:callout_execute+20 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
crashtime = 1421025252
panic-time = Mon Jan 12 02:14:12 2015 CET
(end fault-list0)

fault-status = 0x1
severity = Major
_ttl = 0x1
_tod = 0x54b3242a 0x160e3d80
--------

One disk is also missing. I'm nowhere near the box, so I can't tell whether the disk is dead or not.

Actions #2

Updated by Simon K about 7 years ago

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Actions #3

Updated by Hans Rosenfeld about 7 years ago

Simon Klinkert wrote:

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Are you running the latest illumos-gate bits? What does uname -v say?

Actions #4

Updated by Simon K about 7 years ago

Hans Rosenfeld wrote:

Simon Klinkert wrote:

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Are you running the latest illumos-gate bits? What does uname -v say?

Nope, the illumos build on this machine is a bit older. It's from June 2014.

Actions #5

Updated by Marcel Telka over 5 years ago

  • Related to Bug #5538: Kernel panic on SAS disk failure added

Actions #6

Updated by Marcel Telka over 5 years ago

  • Category set to driver - device drivers

Actions #7

Updated by Marcel Telka 4 months ago

  • Has duplicate Bug #5538: Kernel panic on SAS disk failure added

Actions #8

Updated by Marcel Telka 4 months ago

  • Related to deleted (Bug #5538: Kernel panic on SAS disk failure)

Actions #9

Updated by Marcel Telka 4 months ago

  • Status changed from New to Feedback
  • Assignee set to Marcel Telka

I have 10+ crash dumps with this issue from 2014/2015 (illumos commit 511ef1d6e7 from June 2014), but zero similar crash dumps since we resynced with illumos commit 03b1c2971d (April 2015). So it looks like this issue was fixed a long time ago. Possible candidates that touched the mpt_sas sources between June 2014 and April 2015 are: #4546, #4960, #5015, #5016, #5017, #5080, #5081, #5297.

Has anyone seen a recent panic with this stack?
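
For anyone who wants to re-derive the candidate list, a minimal sketch of how the commits between the two syncs could be enumerated from an illumos-gate clone. The helper names are hypothetical, and the driver path is an assumption about the tree layout (header-only changes under other directories would not show up); only the two commit hashes come from this ticket.

```python
import subprocess

# Assumed location of the mpt_sas driver sources in an illumos-gate checkout.
MPT_SAS_PATH = "usr/src/uts/common/io/scsi/adapters/mpt_sas"


def mpt_sas_log_command(old="511ef1d6e7", new="03b1c2971d", path=MPT_SAS_PATH):
    """Build a `git log` argv for commits reachable from `new` but not `old`
    that touch `path`."""
    return ["git", "log", "--oneline", f"{old}..{new}", "--", path]


def list_candidates(repo_dir):
    """Run the command inside a clone and return one summary line per commit."""
    result = subprocess.run(
        mpt_sas_log_command(),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling `list_candidates("/path/to/illumos-gate")` should yield one line per commit, which can then be compared against the issue numbers listed above.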
