Bug #5208

Edge case in mpt_sas causing panic

Added by Rich Ercolani about 9 years ago. Updated 7 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
driver - device drivers
Start date:
2014-10-03
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:

Description

il-gate 20140921, misbehaving hard drive (may or may not be related; drive was gone from the bus after panic-induced reboot), panic.

Core will be available at http://skysrv.pha.jhu.edu/~rercola/cores/mptsas_20141003 (10GB compressed, 30GB uncompressed) once it finishes uploading.

Standard mdb inquiry output attached.


Files

kmastat.txt (37.9 KB) ::kmastat Rich Ercolani, 2014-10-03 06:28 PM
mptsas.txt (210 Bytes) ::mptsas Rich Ercolani, 2014-10-03 06:28 PM
msgbuf.txt (11.1 KB) ::msgbuf Rich Ercolani, 2014-10-03 06:28 PM
panicinfo.txt (1.43 KB) ::panicinfo Rich Ercolani, 2014-10-03 06:28 PM
stack.txt (254 Bytes) ::stack Rich Ercolani, 2014-10-03 06:28 PM
stacks.txt (37 KB) ::stacks Rich Ercolani, 2014-10-03 06:28 PM
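
For anyone who wants to produce the same set of outputs from their own dump, here is a minimal sketch. It assumes the compressed dump has already been expanded into a `unix.N` / `vmcore.N` pair (for example with `savecore -f vmdump.N`) and that `mdb` is on the path. The helper names and the output directory are hypothetical; the dcmd names are the ones shown on the attachments above, and `::mptsas` is only available when the mpt_sas debugger module loads.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# dcmds named on the attachments in this report
DCMDS = ["::kmastat", "::mptsas", "::msgbuf", "::panicinfo", "::stack", "::stacks"]


def output_name(dcmd: str) -> str:
    """Map a dcmd such as '::kmastat' to a file name such as 'kmastat.txt'."""
    return dcmd.lstrip(":") + ".txt"


def collect(unix: str, vmcore: str, outdir: str = "mdb-out") -> list[Path]:
    """Run each dcmd against the dump and save its output (hypothetical helper)."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for dcmd in DCMDS:
        # mdb reads its commands from standard input when it is not a terminal
        result = subprocess.run(
            ["mdb", unix, vmcore],
            input=dcmd + "\n",
            capture_output=True,
            text=True,
            check=False,  # keep going even if one dcmd fails
        )
        path = out / output_name(dcmd)
        path.write_text(result.stdout)
        written.append(path)
    return written
```

Calling `collect("unix.0", "vmcore.0")` on the machine holding the dump would write one text file per dcmd into `mdb-out/`.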

Related issues

Has duplicate illumos gate - Bug #5538: Kernel panic on SAS disk failure (Duplicate, 2015-01-14)

Actions #1

Updated by Jorge Schrauwen almost 9 years ago

I've also just been hit by this (early Monday morning).

hardware
----------------
LSI SAS 9201-16i
BIOS version : 7.35.00.00
Firmware version : 18.00.00.00
SuperMicro SC846TQ-R1200B (rev. M) chassis (SF-forgotthenumber to 4 sata fan outs)
(I know, not the best of things to do but it's only a home server)

uname -a
---------------

SunOS monolith.acheron.be 5.11 omnios-10b9c79 i86pc i386 i86pc

(that is OmniOS current stable)

fmdump
----------------
TIME UUID SUNW-MSG-ID
Jan 12 2015 02:32:26.370032000 a3f3d99e-aafb-6da0-8d01-cb5f86a067d1 SUNOS-8000-KL

TIME CLASS ENA
Jan 12 02:32:26.3407 ireport.os.sunos.panic.dump_available 0x0000000000000000
Jan 12 02:31:45.0462 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
version = 0x0
class = list.suspect
uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
code = SUNOS-8000-KL
diag-time = 1421026346 342282
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
resource = sw:///:path=/var/crash/unknown/.a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
savecore-succcess = 1
dump-dir = /var/crash/unknown
dump-files = vmdump.3
os-instance-uuid = a3f3d99e-aafb-6da0-8d01-cb5f86a067d1
panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff00f5a3d930 addr=ffffff21f2bcb036
panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | mpt_sas:mptsas_watchsubr+111 () | mpt_sas:mptsas_watch+96 () | genunix:callout_list_expire+98 () | genunix:callout_expire+3b () | genunix:callout_execute+20 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
crashtime = 1421025252
panic-time = Mon Jan 12 02:14:12 2015 CET
(end fault-list[0])

fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x54b3242a 0x160e3d80
--------

One disk is also missing, nowhere near the box so not sure if the disk is dead or not.

Actions #2

Updated by Simon K over 8 years ago

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Actions #3

Updated by Hans Rosenfeld over 8 years ago

Simon Klinkert wrote:

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Are you running the latest illumos-gate bits? What does uname -v say?

Actions #4

Updated by Simon K over 8 years ago

Hans Rosenfeld wrote:

Simon Klinkert wrote:

I think I may have hit the same bug (multiple times in the past). The following pastebin shows the kernel panic caused by one dying disk: http://pastebin.com/dVnNVJcy
I can provide further information if needed. The mpt_sas bugs are really annoying...

Are you running the latest illumos-gate bits? What does uname -v say?

Nope, the Illumos on this machine is a bit older. It's from Jun 2014.

Actions #5

Updated by Marcel Telka almost 7 years ago

  • Related to Bug #5538: Kernel panic on SAS disk failure added
Actions #6

Updated by Marcel Telka almost 7 years ago

  • Category set to driver - device drivers
Actions #7

Updated by Marcel Telka almost 2 years ago

  • Has duplicate Bug #5538: Kernel panic on SAS disk failure added
Actions #8

Updated by Marcel Telka almost 2 years ago

  • Related to deleted (Bug #5538: Kernel panic on SAS disk failure)
Actions #9

Updated by Marcel Telka almost 2 years ago

  • Status changed from New to Feedback
  • Assignee set to Marcel Telka

I have 10+ crashdumps with this issue from 2014/2015 (illumos commit 511ef1d6e7 from June 2014), but zero similar crashdumps since we resynced with illumos commit 03b1c2971d (April 2015). So it looks like this issue was fixed a long time ago. Possible candidates that touched the mpt_sas sources between June 2014 and April 2015 are: #4546, #4960, #5015, #5016, #5017, #5080, #5081, #5297.

Is there any recent panic around with this stack?
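
One way to enumerate the changes in that window is to ask git for the commits between the two hashes quoted above that touch the driver directory. This is only a sketch and not necessarily how the candidate list in this note was produced. The source path `usr/src/uts/common/io/scsi/adapters/mpt_sas` and the existence of a local illumos-gate clone are assumptions, and the function names are hypothetical.

```python
import subprocess

OLD = "511ef1d6e7"  # June 2014 commit quoted above
NEW = "03b1c2971d"  # April 2015 commit quoted above
# Assumed location of the driver sources inside an illumos-gate checkout
DRIVER_PATH = "usr/src/uts/common/io/scsi/adapters/mpt_sas"


def git_log_command(old, new, path):
    """Build a git command listing commits in old..new that touch path."""
    return ["git", "log", "--oneline", f"{old}..{new}", "--", path]


def list_commits(repo, old=OLD, new=NEW, path=DRIVER_PATH):
    """Run the command inside repo (a hypothetical local clone) and return its lines."""
    result = subprocess.run(
        git_log_command(old, new, path),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Each returned line starts with an abbreviated hash followed by the commit subject, which in illumos-gate normally begins with the issue number, so the output can be compared against the candidate list directly.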

Actions #10

Updated by Predrag Zečević 9 months ago

Hi all,
does this look similar?

:; fmdump -Vp -u 0292fd6d-abd7-e2d3-adef-95d2427d6a58
TIME                           UUID                                 SUNW-MSG-ID
Mar 02 2023 22:30:51.034649000 0292fd6d-abd7-e2d3-adef-95d2427d6a58 SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Mar 02 22:30:51.0000 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Mar 02 22:30:45.0397 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 0292fd6d-abd7-e2d3-adef-95d2427d6a58
        code = SUNOS-8000-KL
        diag-time = 1677792651 11754
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/cores/.0292fd6d-abd7-e2d3-adef-95d2427d6a58
                resource = sw:///:path=/var/cores/.0292fd6d-abd7-e2d3-adef-95d2427d6a58
                savecore-succcess = 1
                dump-dir = /var/cores
                dump-files = vmdump.1
                os-instance-uuid = 0292fd6d-abd7-e2d3-adef-95d2427d6a58
                panicstr = BAD TRAP: type=d (#gp General protection) rp=fffffe00212d8720 addr=fffffe1705474380
                panicstack = unix:real_mode_stop_cpu_stage2_end+bb3d () | unix:trap+bf2 () | unix:cmntrap+e9 () | genunix:list_insert_tail+e () | zfs:dbuf_dirty+4a4 () | zfs:dmu_buf_will_dirty_impl+cf () | zfs:dmu_buf_will_dirty+13 () | zfs:sa_attr_op+2e7 () | zfs:sa_bulk_update_impl+6d () | zfs:sa_bulk_update+4d () | zfs:zfs_link_destroy+20c () | zfs:zfs_remove+582 () | genunix:fop_remove+5d () | genunix:vn_removeat+372 () | genunix:unlinkat+56 () | unix:brand_sys_syscall+1fe () | 
                crashtime = 1677792593

                panic-time = March  2, 2023 at 10:29:53 PM CET CET
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x6401158b 0x210b3a8

PC:

        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: P8H77-V LE
        Version: Rev X.0x

:; prtdiag 
System Configuration: System manufacturer System Product Name
BIOS Configuration: American Megatrends Inc. 1301 08/20/2013

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz LGA1155

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
unknown     empty  0   ChannelA-DIMM0      BANK 0
unknown     empty  0   ChannelA-DIMM1      BANK 1
DDR3        in use 0   ChannelB-DIMM0      BANK 2
DDR3        in use 0   ChannelB-DIMM1      BANK 3

==== On-Board Devices =====================================
Onboard Ethernet

==== Upgradeable Slots ====================================

ID  Status    Type             Description
--- --------- ---------------- ----------------------------
1   available PCI Express      PCIEX16_1
2   in use    PCI Express      PCIEX16_2
3   available PCI Express      PCIEX1_1
4   available PCI Express      PCIEX1_2
5   available PCI              PCI1
6   available PCI              PCI2
7   available PCI              PCI3

Version
:; uname -rosv
SunOS 5.11 illumos-43a6dad6f6 illumos

almost latest...

Do you need more info?

Actions #11

Updated by Marcel Telka 7 months ago

  • Status changed from Feedback to Closed

Predrag Zečević wrote in #note-10:

does this look similar?

Sorry, no. The stack looks completely different. There is no mpt_sas there at all.
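
The comparison can be made mechanical with a small sketch that splits a `panicstack` value into frames and lists the kernel modules involved. The two strings are copied from note 1 and note 10 of this report; the regular expression and function names are illustrative assumptions, not part of any illumos tool.

```python
import re

# One frame looks like "mpt_sas:mptsas_watch+96 ()"
FRAME = re.compile(
    r"^(?P<module>[^:\s]+):(?P<func>[^+\s]+)\+(?P<offset>[0-9a-f]+) \(\)$"
)


def parse_panicstack(panicstack):
    """Split on '|' and return (module, function, offset) tuples."""
    frames = []
    for piece in panicstack.split("|"):
        match = FRAME.match(piece.strip())
        if match:  # empty pieces, such as after the trailing '|', are skipped
            frames.append(
                (match["module"], match["func"], int(match["offset"], 16))
            )
    return frames


def modules_in(panicstack):
    """Return the set of module names that appear in the stack."""
    return {module for module, _, _ in parse_panicstack(panicstack)}


NOTE_1 = (
    "unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | "
    "unix:cmntrap+e6 () | mpt_sas:mptsas_watchsubr+111 () | "
    "mpt_sas:mptsas_watch+96 () | genunix:callout_list_expire+98 () | "
    "genunix:callout_expire+3b () | genunix:callout_execute+20 () | "
    "genunix:taskq_thread+2d0 () | unix:thread_start+8 () | "
)
NOTE_10 = (
    "unix:real_mode_stop_cpu_stage2_end+bb3d () | unix:trap+bf2 () | "
    "unix:cmntrap+e9 () | genunix:list_insert_tail+e () | "
    "zfs:dbuf_dirty+4a4 () | zfs:dmu_buf_will_dirty_impl+cf () | "
    "zfs:dmu_buf_will_dirty+13 () | zfs:sa_attr_op+2e7 () | "
    "zfs:sa_bulk_update_impl+6d () | zfs:sa_bulk_update+4d () | "
    "zfs:zfs_link_destroy+20c () | zfs:zfs_remove+582 () | "
    "genunix:fop_remove+5d () | genunix:vn_removeat+372 () | "
    "genunix:unlinkat+56 () | unix:brand_sys_syscall+1fe () | "
)

print(sorted(modules_in(NOTE_1)))
print(sorted(modules_in(NOTE_10)))
```

Only the note 1 stack contains `mpt_sas` frames; the note 10 stack runs through `zfs` and `genunix` instead.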
