Bug #7813

mpt_sas does not like concurrent HBA resets

Added by Marcel Telka almost 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: driver - device drivers
Start date: 2017-01-27
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Concurrent HBA resets might cause the following panic with the debug kernel:

> ::status
debugging crash dump vmcore.23 (64-bit) from qualactin
operating system: 5.11 titanic_34 (i86pc)
image uuid: 863767ec-62bc-c128-b91b-e40d35614b3b
panic message: assertion failed: mpt->m_ncmds == 0, file: ../../../../src/uts/common/io/scsi/adapters/mpt_sas/mptsas.c, line: 9743
dump content: kernel pages only
> ::stack
vpanic()
0xfffffffffbe0bf88()
mptsas_alloc_active_slots+0xc0(ffffff04882ae000, 0)
mptsas_init_chip+0xa2(ffffff04882ae000, 0)
mptsas_restart_ioc+0x97(ffffff04882ae000)
mptsas_ioctl+0x4b4(a500000000, 4903, 0, 100003, ffffff04842ba288, ffffff001157ce58)
cdev_ioctl+0x39(a500000000, 4903, 0, 100003, ffffff04842ba288, ffffff001157ce58)
spec_ioctl+0x60(ffffff06bd8f3f00, 4903, 0, 100003, ffffff04842ba288, ffffff001157ce58)
fop_ioctl+0x55(ffffff06bd8f3f00, 4903, 0, 100003, ffffff04842ba288, ffffff001157ce58)
ioctl+0x9b(3, 4903, 0)
_sys_sysenter_post_swapgs+0x237()
>

Steps to reproduce

To reproduce the panic reliably, we need to instrument the mpt_sas driver slightly so that the HBA resets arrive concurrently. The instrumentation is attached as mpt_sas_instr.patch. The mptreset tool can be found in #5698. Run the following:

# mdb -kwe 'panic_repro/W 0'
panic_repro:    0xffffffff      =       0x0
# ./mptreset /devices/pci@0,0/pci8086,d138@3/pci1000,3010@0:devctl &
[1] 591548
# ./mptreset /devices/pci@0,0/pci8086,d138@3/pci1000,3010@0:devctl &
[2] 591573
# sleep 2
# mdb -kwe 'panic_repro/W 1'
panic_repro:    0               =       0x1
#

After a few seconds the system should panic.


Files

mpt_sas_instr.patch (1.09 KB), Marcel Telka, 2017-01-27 06:21 PM

Related issues

Related to illumos gate - Bug #5698: panic in mpt_sas: vmem_hash_delete(ffffff1aa3456000, 1, 8): bad free (Closed, Marcel Telka, 2015-03-09)

#1

Updated by Marcel Telka almost 4 years ago

  • Related to Bug #5698: panic in mpt_sas: vmem_hash_delete(ffffff1aa3456000, 1, 8): bad free added
