Bug #5920

open

mpt_sas panic when a SAS cable is disconnected

Added by Chip Schweiss over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
driver - device drivers
Start date:
2015-05-12
Due date:
% Done:
0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

On a system with multiple LSI-9206-8e adapters, all on firmware version 19, pulling any single SAS cable causes mpt_sas to panic. The path redundancy should simply be reduced, without any failures: all disks are multipathed through two HBAs and two SAS expanders.

A crash dump is available at: ftp://ftp.nrg.wustl.edu/pub/zfs/mpt_sas_cable_disconnect-2015-04-29.dump

This is on OmniOS-r151014 with DataON DNS-1660 JBODs and Seagate Constellation 4TB SAS disks.


Files

Pull cable #1.txt (56.6 KB), Step #2, Chip Schweiss, 2015-05-13 04:33 PM
Re-insert cable #1.txt (19.9 KB), Step #3, Chip Schweiss, 2015-05-13 04:33 PM
Pull cable #2.txt (57.6 KB), Step #4, Chip Schweiss, 2015-05-13 04:33 PM
Re-insert cable #2.txt (20 KB), Step #5, Chip Schweiss, 2015-05-13 04:33 PM
After reboot one cable disconnnected..txt (36.6 KB), Step #6, Chip Schweiss, 2015-05-13 04:34 PM
After reboot. Insert cable, remove cable. Disks go offline..txt (58.4 KB), Step #9, Chip Schweiss, 2015-05-13 04:34 PM
Actions #1

Updated by Chip Schweiss over 6 years ago

I did some additional testing. It appears the panic is triggered because the pool instantly has too many disks offline and its failmode property is set to panic.
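To check which pools would panic the host in this situation, the failmode property can be read with `zpool get failmode <pool>`. The following is only a sketch: the helper names `parse_failmode` and `pools_that_panic` are made up for illustration, and the parser assumes the usual four-column NAME / PROPERTY / VALUE / SOURCE output.

```python
import subprocess
import sys


def parse_failmode(zpool_get_output):
    """Map pool name -> failmode from `zpool get failmode <pool>...` text."""
    modes = {}
    for line in zpool_get_output.splitlines():
        fields = line.split()
        # Data rows look like: NAME PROPERTY VALUE SOURCE.
        # The header row is skipped because its second field is "PROPERTY".
        if len(fields) >= 3 and fields[1] == "failmode":
            modes[fields[0]] = fields[2]
    return modes


def pools_that_panic(modes):
    """Return the pools whose failmode is set to panic, sorted by name."""
    return sorted(pool for pool, mode in modes.items() if mode == "panic")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pool names are taken from the command line (hypothetical usage:
    # `python3 failmode.py tank`).
    out = subprocess.run(
        ["zpool", "get", "failmode", *sys.argv[1:]],
        capture_output=True, text=True, check=True,
    ).stdout
    print(pools_that_panic(parse_failmode(out)))
```

A pool listed here will take the whole host down when it loses too many disks at once, which matches the panic described above; it does not explain why the disks go offline in the first place.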

With none of the pools imported, more of the problem is revealed: instead of multipath dropping to a single path, the disks are taken offline. See the system log messages in the attached files.
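The difference between the expected and the observed behaviour can be stated as a small check on per-disk operational path counts taken before and after a cable pull. This is an illustrative sketch only: the function name `classify_paths` and the disk names are hypothetical, and the counts are assumed to be collected by hand or from a tool such as mpathadm.

```python
def classify_paths(before, after):
    """Compare operational path counts per disk before and after a cable pull.

    Both arguments map a disk name to its number of operational paths.
    """
    result = {}
    for disk, old in before.items():
        new = after.get(disk, 0)  # a disk missing afterwards has no paths
        if new == 0:
            result[disk] = "offline"    # observed in this bug
        elif new < old:
            result[disk] = "reduced"    # expected: redundancy drops, disk stays up
        else:
            result[disk] = "unchanged"
    return result


# Hypothetical example: two dual-pathed disks, one cable pulled.
expected = classify_paths({"disk0": 2, "disk1": 2}, {"disk0": 1, "disk1": 1})
observed = classify_paths({"disk0": 2, "disk1": 2}, {"disk0": 0, "disk1": 0})
```

With a healthy multipath setup every disk should come back as "reduced"; any "offline" entry after a single cable pull corresponds to the failure reported here.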

I did the following, capturing system log output for each step:

1. Export all the pools connected on the SAS network (HA move to the sister system).
2. Disconnect one SAS cable.
3. Reconnect cable.
4. Disconnect second SAS cable.
5. Reconnect cable.
6. Disconnect 1st cable, reboot.
7. Reconnect cable.
8. Disconnect cable.
9. Trigger panic (reboot -d -n -q). Dump available at: ftp://ftp.nrg.wustl.edu/pub/zfs/cable_disconnect_disks_offline.dump
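Per-step log captures like the attached files can be produced by remembering how large the system log was before each step and saving only what was appended afterwards. The sketch below is one possible way to do that, not the method used for the attachments: the class name `StepLog` is hypothetical, the default path assumes the system log is /var/adm/messages, and log rotation during the test is not handled.

```python
import os


class StepLog:
    """Save the part of a log file that was appended since the last capture."""

    def __init__(self, log_path="/var/adm/messages"):
        self.log_path = log_path
        # Start from the current end of the log so older messages are ignored.
        self.offset = os.path.getsize(log_path)

    def capture(self, out_path):
        """Write everything appended since the last call to out_path.

        Returns the number of bytes written.
        """
        with open(self.log_path, "rb") as log:
            log.seek(self.offset)
            new_data = log.read()
        self.offset += len(new_data)
        with open(out_path, "wb") as out:
            out.write(new_data)
        return len(new_data)
```

A run would create one `StepLog` before step 2 and call `capture()` with a new file name after each cable action, so every output file holds only the messages from that step.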

This bug renders multipath SAS useless, and probably worse than single path, because every component in the SAS network becomes a single point of failure.
