mpt_sas panic when a SAS cable is disconnected
On a system with multiple LSI-9206-8e adapters, all on firmware version 19, pulling any single cable causes mpt_sas to panic. The expected behavior is that path redundancy is simply reduced without any failures, since all disks are multipathed through two HBAs and two SAS expanders.
A crash dump is available at: [[ftp://ftp.nrg.wustl.edu/pub/zfs/mpt_sas_cable_disconnect-2015-04-29.dump]]
This is on OmniOS-r151014 with DataON DNS-1660 JBODs and Seagate Constellation SAS 4TB disks.
Updated by Chip Schweiss about 7 years ago
- File Pull cable #1.txt added
- File Re-insert cable #1.txt added
- File Pull cable #2.txt added
- File Re-insert cable #2.txt added
- File After reboot one cable disconnnected..txt added
- File After reboot. Insert cable, remove cable. Disks go offline..txt added
I did some additional testing. The panic appears to be triggered because the pool instantly has too many offline disks and its failmode property is set to panic.
With none of the pools imported, more of the problem is revealed: instead of multipath dropping to one path, the disks are taken offline. See the system log messages in the attached files.
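To see which pools would panic in this situation, the failmode property can be read with `zpool get failmode`. The following is only a minimal sketch that picks such pools out of that command's default four-column output (NAME, PROPERTY, VALUE, SOURCE). The pool names and the sample text are made up for illustration and are not taken from the affected system.

```python
from typing import List


def pools_with_panic_failmode(zpool_get_output: str) -> List[str]:
    """Return the names of pools whose failmode property is 'panic'."""
    pools = []
    for line in zpool_get_output.splitlines():
        fields = line.split()
        # Expected columns: NAME PROPERTY VALUE SOURCE.
        # The header row and unrelated lines do not match and are skipped.
        if len(fields) >= 3 and fields[1] == "failmode" and fields[2] == "panic":
            pools.append(fields[0])
    return pools


# Hypothetical sample of `zpool get failmode` output (pool names invented).
SAMPLE = """\
NAME   PROPERTY  VALUE  SOURCE
tank1  failmode  panic  local
tank2  failmode  wait   default
"""

if __name__ == "__main__":
    print(pools_with_panic_failmode(SAMPLE))
```

A pool listed by this check panics the host when it loses too many disks at once. With failmode set to wait or continue the pool would instead block or return errors, which would not fix the disks going offline but would avoid the panic.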
I did the following, capturing system log output for each step:
1. Export all the pools connected on the SAS network (HA move to the sister system).
2. Disconnect one SAS cable
3. Reconnect cable
4. Disconnect second SAS cable
5. Reconnect cable
6. Disconnect 1st cable, reboot
7. Reconnect cable
8. Disconnect cable
9. Trigger panic (reboot -d -n -q). Dump available at: [[ftp://ftp.nrg.wustl.edu/pub/zfs/cable_disconnect_disks_offline.dump]]
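For anyone repeating the steps above, the path state could be recorded after each cable change with `mpathadm list lu`. The sketch below parses that listing and flags logical units with fewer operational paths than expected. It assumes the usual layout of a device path followed by "Total Path Count" and "Operational Path Count" lines. The device names in the sample are invented, and the source does not say whether an offlined disk shows zero paths or drops out of the listing entirely.

```python
from typing import Dict, List, Tuple


def parse_path_counts(mpathadm_output: str) -> Dict[str, Tuple[int, int]]:
    """Map each logical unit to (total paths, operational paths)."""
    counts: Dict[str, Tuple[int, int]] = {}
    current = None
    total = None
    for raw in mpathadm_output.splitlines():
        line = raw.strip()
        if line.startswith("/dev/"):
            # A new logical unit entry begins.
            current, total = line, None
        elif current and line.startswith("Total Path Count:"):
            total = int(line.split(":", 1)[1])
        elif current and total is not None and line.startswith("Operational Path Count:"):
            counts[current] = (total, int(line.split(":", 1)[1]))
    return counts


def degraded_units(counts: Dict[str, Tuple[int, int]], expected_paths: int = 2) -> List[str]:
    """Return logical units with fewer operational paths than expected."""
    return sorted(lu for lu, (_, operational) in counts.items()
                  if operational < expected_paths)


# Hypothetical sample output (device names invented for illustration).
SAMPLE = """\
        /dev/rdsk/c1t5000C50000000001d0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c1t5000C50000000002d0s2
                Total Path Count: 2
                Operational Path Count: 1
"""

if __name__ == "__main__":
    print(degraded_units(parse_path_counts(SAMPLE)))
```

With working multipath, pulling one cable should leave each affected disk with one operational path. A disk that reports none, or is missing from the listing, would match the offline behavior described in this report.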
This bug renders multipath SAS useless, and probably worse than single path, because any single failure in the SAS network becomes a single point of failure.