Bug #5537

Hot device removal on mpt_sas not handled gracefully

Added by Keith Hall over 5 years ago.

Status: New
Priority: High
Assignee: -
Category: kernel
Start date: 2015-01-14
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Hot-connecting a drive to the system and then hot-removing it (without mounting or otherwise using the device) leaves the system barely usable:

Jan 13 20:28:06 box scsi: [ID 583861 kern.info] mpt_sas15 at mpt_sas10: scsi-iport 10
Jan 13 20:28:06 box genunix: [ID 936769 kern.info] mpt_sas15 is /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10
Jan 13 20:28:06 box genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10 (mpt_sas15) online
Jan 13 20:28:06 box pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x48 ioapic 0x2 intin 0xe is bound to cpu 0
Jan 13 20:28:06 box scsi: [ID 583861 kern.info] sd17 at scsi_vhci0: unit-address g5000c5004e0da063: f_sym
Jan 13 20:28:06 box genunix: [ID 936769 kern.info] sd17 is /scsi_vhci/disk@g5000c5004e0da063
Jan 13 20:28:06 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) online
Jan 13 20:28:06 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: degraded: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is online

Jan 14 09:13:20 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:20 box     mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:13:20 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:20 box     mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:13:23 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:23 box     mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:13:23 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:23 box     mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:13:25 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) offline
Jan 14 09:13:25 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: failed: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is offline
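For anyone triaging the warnings above, the IOCStatus and IOCLogInfo words can be split into fields. The following is a minimal sketch, assuming the MPI 2.x layout (bit 15 of IOCStatus flags that log info is available; IOCLogInfo is packed as bus type, originator, code and sub-code). The function names are hypothetical and are not part of mpt_sas, and the layout should be checked against the MPI headers before relying on it.

```python
def decode_ioc_status(status):
    """Split an IOCStatus word (assumed MPI 2.x layout)."""
    return {
        # Bit 15: the IOCLogInfo word carries additional information.
        "log_info_available": bool(status & 0x8000),
        # Remaining 15 bits: the status code itself.
        "status_code": status & 0x7FFF,
    }


def decode_ioc_loginfo(loginfo):
    """Split an IOCLogInfo word into its assumed bit fields."""
    return {
        "bus_type": (loginfo >> 28) & 0xF,     # top nibble
        "originator": (loginfo >> 24) & 0xF,   # next nibble
        "code": (loginfo >> 16) & 0xFF,        # next byte
        "subcode": loginfo & 0xFFFF,           # low 16 bits
    }


if __name__ == "__main__":
    # Values taken from the log lines above.
    for word in (0x31110D00, 0x31170000):
        fields = decode_ioc_loginfo(word)
        print(hex(word), {k: hex(v) for k, v in fields.items()})
    print(decode_ioc_status(0x8000))
```

Under that layout, IOCStatus=0x8000 has only the log-info flag set (status code 0), and the two IOCLogInfo values differ only in their code and sub-code fields (0x11/0x0d00 versus 0x17/0x0000).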

After the removal, commands such as cfgadm -al and format hang indefinitely.

Because the system was still in the state reported in https://www.illumos.org/issues/5534 when this happened, I needed to rule out that earlier state as the cause, so:

I grabbed a live dump with savecore -Lv and hit the reset button (init 6 / reboot simply did nothing).

I then repeated the process:

Jan 14 09:31:41 box scsi: [ID 583861 kern.info] mpt_sas15 at mpt_sas10: scsi-iport 10
Jan 14 09:31:41 box genunix: [ID 936769 kern.info] mpt_sas15 is /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10
Jan 14 09:31:41 box genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10 (mpt_sas15) online
Jan 14 09:31:41 box pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x48 ioapic 0x2 intin 0xe is bound to cpu 0
Jan 14 09:31:41 box scsi: [ID 583861 kern.info] sd17 at scsi_vhci0: unit-address g5000c5004e0da063: f_sym
Jan 14 09:31:41 box genunix: [ID 936769 kern.info] sd17 is /scsi_vhci/disk@g5000c5004e0da063
Jan 14 09:31:41 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) online
Jan 14 09:31:41 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: degraded: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is online

Jan 14 09:35:58 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:58 box     mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:35:58 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:58 box     mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:35:59 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:59 box     mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:35:59 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:59 box     mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:36:02 box scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:02 box     mptsas_check_task_mgt: Task 0x3 failed. IOCStatus=0x4b IOCLogInfo=0x0 target=12
Jan 14 09:36:02 box scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:02 box     mptsas_ioc_task_management failed try to reset ioc to recovery!
Jan 14 09:36:03 box scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:03 box     mpt10 Firmware version v19.0.0.0 (?)
Jan 14 09:36:03 box scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:03 box     mpt10: IOC Operational.
Jan 14 09:36:12 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) offline
Jan 14 09:36:12 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: failed: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is offline
Jan 14 09:36:12 box mpt_sas: [ID 660565 kern.warning] WARNING: mptsas_free_devhdl: passthru SAS IO Unit Control IOCStatus 8

Jan 14 09:36:41 box pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x48 ioapic 0x2 intin 0xe is bound to cpu 0

I grabbed another live dump with savecore -Lv at this point (link below). I have no further idea where to go next, but it seems important to raise this issue, because a genuine drive failure could present itself to the system in the same way, with a similar outcome.

https://www.dropbox.com/s/wrqt0e65ntwiq90/vmdump.3.gz?dl=0
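To compare the two attempts above, the delay between the first mptsas_handle_event warning and the sd device going offline can be pulled out of the syslog text. This is only a sketch: it assumes the first warning approximates the moment of removal, assumes the log year is 2015 (taken from the start date of this issue, since syslog timestamps carry no year), and uses hypothetical helper names.

```python
import re
from datetime import datetime

# Leading syslog timestamp, e.g. "Jan 14 09:13:20 ".
STAMP = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) ")


def parse_stamp(line, year=2015):
    """Return the line's timestamp, or None if it has none.

    The year is an assumption; syslog lines do not include it.
    """
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def removal_to_offline_seconds(lines):
    """Seconds from the first mptsas_handle_event warning to the first
    '(sdN) offline' message that follows it, or None if either is missing."""
    first_warning = None
    for line in lines:
        ts = parse_stamp(line)
        if ts is None:
            continue
        if first_warning is None:
            if "mptsas_handle_event" in line:
                first_warning = ts
        elif line.rstrip().endswith(") offline"):
            return (ts - first_warning).total_seconds()
    return None


if __name__ == "__main__":
    sample = [
        "Jan 14 09:13:20 box     mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00",
        "Jan 14 09:13:25 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) offline",
    ]
    print(removal_to_offline_seconds(sample))
```

Fed the full excerpts, this measures 09:13:20 to 09:13:25 for the first attempt and 09:35:58 to 09:36:12 for the second, where the task management failure and IOC reset occur in between.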
