Bug #5537
Hot device removal on mpt_sas not handled gracefully
Status: open
Description
Hot-connecting a drive to the system and then hot-removing it (without mounting or otherwise using the device) leaves the system barely usable:
Jan 13 20:28:06 box scsi: [ID 583861 kern.info] mpt_sas15 at mpt_sas10: scsi-iport 10
Jan 13 20:28:06 box genunix: [ID 936769 kern.info] mpt_sas15 is /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10
Jan 13 20:28:06 box genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10 (mpt_sas15) online
Jan 13 20:28:06 box pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x48 ioapic 0x2 intin 0xe is bound to cpu 0
Jan 13 20:28:06 box scsi: [ID 583861 kern.info] sd17 at scsi_vhci0: unit-address g5000c5004e0da063: f_sym
Jan 13 20:28:06 box genunix: [ID 936769 kern.info] sd17 is /scsi_vhci/disk@g5000c5004e0da063
Jan 13 20:28:06 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) online
Jan 13 20:28:06 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: degraded: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is online
Jan 14 09:13:20 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:20 box mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:13:20 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:20 box mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:13:23 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:23 box mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:13:23 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:13:23 box mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:13:25 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) offline
Jan 14 09:13:25 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: failed: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is offline
At this point, commands such as cfgadm -al and format hang indefinitely.
Because the system was in the same state as reported in https://www.illumos.org/issues/5534, I needed to rule out that earlier state as the cause, so...
I grabbed a live dump with savecore -Lv and hit the reset button (init 6 and reboot just didn't do anything!).
Then I repeated the process:
Jan 14 09:31:41 box scsi: [ID 583861 kern.info] mpt_sas15 at mpt_sas10: scsi-iport 10
Jan 14 09:31:41 box genunix: [ID 936769 kern.info] mpt_sas15 is /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10
Jan 14 09:31:41 box genunix: [ID 408114 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0/iport@10 (mpt_sas15) online
Jan 14 09:31:41 box pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x48 ioapic 0x2 intin 0xe is bound to cpu 0
Jan 14 09:31:41 box scsi: [ID 583861 kern.info] sd17 at scsi_vhci0: unit-address g5000c5004e0da063: f_sym
Jan 14 09:31:41 box genunix: [ID 936769 kern.info] sd17 is /scsi_vhci/disk@g5000c5004e0da063
Jan 14 09:31:41 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) online
Jan 14 09:31:41 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: degraded: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is online
Jan 14 09:35:58 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:58 box mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:35:58 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:58 box mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Jan 14 09:35:59 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:59 box mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:35:59 box scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:35:59 box mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000
Jan 14 09:36:02 box scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:02 box mptsas_check_task_mgt: Task 0x3 failed. IOCStatus=0x4b IOCLogInfo=0x0 target=12
Jan 14 09:36:02 box scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:02 box mptsas_ioc_task_management failed try to reset ioc to recovery!
Jan 14 09:36:03 box scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:03 box mpt10 Firmware version v19.0.0.0 (?)
Jan 14 09:36:03 box scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@17/pci1000,3020@0 (mpt_sas10):
Jan 14 09:36:03 box mpt10: IOC Operational.
Jan 14 09:36:12 box genunix: [ID 408114 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) offline
Jan 14 09:36:12 box genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000c5004e0da063 (sd17) multipath status: failed: path 14 mpt_sas15/disk@w5000c5004e0da063,0 is offline
Jan 14 09:36:12 box mpt_sas: [ID 660565 kern.warning] WARNING: mptsas_free_devhdl: passthru SAS IO Unit Control IOCStatus 8
Jan 14 09:36:41 box pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x48 ioapic 0x2 intin 0xe is bound to cpu 0
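The IOCStatus/IOCLogInfo pairs in the logs above are raw MPI2 values. Below is a minimal sketch for splitting them into fields. It assumes the IOCLogInfo bit layout used by the Linux mpt3sas driver's loginfo decoding (bus type, originator, code, subcode), and the MPI2 convention that IOCStatus bit 0x8000 only flags that log info is present. The struct and function names are hypothetical helpers for reading the logs, not part of mpt_sas.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed IOCLogInfo layout (taken from the Linux mpt3sas loginfo union,
 * not from mpt_sas itself):
 *   bits 31..28  bus type   (3 = SAS)
 *   bits 27..24  originator (0 = IOP, 1 = PL, 2 = IR)
 *   bits 23..16  code
 *   bits 15..0   subcode
 */
struct loginfo_fields {
    unsigned bus_type;
    unsigned originator;
    unsigned code;
    unsigned subcode;
};

/* Hypothetical helper: split a raw IOCLogInfo word into its fields. */
static struct loginfo_fields decode_loginfo(uint32_t loginfo)
{
    struct loginfo_fields f;
    f.bus_type   = (loginfo >> 28) & 0xFu;
    f.originator = (loginfo >> 24) & 0xFu;
    f.code       = (loginfo >> 16) & 0xFFu;
    f.subcode    = loginfo & 0xFFFFu;
    return f;
}

static const char *originator_name(unsigned originator)
{
    switch (originator) {
    case 0: return "IOP";
    case 1: return "PL";
    case 2: return "IR";
    default: return "unknown";
    }
}

/* MPI2: bit 15 of IOCStatus says "IOCLogInfo is valid"; low 15 bits are the status code. */
#define IOCSTATUS_LOGINFO_AVAILABLE 0x8000u
#define IOCSTATUS_MASK              0x7FFFu

static unsigned iocstatus_code(uint16_t iocstatus)
{
    return iocstatus & IOCSTATUS_MASK;
}

static int loginfo_available(uint16_t iocstatus)
{
    return (iocstatus & IOCSTATUS_LOGINFO_AVAILABLE) != 0;
}

/* Print one decoded IOCStatus/IOCLogInfo pair, e.g. from an mptsas_handle_event line. */
static void print_event(uint16_t iocstatus, uint32_t loginfo)
{
    struct loginfo_fields f = decode_loginfo(loginfo);
    printf("IOCStatus=0x%04x (code 0x%x, loginfo %s) "
           "IOCLogInfo=0x%08x bus=%u originator=%s code=0x%02x subcode=0x%04x\n",
           (unsigned)iocstatus, iocstatus_code(iocstatus),
           loginfo_available(iocstatus) ? "valid" : "absent",
           (unsigned)loginfo, f.bus_type, originator_name(f.originator),
           f.code, f.subcode);
}
```

Under these assumptions, both 0x31110d00 and 0x31170000 decode as bus type 3 (SAS) from the PL originator, with codes 0x11 and 0x17, and IOCStatus=0x8000 is status code 0 with only the log-info flag set. What the individual codes mean is firmware-specific and is not interpreted here. Separately, in the MPI2 headers task type 0x3 is a target reset and IOCStatus 0x4B is IOC_TERMINATED; it is worth checking these against the MPI2 header version mpt_sas is built with.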
I grabbed a savecore -Lv dump at this point (link below). I have no further idea where to go next, but it seems important to raise this issue, since failing drives could present themselves to the system with a similar outcome.