Bug #12207

mpt_sas throws errors continuously

Added by Jason Matthews 4 months ago. Updated 27 days ago.

Status: New
Priority: High
Assignee: -
Category: driver - device drivers
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Server configuration:

AMD EPYC 7401P 24-Core Processor CPU based system
2 LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (mpt_sas) / LSI9300-8e
1 Newisys/Viking 102 bay NDS-41020 JBOD fully loaded with Seagate ST8000NM0075 drives
physically cabled with one cable per LSI card, each connected to one of the two controllers on the JBOD
multipath is working

The driver appears to be complaining about SAS end points within the JBOD / LSI expanders - not hard drives

root@backup003:/usr/include/sys/scsi/impl# dmesg |grep w50093 |awk '{ print $13 " " $14 }' | sort | uniq
target 118
target 60
target 77
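The awk field positions in the pipeline above depend on the exact syslog layout (this command uses $13/$14, while a later comment in this report uses $14/$15 on another host), so matching on the message text may be more robust. A minimal sketch, assuming the "Log info ... received for target N wWWN" message shape seen in this report; the names `LOG_INFO_RE`, `targets_from_log`, and `SAMPLE` are hypothetical and not part of any tool.

```python
import re

# Matches the mpt_sas "Log info" message shape seen in this report, e.g.
# "Log info 0x31110e05 received for target 118 w50093e23000000fd."
LOG_INFO_RE = re.compile(
    r"Log info (0x[0-9a-fA-F]+) received for target (\d+) w([0-9a-fA-F]{16})"
)

def targets_from_log(lines):
    """Return a sorted list of unique (target, wwn) pairs found in the lines."""
    seen = set()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            seen.add((int(match.group(2)), match.group(3).lower()))
    return sorted(seen)

# Message fragments copied from the log excerpt in this report.
SAMPLE = [
    "Log info 0x31110e05 received for target 118 w50093e23000000fd.",
    "Log info 0x31120114 received for target 118 w50093e23000000fd.",
    "Log info 0x31120114 received for target 60 w50093e230000027d.",
    "mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110e05",
]

for target, wwn in targets_from_log(SAMPLE):
    print(f"target {target} w{wwn}")
```

Lines that do not carry a target (such as the `mptsas_handle_event` warnings) are simply skipped, so the same function can be fed the whole dmesg output.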

I am not sure what these errors mean, but they do not instill confidence in me.

2020-01-17T07:22:35.746966+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110e05
2020-01-17T07:22:36.247201+00:00 backup003 scsi: [ID 365881 kern.info] /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011Log info 0x31110e05 received for target 118 w50093e23000000fd.#012#011scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
2020-01-17T07:22:36.249874+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31120114
2020-01-17T07:22:36.249886+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120114
2020-01-17T07:22:36.251388+00:00 backup003 scsi: [ID 365881 kern.info] /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011Log info 0x31120114 received for target 118 w50093e23000000fd.#012#011scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
2020-01-17T07:22:36.498743+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31110e05
2020-01-17T07:22:36.498758+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110e05
2020-01-17T07:22:36.997203+00:00 backup003 scsi: [ID 365881 kern.info] /pci@51,0/pci1022,1453@1,3/pci1000,30a0@0 (mpt_sas0):#012#011Log info 0x31110e05 received for target 118 w50093e23000000fd.#012#011scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
2020-01-17T07:22:37.551826+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31120114
2020-01-17T07:22:37.551843+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120114
2020-01-17T07:22:37.553310+00:00 backup003 scsi: [ID 365881 kern.info] /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011Log info 0x31120114 received for target 60 w50093e230000027d.#012#011scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
2020-01-17T07:22:37.735274+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31110e05
2020-01-17T07:22:37.735313+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110e05
2020-01-17T07:22:38.235496+00:00 backup003 scsi: [ID 365881 kern.info] /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011Log info 0x31110e05 received for target 60 w50093e230000027d.#012#011scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
2020-01-17T07:22:38.238199+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31120114
2020-01-17T07:22:38.238253+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120114
2020-01-17T07:22:38.239709+00:00 backup003 scsi: [ID 365881 kern.info] /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011Log info 0x31120114 received for target 60 w50093e230000027d.#012#011scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
2020-01-17T07:22:38.487053+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31110e05
2020-01-17T07:22:38.487102+00:00 backup003 scsi: [ID 243001 kern.warning] WARNING: /pci@19,0/pci1022,1453@3,4/pci1000,30a0@0 (mpt_sas1):#012#011mptsas_handle_event: IOCStatus=0x8
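The two IOCLogInfo values that repeat in this log can be split into bit fields to see which firmware layer reported them. The sketch below is not taken from this report: it assumes the field layout used by the `loginfo_type` union in the Linux mpt3sas driver (bus type in bits 31-28, originator in bits 27-24, code in bits 23-16, subcode in bits 15-0), and it assumes that 0x8000 in the IOC status is the "log info available" flag with the status code in the low 15 bits. The function names are hypothetical.

```python
# Originator values as assumed from the Linux mpt3sas driver.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

def decode_loginfo(loginfo):
    """Split a 32-bit IOCLogInfo value into its assumed bit fields."""
    return {
        "bus_type": (loginfo >> 28) & 0xF,    # 0x3 is assumed to denote SAS
        "originator": (loginfo >> 24) & 0xF,  # index into ORIGINATORS
        "code": (loginfo >> 16) & 0xFF,
        "subcode": loginfo & 0xFFFF,
    }

def split_iocstatus(status):
    """Separate the assumed 'log info available' flag from the status code."""
    return bool(status & 0x8000), status & 0x7FFF

# The two IOCLogInfo values that appear in the log excerpt.
for value in (0x31110E05, 0x31120114):
    fields = decode_loginfo(value)
    origin = ORIGINATORS.get(fields["originator"], "unknown")
    print(
        f"0x{value:08x}: bus_type=0x{fields['bus_type']:x} "
        f"originator={origin} code=0x{fields['code']:02x} "
        f"subcode=0x{fields['subcode']:04x}"
    )

# The ioc_status value reported alongside each target.
has_loginfo, code = split_iocstatus(0x804B)
print(f"ioc_status 0x804b: loginfo_available={has_loginfo} code=0x{code:04x}")
```

Under these assumptions both values decode to the same bus type and originator and differ only in code (0x11 versus 0x12) and subcode; what those codes mean should be checked against the firmware's log-info definitions rather than inferred from this sketch.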

History

#1

Updated by Marcel Telka 4 months ago

I would say this might be a problem with the disks. Are those 3 disks (targets 60, 77, 118) working properly? Did you try replacing them?

#2

Updated by Jason Matthews 4 months ago

Marcel Telka wrote:

I would say this might be a problem with the disks. Are those 3 disks (targets 60, 77, 118) working properly? Did you try replacing them?

I tried to address this in the original post.

Those appear to me to be SAS end points on the expanders, not disks at all. The disk WWNs all begin with 5000c500aec*; the JBOD itself is 500093d00249*.

The bad addresses are:
root@backup002:/root# dmesg | grep "93d00" | awk '{ print $14 " // " $15 }' | sort |uniq
118 // w500093d00249f0fd.#012#011scsi_status=0x0,
118 // w500093d00249f2fd.#012#011scsi_status=0x0,
60 // w500093d00249f07d.#012#011scsi_status=0x0,
60 // w500093d00249f27d.#012#011scsi_status=0x0,
77 // w500093d00249f0bd.#012#011scsi_status=0x0,
77 // w500093d00249f2bd.#012#011scsi_status=0x0,
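The prefix argument above can be checked mechanically. A minimal sketch, assuming the two prefixes quoted in this comment (5000c500aec for the drives, 500093d00249 for the JBOD); the function name `classify_wwn` and the labels it returns are hypothetical.

```python
# Prefixes quoted in this comment.
DRIVE_PREFIX = "5000c500aec"
JBOD_PREFIX = "500093d00249"

def classify_wwn(wwn):
    """Label a WWN as 'drive', 'jbod', or 'unknown' by its leading hex digits."""
    # Drop the leading 'w' and any trailing '.' left over from the log text.
    cleaned = wwn.strip().lower().lstrip("w").rstrip(".")
    if cleaned.startswith(DRIVE_PREFIX):
        return "drive"
    if cleaned.startswith(JBOD_PREFIX):
        return "jbod"
    return "unknown"

# The six addresses listed above, as they appear in the log.
REPORTED = [
    "w500093d00249f0fd.",
    "w500093d00249f2fd.",
    "w500093d00249f07d.",
    "w500093d00249f27d.",
    "w500093d00249f0bd.",
    "w500093d00249f2bd.",
]

for wwn in REPORTED:
    print(wwn, "->", classify_wwn(wwn))
```

Every reported address falls under the JBOD prefix and none under the drive prefix, which is the point being made in this comment.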

From ::mptsas -t, you can see the expanders themselves are 500093d00249* not the drives.

fffffe85837e5380  106   19 500093d00249f23d   0     MAX   INACTIVE 0
End device: SEP, SSP tgt, SMP init
fffffe8583821a00 107 60 500093d00249f27d 0 MAX INACTIVE 0
End device: SEP, SSP tgt, SMP init
fffffe8583821180 108 77 500093d00249f2bd 0 MAX INACTIVE 0
End device: SEP, SSP tgt, SMP init
fffffe8583815c00 109 118 500093d00249f2fd 0 MAX INACTIVE 0
End device: SEP, SSP tgt, SMP init
fffffe858243b608 9 500093d00249f23f 00f0
Edge expander: LSI device, direct attach, SMP tgt
fffffe8582439d70 16 500093d00249f27f 00f0
Edge expander: LSI device, SMP tgt
fffffe8582439d20 17 500093d00249f2bf 00f0
Edge expander: LSI device, SMP tgt
fffffe858221cf00 18 500093d00249f2ff 00f0
Edge expander: LSI device, SMP tgt

#3

Updated by Jason Matthews 4 months ago

Is the problem that the expander is being mis-identified as a target?
There should only be targets, initiators, and expanders, so an expander that is also a target confuses me... but I am out of my element at this level.

#4

Updated by Marcel Telka 27 days ago

What is the firmware version of your LSI9300-8e board? If it is 16.00.xx, could you please try downgrading to 15.00.xx and test it again? Thanks.
