Bug #8745

open

Hipster is marking drives as bad when they are ok

Added by Tom Shafron over 4 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee:
Category: driver - device drivers
Start date: 2017-10-29
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

When the logs show an error requesting inquiry page 0x83 (the Device Identification VPD page) from a target, the OS stops probing any further drives and marks them all as failed (as reported by "fmadm faulty").

For example, the logs show this error repeating:

scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1000,3040@0 (mpt_sas0):
mptsas request inquiry page 0x83 for target:1a, lun:0 failed!

However, every drive in the enclosure after this one was also flagged as bad and marked FAULTED or UNAVAIL in the ZFS pools.

After disabling physical port 0x1A (26), the four drives following it in the enclosure immediately came back online and the pool resilvered.

These are SATA drives in a Supermicro enclosure (which uses an expander).

It appears that the OS gets stuck in a loop retrying the failed drive instead of moving on to the working drives on higher-numbered ports in the enclosure.

I was able to confirm that this issue does not occur in 151_a8 or 151_a9 by booting into the snapshots for those versions. In 151_a8 and 151_a9, the drives after the failed drive in the enclosure were all online and working, and the pool started to resilver. But after booting back into Hipster, they dropped offline again.


Related issues

Related to illumos gate - Bug #12163: mpt_sas: Collateral damage caused by dead SATA disk (Closed, Marcel Telka)

#1

Updated by Marcel Telka over 4 years ago

  • Project changed from OpenIndiana Distribution to illumos gate
#2

Updated by Marcel Telka over 2 years ago

  • Category set to driver - device drivers
  • Assignee set to Marcel Telka

I think this might be exactly the same problem as #12163. Please try updating to an illumos/hipster build that includes that fix to see whether your problem is still reproducible.

#3

Updated by Marcel Telka over 2 years ago

  • Related to Bug #12163: mpt_sas: Collateral damage caused by dead SATA disk added