Bug #1343

mpt_sas sometimes stalls forever

Added by Rich Ercolani about 8 years ago. Updated almost 8 years ago.

Status: New
Priority: Normal
Assignee: -
Category: driver - device drivers
Start date: 2011-08-05
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Opening a new bug for this to keep it separate from bug #1069, which refers to mpt.

Sometimes under load, the mpt_sas driver will stall all IO on any mpt_sas card for an extended duration - often it eventually "wakes up" and begins working again, but sometimes it never wakes up, and the system requires a hard power cycle.

Relevant logs:
http://skysrv.pha.jhu.edu/~rercola/kmdb_dump_mptsas.log.bz2
http://skysrv.pha.jhu.edu/~rercola/mptsas_dmesg.log
(Note the hour without any messages in dmesg between the last error from mpt_sas and me screwing up my mdb call.)

I can log in and poke around, but no matter how long I wait, no IO happens on that controller until the machine is power-cycled.

$ uname -a
SunOS zettabyte 5.11 oi_148 i86pc i386 i86pc


Related issues

Related to illumos gate - Bug #1069: IO hang with bad disk on mpt and mpt_sas (New, 2011-05-26)

History

#1

Updated by Albert Lee about 8 years ago

ffffff000c827c40 fffffffffbc2e330                0   0  60 fffffffffbd182e0
  PC: _resume_from_idle+0xf1    THREAD: mt_config_thread()
  stack pointer for thread ffffff000c827c40: ffffff000c8276b0
  [ ffffff000c8276b0 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    scsi_transport+0x151()
    scsi_poll+0x7e()
    mptsas_send_scsi_cmd+0xee()
    mptsas_inquiry+0xd9()
    mptsas_get_sata_guid+0x67()
    mptsas_get_target_device_info+0x131()
    mptsas_update_hashtab+0xc0()
    mptsas_config_all+0x89()
    mptsas_bus_config+0x287()
    scsi_hba_bus_config+0xdc()
    devi_config_common+0x94()
    mt_config_thread+0x53()
    thread_start+8()

That's presumably after it successfully issues the hba_tran for mptsas... not sure which thread is supposed to wake up the cv.

#2

Updated by Rich Ercolani almost 8 years ago

Hey, neat.

I discovered this mostly happens when I use drives with 4K sector sizes that are lying/in 512-byte "emulation" mode (and have no mode to change out of it, ACK!) - they drop commands too often, the driver fails to recover, tries resetting the HBA, and we jump off a cliff and never wake up.
