Bug #5297

mptsas refhash replacement on reset can cause hang

Added by Rich Ercolani almost 3 years ago. Updated almost 3 years ago.

Status:ClosedStart date:2014-11-07
Priority:NormalDue date:
Assignee:-% Done:

100%

Category:-
Target version:-
Difficulty:Medium Tags:needs-triage

Description

Want to integrate the one outstanding bugfix in the SmartOS tree for mpt_sas that hasn't made it upstream.

The relevant commit can be found at https://github.com/joyent/illumos-joyent/commit/d8b8daca0baacf10f4535dddb56338d0afc3b335; I'll try to ask il-dev to review a (basically unmodified) respin of the patch against il-gate HEAD, and then kick it off for RTI.

To verbatim copy from its bug report (in the event that one of these becomes unavailable before the other):

From Zhiwen Zhang:

> The scenario may be the following:
> 
> During mptsas_restart_ioc(), we already hold mpt->m_mutex and
> mpt->m_in_reset set to TRUE,
> so new commands cannot be added to active queue and wait queue, but can be
> added to tx wait queue,
> which does not require to hold mpt->m_mutex to be manipulated. So during
> ioc restart, the mptsas_target_t s
> can still be actively used, e.g. cmd->cmd_tgt_addr = ptgt in
> mptsas_scsi_init_pkt().

The IOC reset path does not need to replace the refhash. In addition to causing a memory leak, this approach leads to potential hangs in the case described above. Instead, we should allocate this once in the attach path and then leave it alone.

History

#1 Updated by Rich Ercolani almost 3 years ago

http://cr.illumos.org/~webrev/rincebrain/refhash

Keith's patch just directly applied sanely, so that was simple enough.

Full build running now, then I'll ask -dev for reviewers, try to find an advocate, and then hopefully RTI quickly enough...

#2 Updated by Igor Kozhukhov almost 3 years ago

Rich Ercolani wrote:

http://cr.illumos.org/~webrev/rincebrain/refhash

Keith's patch just directly applied sanely, so that was simple enough.

Full build running now, then I'll ask -dev for reviewers, try to find an advocate, and then hopefully RTI quickly enough...

my +1 here - i have tested with my tree.
thanks for this work.
i thought about it too, but i just wanted to test it first on my env with LSI 9211-8i

#3 Updated by Electric Monk almost 3 years ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit abdbe11c7877311202d2870b53d7c76264121b2c

commit  abdbe11c7877311202d2870b53d7c76264121b2c
Author: Keith M Wesolowski <wesolows@foobazco.org>
Date:   2014-12-05T19:34:17.000Z

    5297 mptsas refhash replacement on reset can cause hang
    Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Andy Giles <illumos@ang.homedns.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Also available in: Atom