Bug #3195
mpt_sas IOC reset races can cause panics
Status: closed
% Done: 90%
Description
There are races present between event handling and the IOC reset path for mpt_sas.
The most common panic resulting from this occurs near the beginning of mptsas_handle_event:
ffffff002f0c8b90 mptsas_handle_event+0x1c(ffffff07450fb020)
ffffff002f0c8c40 taskq_thread+0x285(ffffff07306c9280)
ffffff002f0c8c50 thread_start+8()
Sometimes the stack trace shows only a mutex_enter in the event taskq:
ffffff002ffd4c60 fffffffffbc2dbf0 0 0 60 0
  PC: panicsys+0x9b    TASKQ: mpt_sas_mptsas_event_taskq
  stack pointer for thread ffffff002ffd4c60: ffffff002ffd46b0
    panic+0x94()
    die+0xdd()
    trap+0x177b()
    0xfffffffffb8001d6()
    mutex_enter+0xb()
    taskq_thread+0x285()
    thread_start+8()
There are three things which we don't want to run concurrently with an IOC reset attempt: the interrupt handlers, an address reply taskq thread, and a dynamic reconfigure (topology change) taskq thread. These could reference resources reallocated or zeroed in mptsas_init_chip.
We call mptsas_init_chip to process a reset, and it reallocates the reply queue from which mptsas_handle_event takes an entry. The event handler must therefore grab mpt->m_mutex before checking the m_in_reset flag, and drop the event if a reset is in progress.
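The check described above can be illustrated with a small userland model. This is only a sketch, not the driver code: it uses pthreads in place of the kernel mutex, and every name beginning with sketch_ or SKETCH_ (including the structure layout and the counters) is hypothetical. Only the m_mutex and m_in_reset field names are taken from the description.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define	SKETCH_NREPLIES	4

/* Hypothetical userland stand-in for the driver's per-instance state. */
typedef struct sketch_mpt {
	pthread_mutex_t	m_mutex;	/* protects every field below */
	bool		m_in_reset;	/* true while a reset is in progress */
	int		m_replies[SKETCH_NREPLIES]; /* models the reply queue */
	int		m_last_reply;	/* last entry an event consumed */
	int		m_handled;	/* events processed */
	int		m_dropped;	/* events dropped because of a reset */
} sketch_mpt_t;

void
sketch_mpt_init(sketch_mpt_t *mpt)
{
	size_t i;

	(void) pthread_mutex_init(&mpt->m_mutex, NULL);
	mpt->m_in_reset = false;
	mpt->m_last_reply = 0;
	mpt->m_handled = 0;
	mpt->m_dropped = 0;
	for (i = 0; i < SKETCH_NREPLIES; i++)
		mpt->m_replies[i] = (int)i + 100;
}

/* Models the reset path setting or clearing the flag under the mutex. */
void
sketch_mpt_set_reset(sketch_mpt_t *mpt, bool in_reset)
{
	(void) pthread_mutex_lock(&mpt->m_mutex);
	mpt->m_in_reset = in_reset;
	(void) pthread_mutex_unlock(&mpt->m_mutex);
}

/*
 * Models the event handler: take the mutex first, then look at the
 * flag. Returns true if the event was processed, false if dropped.
 */
bool
sketch_handle_event(sketch_mpt_t *mpt, size_t slot)
{
	bool processed = false;

	assert(slot < SKETCH_NREPLIES);

	(void) pthread_mutex_lock(&mpt->m_mutex);
	if (mpt->m_in_reset) {
		/* The queue may be mid-reallocation: do not touch it. */
		mpt->m_dropped++;
	} else {
		mpt->m_last_reply = mpt->m_replies[slot];
		mpt->m_handled++;
		processed = true;
	}
	(void) pthread_mutex_unlock(&mpt->m_mutex);

	return (processed);
}
```

The point of the ordering is that the flag is only meaningful while the mutex is held; reading it before taking the mutex would leave a window in which the reset could start between the check and the queue access.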
In the fix, we have the interrupt handler exit immediately if it starts in the middle of a reset (interrupts are already disabled, but either the polled or the regular interrupt handler could still be running). We wait for the taskq threads to complete before proceeding with the reset. We also no longer zero the address reply array, since doing so is not necessary.
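The two remaining pieces (the interrupt handler bailing out during a reset, and the reset waiting for in-flight taskq work) can be modeled in userland as follows. This is a sketch under assumptions, not the committed change: it uses a pthread condition variable and an active-task counter where the driver waits on its taskqs, and all names (sketch_ioc_t and the sketch_ioc_* functions) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland model; none of these names come from the driver. */
typedef struct sketch_ioc {
	pthread_mutex_t	lock;
	pthread_cond_t	idle_cv;	/* signalled when active_tasks hits 0 */
	bool		in_reset;	/* true while a reset is in progress */
	int		active_tasks;	/* taskq-style jobs currently running */
	int		reinit_count;	/* times the "chip" was reinitialized */
} sketch_ioc_t;

void
sketch_ioc_init(sketch_ioc_t *ioc)
{
	(void) pthread_mutex_init(&ioc->lock, NULL);
	(void) pthread_cond_init(&ioc->idle_cv, NULL);
	ioc->in_reset = false;
	ioc->active_tasks = 0;
	ioc->reinit_count = 0;
}

/* Interrupt handler model: returns false without doing work during reset. */
bool
sketch_ioc_intr(sketch_ioc_t *ioc)
{
	bool serviced;

	(void) pthread_mutex_lock(&ioc->lock);
	serviced = !ioc->in_reset;
	(void) pthread_mutex_unlock(&ioc->lock);

	return (serviced);
}

/* A task may start only when no reset is in progress. */
bool
sketch_ioc_task_enter(sketch_ioc_t *ioc)
{
	bool ok;

	(void) pthread_mutex_lock(&ioc->lock);
	ok = !ioc->in_reset;
	if (ok)
		ioc->active_tasks++;
	(void) pthread_mutex_unlock(&ioc->lock);

	return (ok);
}

/* A finishing task wakes a waiting reset once nothing is in flight. */
void
sketch_ioc_task_exit(sketch_ioc_t *ioc)
{
	(void) pthread_mutex_lock(&ioc->lock);
	assert(ioc->active_tasks > 0);
	if (--ioc->active_tasks == 0)
		(void) pthread_cond_broadcast(&ioc->idle_cv);
	(void) pthread_mutex_unlock(&ioc->lock);
}

/* Reset model: block new work, drain running tasks, then reinitialize. */
void
sketch_ioc_reset_begin(sketch_ioc_t *ioc)
{
	(void) pthread_mutex_lock(&ioc->lock);
	ioc->in_reset = true;
	while (ioc->active_tasks > 0)
		(void) pthread_cond_wait(&ioc->idle_cv, &ioc->lock);
	ioc->reinit_count++;	/* nothing else is using the resources now */
	(void) pthread_mutex_unlock(&ioc->lock);
}

void
sketch_ioc_reset_end(sketch_ioc_t *ioc)
{
	(void) pthread_mutex_lock(&ioc->lock);
	ioc->in_reset = false;
	(void) pthread_mutex_unlock(&ioc->lock);
}
```

In this model the flag is set before draining, so no new task can slip in while the reset waits, and the wait releases the lock so that running tasks can finish and call sketch_ioc_task_exit.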
A user reported this on the developer list:
http://comments.gmane.org/gmane.os.illumos.devel/8922
This was determined to be a duplicate of Nexenta #10443 and Bugster 6966172. We've fixed this in illumos-nexenta:
https://bitbucket.org/nexenta/illumos-nexenta/changeset/36c6ca86c98e0b33ce8516c391c3e3c1b63371ee
Related issues
Updated by Dan McDonald almost 10 years ago
- Status changed from In Progress to Pending RTI
Updated by Dan McDonald almost 10 years ago
- Status changed from Pending RTI to Resolved
commit 018d3f06fe63d3b8316ef73502fb8f2dd473ffd1
Author: Albert Lee <trisk@nexenta.com>
Date: Fri Aug 31 14:53:16 2012 -0400
3195 mpt_sas IOC reset races can cause panics
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Keith Wesolowski <keith.wesolowski@joyent.com>
Approved by: Garrett D'Amore <garrett.damore@gmail.com>
Updated by Arne Jansen about 8 years ago
- Related to Bug #6256: mptsas: deadlock in mptsas_handle_topo_change added
Updated by Marcel Telka over 6 years ago
- Related to Bug #4399: mpt_sas should not do tasw added