mpt_sas IOC reset races can cause panics
There are races present between event handling and the IOC reset path for mpt_sas.
The most common panic resulting from this occurs near the beginning of mptsas_handle_event:
ffffff002f0c8b90 mptsas_handle_event+0x1c(ffffff07450fb020)
ffffff002f0c8c40 taskq_thread+0x285(ffffff07306c9280)
ffffff002f0c8c50 thread_start+8()
Sometimes a stack trace will only show a mutex_enter in the event taskq:
ffffff002ffd4c60 fffffffffbc2dbf0 0 0 60 0
  PC: panicsys+0x9b    TASKQ: mpt_sas_mptsas_event_taskq
  stack pointer for thread ffffff002ffd4c60: ffffff002ffd46b0
    panic+0x94()
    die+0xdd()
    trap+0x177b()
    0xfffffffffb8001d6()
    mutex_enter+0xb()
    taskq_thread+0x285()
    thread_start+8()
There are three things which we don't want to run concurrently with an IOC reset attempt: the interrupt handlers, an address reply taskq thread, and a dynamic reconfigure (topology change) taskq thread. These could reference resources reallocated or zeroed in mptsas_init_chip.
We call mptsas_init_chip to process a reset, and it reallocates the reply queue from which mptsas_handle_event takes an entry. The event handler must therefore grab mpt->m_mutex before checking the m_in_reset flag, and drop the event if a reset is in progress.
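The check described above can be sketched in user-space C. This is only an illustration and not the driver code: struct fake_mpt, fake_handle_event, and the integer reply queue are hypothetical stand-ins, and a pthread mutex plays the role of mpt->m_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver soft state; not the real mptsas_t. */
struct fake_mpt {
	pthread_mutex_t m_mutex;	/* plays the role of mpt->m_mutex */
	bool m_in_reset;		/* plays the role of m_in_reset */
	int *reply_queue;		/* reallocated by the reset path */
	size_t reply_queue_len;
};

/*
 * Take the lock first, then look at the reset flag. If a reset is in
 * progress the reply queue may be changing underneath us, so the event
 * is dropped without touching it.
 * Returns true if the event was processed, false if it was dropped.
 */
static bool
fake_handle_event(struct fake_mpt *mpt, size_t slot, int *out)
{
	bool handled = false;

	assert(mpt != NULL && out != NULL);

	pthread_mutex_lock(&mpt->m_mutex);
	if (!mpt->m_in_reset && slot < mpt->reply_queue_len) {
		*out = mpt->reply_queue[slot];
		handled = true;
	}
	pthread_mutex_unlock(&mpt->m_mutex);

	return (handled);
}
```

The point of the ordering is that the flag is only meaningful while the lock is held; reading it before taking the lock would leave a window in which the reset path could start reallocating the queue.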
In the solution, we have the interrupt handler exit if it's started in the middle of a reset (interrupts are already disabled, but either the polled or regular interrupt handler could still be running). We wait for the taskq threads to complete before proceeding. We also don't zero the address reply array since it's not necessary.
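The general pattern of "new work bails out during a reset, and the reset waits for work already running" might look like the following user-space sketch. All names here (struct reset_gate, gate_task_enter, and so on) are hypothetical; the driver itself works with taskqs and its own mutex rather than the pthread counter and condition variable assumed here.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical gate shared by the reset path and worker tasks. */
struct reset_gate {
	pthread_mutex_t lock;
	pthread_cond_t idle_cv;	/* signalled when active_tasks reaches 0 */
	bool in_reset;
	int active_tasks;
};

static void
gate_init(struct reset_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->idle_cv, NULL);
	g->in_reset = false;
	g->active_tasks = 0;
}

/* A task calls this first; false means "reset in progress, exit now". */
static bool
gate_task_enter(struct reset_gate *g)
{
	bool ok;

	pthread_mutex_lock(&g->lock);
	ok = !g->in_reset;
	if (ok)
		g->active_tasks++;
	pthread_mutex_unlock(&g->lock);
	return (ok);
}

/* A task calls this when done; wakes a waiting reset if it was the last. */
static void
gate_task_exit(struct reset_gate *g)
{
	pthread_mutex_lock(&g->lock);
	if (--g->active_tasks == 0)
		pthread_cond_broadcast(&g->idle_cv);
	pthread_mutex_unlock(&g->lock);
}

/* Block new tasks, then wait for the ones already running to finish. */
static void
gate_reset_begin(struct reset_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->in_reset = true;
	while (g->active_tasks > 0)
		pthread_cond_wait(&g->idle_cv, &g->lock);
	pthread_mutex_unlock(&g->lock);
}

/* Allow tasks to run again once the reset has completed. */
static void
gate_reset_end(struct reset_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->in_reset = false;
	pthread_mutex_unlock(&g->lock);
}
```

Setting the flag before waiting matters: it guarantees the count of running tasks can only go down, so the wait in gate_reset_begin terminates once the existing tasks finish.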
This was reported by a user on the developer list:
This was determined to be a duplicate of Nexenta #10443 and Bugster 6966172. We've fixed this in illumos-nexenta:
Updated by Dan McDonald about 7 years ago
- Status changed from Pending RTI to Resolved
Author: Albert Lee <email@example.com>
Date: Fri Aug 31 14:53:16 2012 -0400
3195 mpt_sas IOC reset races can cause panics
Reviewed by: Dan McDonald <firstname.lastname@example.org>
Reviewed by: Keith Wesolowski <email@example.com>
Approved by: Garrett D'Amore <firstname.lastname@example.org>