Bug #4403
openmpt_sas panic when pulling a drive
0%
Description
OmniOS r151006 - mpt_sas driver version 0.5.11,5.11-0.151006:20130906T160306Z
> ::stack ddi_dma_mem_free+0x1b(ffffff01eb895748) mptsas_dma_addr_destroy+0x33(ffffff431b493548, ffffff01eb895748) mptsas_access_config_page+0x2a1(ffffff430c9ba000, 1, 16, 0, ff, fffffffff8120b30 ) mptsas_get_raid_info+0x7a(ffffff430c9ba000) mptsas_update_hashtab+0x28(ffffff430c9ba000) mptsas_update_driver_data+0x59(ffffff430c9ba000) mptsas_restart_ioc+0xb3(ffffff430c9ba000) mptsas_ioc_task_management+0x2c2(ffffff430c9ba000, 3, 9, 0, 0, 0) mptsas_do_scsi_reset+0x98(ffffff430c9ba000, 9) mptsas_handle_topo_change+0x3bc(ffffff432752fa00, ffffff43118ee2b0) mptsas_handle_dr+0xa0(ffffff432752fa00) taskq_thread+0x2d0(ffffff4310f71af8) thread_start+8() > ::status debugging crash dump vmcore.2 (64-bit) from storage13 operating system: 5.11 omnios-b281e50 (i86pc) image uuid: 62f75ab9-fd67-4b11-865d-f50770b0563b panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff01eb8954c0 addr=28 occurred in module "genunix" due to a NULL pointer dereference dump content: kernel pages only
This happens when physically pulling a drive from the system.
Updated by Andy Giles over 8 years ago
Analysis of the code shows that this is an inevitable outcome if the call to ddi_dma_mem_alloc() within mptsas_dma_addr_create() fails. This is because the pointer that is set to NULL after that failure is only the local variable, it should be setting the target to NULL. In mptsas_access_config_page() mptsas_dma_addr_destroy() is called even if mptsas_dma_addr_create() fails. If all occurances of dma_hdp = NULL;
were changed to *dma_hdp = NULL;
then the NULL pointer dereference would be avoided. But this begs the question why did ddi_dma_mem_alloc() fail?
From mptsas.c:
int mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma_attr, ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp, caddr_t *dma_memp, uint32_t alloc_size, ddi_dma_cookie_t *cookiep) { ddi_dma_cookie_t new_cookie; size_t alloc_len; uint_t ncookie; if (cookiep == NULL) cookiep = &new_cookie; if (ddi_dma_alloc_handle(mpt->m_dip, &dma_attr, DDI_DMA_SLEEP, NULL, dma_hdp) != DDI_SUCCESS) { dma_hdp = NULL; return (FALSE); } if (ddi_dma_mem_alloc(*dma_hdp, alloc_size, &mpt->m_dev_acc_attr, DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, dma_memp, &alloc_len, acc_hdp) != DDI_SUCCESS) { ddi_dma_free_handle(dma_hdp); dma_hdp = NULL; return (FALSE); } if (ddi_dma_addr_bind_handle(*dma_hdp, NULL, *dma_memp, alloc_len, (DDI_DMA_RDWR | DDI_DMA_CONSISTENT), DDI_DMA_SLEEP, NULL, cookiep, &ncookie) != DDI_DMA_MAPPED) { (void) ddi_dma_mem_free(acc_hdp); ddi_dma_free_handle(dma_hdp); dma_hdp = NULL; return (FALSE); } return (TRUE); } void mptsas_dma_addr_destroy(ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp) { if (*dma_hdp == NULL) return; (void) ddi_dma_unbind_handle(*dma_hdp); (void) ddi_dma_mem_free(acc_hdp); ddi_dma_free_handle(dma_hdp); dma_hdp = NULL; }
Mark, can you reproduce this every time?
Updated by Mark Harrison over 8 years ago
This is a customer machine, and this is their response:
Yes, it happened every time I pulled out a drive. It also happens every time when a drive appears as "Unavailable" in the zpool.
Updated by Dan McDonald over 8 years ago
Andy's analysis is only partially correct.
The end of mptsas_access_config_page(), labelled by "page_done", always calls mptsas_dma_addr_destroy(), regardless of how it gets to page_done.
I see:
- An allocation failure up front (line 323 of mptsas_impl.c) goes to page_done, regardless of cmd being null or not, and with accessp uninitialized
- Timed out command (line 382), while cmd is now not NULL, accessp is STILL unintialized.
- iocstatus failures (line 418), still accessp is uninitialized.
- Packet is reset (line 433), still accessp is uninitialized.
- FINALLY we get to line 468 and the failure mode Andy mentioned. accessp should be set to 0, not &accessp like is currently, BUT even if it's set correctly, things will fail.
Quite frankly, I see the following being the best fix:
1.) Initialize "accessp" to 0.
2.) Fix the failure mode Andy indicates.
3.) mptsas_dma_addr_destroy() needs to check *acc_hdp for NULL and NOT call ddi_dma_mem_free() if it is.
Patch coming.
Updated by Dan McDonald over 8 years ago
Here's a dtrace one-liner to detect this error. You may not see it prior to the coredump, but it'll be in ::dtrace state in the dump.
dtrace -n 'mptsas_dma_addr_destroy:entry /arg0 != NULL && arg1 == NULL/ {printf("GOTCHA!\\n"); stack();}'
Updated by Dan McDonald over 8 years ago
Proposed first patch:
omnios-stable(~/ws/omnios-4403)[0]% git whatchanged -p origin/master.. commit 5f2cb2a15c4ed257bb98ca2a31643b2dd2807eb3 Author: Dan McDonald <danmcd@omniti.com> Date: Fri Feb 21 04:53:04 2014 -0500 4403 mpt_sas panic when pulling a drive diff --git a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas.c b/usr/src/uts/ index 9ad7c41..33828da 100644 --- a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas.c +++ b/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas.c @@ -15444,7 +15444,7 @@ mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma if (ddi_dma_alloc_handle(mpt->m_dip, &dma_attr, DDI_DMA_SLEEP, NULL, dma_hdp) != DDI_SUCCESS) { - dma_hdp = NULL; + *dma_hdp = NULL; return (FALSE); } @@ -15452,7 +15452,7 @@ mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, dma_memp, &alloc_len, acc_hdp) != DDI_SUCCESS) { ddi_dma_free_handle(dma_hdp); - dma_hdp = NULL; + *dma_hdp = NULL; return (FALSE); } @@ -15460,8 +15460,9 @@ mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma (DDI_DMA_RDWR | DDI_DMA_CONSISTENT), DDI_DMA_SLEEP, NULL, cookiep, &ncookie) != DDI_DMA_MAPPED) { (void) ddi_dma_mem_free(acc_hdp); + *acc_hdp = NULL; ddi_dma_free_handle(dma_hdp); - dma_hdp = NULL; + *dma_hdp = NULL; return (FALSE); } @@ -15475,7 +15476,8 @@ mptsas_dma_addr_destroy(ddi_dma_handle_t *dma_hdp, ddi_a return; (void) ddi_dma_unbind_handle(*dma_hdp); - (void) ddi_dma_mem_free(acc_hdp); + if (*acc_hdp != NULL) + (void) ddi_dma_mem_free(acc_hdp); ddi_dma_free_handle(dma_hdp); - dma_hdp = NULL; + *dma_hdp = NULL; } diff --git a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas_impl.c b/usr/src index 9f1bf71..2971ad0 100644 --- a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas_impl.c +++ b/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas_impl.c @@ -299,7 +299,7 @@ mptsas_access_config_page(mptsas_t *mpt, uint8_t action, uin va_list ap; ddi_dma_attr_t attrs; ddi_dma_cookie_t cookie; - ddi_acc_handle_t accessp; + ddi_acc_handle_t accessp = NULL; size_t len = 0; mptsas_config_request_t config; int rval = DDI_SUCCESS, config_flags = 0;
Updated by Andy Giles over 8 years ago
Looks good Dan. To be absolutely sure about valid access handle pointer it might be worthwhile clearing the acc_hdp if things fail. mptsas_dma_addr_create/destroy() would then look like:
int mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma_attr, ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp, caddr_t *dma_memp, uint32_t alloc_size, ddi_dma_cookie_t *cookiep) { ddi_dma_cookie_t new_cookie; size_t alloc_len; uint_t ncookie; if (cookiep == NULL) cookiep = &new_cookie; if (ddi_dma_alloc_handle(mpt->m_dip, &dma_attr, DDI_DMA_SLEEP, NULL, dma_hdp) != DDI_SUCCESS) { *dma_hdp = NULL; *acc_hdp = NULL; return (FALSE); } if (ddi_dma_mem_alloc(*dma_hdp, alloc_size, &mpt->m_dev_acc_attr, DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, dma_memp, &alloc_len, acc_hdp) != DDI_SUCCESS) { ddi_dma_free_handle(dma_hdp); *dma_hdp = NULL; *acc_hdp = NULL; return (FALSE); } if (ddi_dma_addr_bind_handle(*dma_hdp, NULL, *dma_memp, alloc_len, (DDI_DMA_RDWR | DDI_DMA_CONSISTENT), DDI_DMA_SLEEP, NULL, cookiep, &ncookie) != DDI_DMA_MAPPED) { (void) ddi_dma_mem_free(acc_hdp); ddi_dma_free_handle(dma_hdp); *dma_hdp = NULL; *acc_hdp = NULL; return (FALSE); } return (TRUE); } void mptsas_dma_addr_destroy(ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp) { if (*dma_hdp == NULL) return; (void) ddi_dma_unbind_handle(*dma_hdp); if (*acc_hdp != NULL) { (void) ddi_dma_mem_free(acc_hdp); *acc_hdp = NULL; } ddi_dma_free_handle(dma_hdp); *dma_hdp = NULL; }
That wouldn't prevent a problem if _destroy() is called without having previously called _create() as you found in mptsas_access_config_page() but might help other failure code paths.
Updated by Electric Monk over 8 years ago
git commit f7d0d869a9ae78d8244009efb4980a1f057bc8f8
Author: Dan McDonald <danmcd@omniti.com> 4403 mpt_sas panic when pulling a drive Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Andy Giles <illumos@ang.homedns.org> Approved by: Robert Mustacchi <rm@joyent.com>