Bug #4403

mpt_sas panic when pulling a drive

Added by Mark Harrison almost 4 years ago. Updated over 3 years ago.

Status:NewStart date:2013-12-16
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:driver - device drivers
Target version:-
Difficulty:Medium Tags:needs-triage

Description

OmniOS r151006 - mpt_sas driver version 0.5.11,5.11-0.151006:20130906T160306Z

> ::stack
ddi_dma_mem_free+0x1b(ffffff01eb895748)
mptsas_dma_addr_destroy+0x33(ffffff431b493548, ffffff01eb895748)
mptsas_access_config_page+0x2a1(ffffff430c9ba000, 1, 16, 0, ff, fffffffff8120b30
)
mptsas_get_raid_info+0x7a(ffffff430c9ba000)
mptsas_update_hashtab+0x28(ffffff430c9ba000)
mptsas_update_driver_data+0x59(ffffff430c9ba000)
mptsas_restart_ioc+0xb3(ffffff430c9ba000)
mptsas_ioc_task_management+0x2c2(ffffff430c9ba000, 3, 9, 0, 0, 0)
mptsas_do_scsi_reset+0x98(ffffff430c9ba000, 9)
mptsas_handle_topo_change+0x3bc(ffffff432752fa00, ffffff43118ee2b0)
mptsas_handle_dr+0xa0(ffffff432752fa00)
taskq_thread+0x2d0(ffffff4310f71af8)
thread_start+8()
> ::status
debugging crash dump vmcore.2 (64-bit) from storage13
operating system: 5.11 omnios-b281e50 (i86pc)
image uuid: 62f75ab9-fd67-4b11-865d-f50770b0563b
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff01eb8954c0 addr=28 occurred in module
 "genunix" due to a NULL pointer dereference
dump content: kernel pages only

This happens when physically pulling a drive from the system.

History

#1 Updated by Andy Giles almost 4 years ago

Analysis of the code shows that this is an inevitable outcome if the call to ddi_dma_mem_alloc() within mptsas_dma_addr_create() fails. This is because the pointer that is set to NULL after that failure is only the local variable, it should be setting the target to NULL. In mptsas_access_config_page() mptsas_dma_addr_destroy() is called even if mptsas_dma_addr_create() fails. If all occurances of dma_hdp = NULL; were changed to *dma_hdp = NULL; then the NULL pointer dereference would be avoided. But this begs the question why did ddi_dma_mem_alloc() fail?

From mptsas.c:

int
mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma_attr,
    ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp, caddr_t *dma_memp,
    uint32_t alloc_size, ddi_dma_cookie_t *cookiep)
{
    ddi_dma_cookie_t    new_cookie;
    size_t            alloc_len;
    uint_t            ncookie;

    if (cookiep == NULL)
        cookiep = &new_cookie;

    if (ddi_dma_alloc_handle(mpt->m_dip, &dma_attr, DDI_DMA_SLEEP,
        NULL, dma_hdp) != DDI_SUCCESS) {
        dma_hdp = NULL;
        return (FALSE);
    }

    if (ddi_dma_mem_alloc(*dma_hdp, alloc_size, &mpt->m_dev_acc_attr,
        DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, dma_memp, &alloc_len,
        acc_hdp) != DDI_SUCCESS) {
        ddi_dma_free_handle(dma_hdp);
        dma_hdp = NULL;
        return (FALSE);
    }

    if (ddi_dma_addr_bind_handle(*dma_hdp, NULL, *dma_memp, alloc_len,
        (DDI_DMA_RDWR | DDI_DMA_CONSISTENT), DDI_DMA_SLEEP, NULL,
        cookiep, &ncookie) != DDI_DMA_MAPPED) {
        (void) ddi_dma_mem_free(acc_hdp);
        ddi_dma_free_handle(dma_hdp);
        dma_hdp = NULL;
        return (FALSE);
    }

    return (TRUE);
}

void
mptsas_dma_addr_destroy(ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp)
{
    if (*dma_hdp == NULL)
        return;

    (void) ddi_dma_unbind_handle(*dma_hdp);
    (void) ddi_dma_mem_free(acc_hdp);
    ddi_dma_free_handle(dma_hdp);
    dma_hdp = NULL;
}

Mark, can you reproduce this every time?

#2 Updated by Mark Harrison almost 4 years ago

This is a customer machine, and this is their response:

Yes, it happened every time I pulled out a drive. It also happens every time when a drive appears as "Unavailable" in the zpool.

#3 Updated by Dan McDonald over 3 years ago

Andy's analysis is only partially correct.

The end of mptsas_access_config_page(), labelled by "page_done", always calls mptsas_dma_addr_destroy(), regardless of how it gets to page_done.

I see:

  • An allocation failure up front (line 323 of mptsas_impl.c) goes to page_done, regardless of cmd being null or not, and with accessp uninitialized
  • Timed out command (line 382), while cmd is now not NULL, accessp is STILL unintialized.
  • iocstatus failures (line 418), still accessp is uninitialized.
  • Packet is reset (line 433), still accessp is uninitialized.
  • FINALLY we get to line 468 and the failure mode Andy mentioned. accessp should be set to 0, not &accessp like is currently, BUT even if it's set correctly, things will fail.

Quite frankly, I see the following being the best fix:

1.) Initialize "accessp" to 0.

2.) Fix the failure mode Andy indicates.

3.) mptsas_dma_addr_destroy() needs to check *acc_hdp for NULL and NOT call ddi_dma_mem_free() if it is.

Patch coming.

#4 Updated by Dan McDonald over 3 years ago

Here's a dtrace one-liner to detect this error. You may not see it prior to the coredump, but it'll be in ::dtrace state in the dump.

dtrace -n 'mptsas_dma_addr_destroy:entry /arg0 != NULL && arg1 == NULL/ {printf("GOTCHA!\\n"); stack();}'

#5 Updated by Dan McDonald over 3 years ago

Proposed first patch:

omnios-stable(~/ws/omnios-4403)[0]% git whatchanged -p origin/master..
commit 5f2cb2a15c4ed257bb98ca2a31643b2dd2807eb3
Author: Dan McDonald <danmcd@omniti.com>
Date:   Fri Feb 21 04:53:04 2014 -0500

    4403 mpt_sas panic when pulling a drive

diff --git a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas.c b/usr/src/uts/
index 9ad7c41..33828da 100644
--- a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas.c
+++ b/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas.c
@@ -15444,7 +15444,7 @@ mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma

        if (ddi_dma_alloc_handle(mpt->m_dip, &dma_attr, DDI_DMA_SLEEP,
            NULL, dma_hdp) != DDI_SUCCESS) {
-               dma_hdp = NULL;
+               *dma_hdp = NULL;
                return (FALSE);
        }

@@ -15452,7 +15452,7 @@ mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma
            DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, dma_memp, &alloc_len,
            acc_hdp) != DDI_SUCCESS) {
                ddi_dma_free_handle(dma_hdp);
-               dma_hdp = NULL;
+               *dma_hdp = NULL;
                return (FALSE);
        }

@@ -15460,8 +15460,9 @@ mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma
            (DDI_DMA_RDWR | DDI_DMA_CONSISTENT), DDI_DMA_SLEEP, NULL,
            cookiep, &ncookie) != DDI_DMA_MAPPED) {
                (void) ddi_dma_mem_free(acc_hdp);
+               *acc_hdp = NULL;
                ddi_dma_free_handle(dma_hdp);
-               dma_hdp = NULL;
+               *dma_hdp = NULL;
                return (FALSE);
        }

@@ -15475,7 +15476,8 @@ mptsas_dma_addr_destroy(ddi_dma_handle_t *dma_hdp, ddi_a
                return;

        (void) ddi_dma_unbind_handle(*dma_hdp);
-       (void) ddi_dma_mem_free(acc_hdp);
+       if (*acc_hdp != NULL)
+               (void) ddi_dma_mem_free(acc_hdp);
        ddi_dma_free_handle(dma_hdp);
-       dma_hdp = NULL;
+       *dma_hdp = NULL;
 }
diff --git a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas_impl.c b/usr/src
index 9f1bf71..2971ad0 100644
--- a/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas_impl.c
+++ b/usr/src/uts/common/io/scsi/adapters/mpt_sas/mptsas_impl.c
@@ -299,7 +299,7 @@ mptsas_access_config_page(mptsas_t *mpt, uint8_t action, uin
        va_list                 ap;
        ddi_dma_attr_t          attrs;
        ddi_dma_cookie_t        cookie;
-       ddi_acc_handle_t        accessp;
+       ddi_acc_handle_t        accessp = NULL;
        size_t                  len = 0;
        mptsas_config_request_t config;
        int                     rval = DDI_SUCCESS, config_flags = 0;

#6 Updated by Andy Giles over 3 years ago

Looks good Dan. To be absolutely sure about valid access handle pointer it might be worthwhile clearing the acc_hdp if things fail. mptsas_dma_addr_create/destroy() would then look like:

int
mptsas_dma_addr_create(mptsas_t *mpt, ddi_dma_attr_t dma_attr,
    ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp, caddr_t *dma_memp,
    uint32_t alloc_size, ddi_dma_cookie_t *cookiep)
{
    ddi_dma_cookie_t    new_cookie;
    size_t            alloc_len;
    uint_t            ncookie;

    if (cookiep == NULL)
        cookiep = &new_cookie;

    if (ddi_dma_alloc_handle(mpt->m_dip, &dma_attr, DDI_DMA_SLEEP,
        NULL, dma_hdp) != DDI_SUCCESS) {
        *dma_hdp = NULL;
        *acc_hdp = NULL;
        return (FALSE);
    }

    if (ddi_dma_mem_alloc(*dma_hdp, alloc_size, &mpt->m_dev_acc_attr,
        DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, dma_memp, &alloc_len,
        acc_hdp) != DDI_SUCCESS) {
        ddi_dma_free_handle(dma_hdp);
        *dma_hdp = NULL;
        *acc_hdp = NULL;
        return (FALSE);
    }

    if (ddi_dma_addr_bind_handle(*dma_hdp, NULL, *dma_memp, alloc_len,
        (DDI_DMA_RDWR | DDI_DMA_CONSISTENT), DDI_DMA_SLEEP, NULL,
        cookiep, &ncookie) != DDI_DMA_MAPPED) {
        (void) ddi_dma_mem_free(acc_hdp);
        ddi_dma_free_handle(dma_hdp);
        *dma_hdp = NULL;
        *acc_hdp = NULL;
        return (FALSE);
    }

    return (TRUE);
}

void
mptsas_dma_addr_destroy(ddi_dma_handle_t *dma_hdp, ddi_acc_handle_t *acc_hdp)
{
    if (*dma_hdp == NULL)
        return;

    (void) ddi_dma_unbind_handle(*dma_hdp);
    if (*acc_hdp != NULL) {
        (void) ddi_dma_mem_free(acc_hdp);
        *acc_hdp = NULL;
    }
    ddi_dma_free_handle(dma_hdp);
    *dma_hdp = NULL;
}

That wouldn't prevent a problem if _destroy() is called without having previously called _create() as you found in mptsas_access_config_page() but might help other failure code paths.

#7 Updated by Electric Monk over 3 years ago

git commit f7d0d869a9ae78d8244009efb4980a1f057bc8f8

Author: Dan McDonald <danmcd@omniti.com>

4403 mpt_sas panic when pulling a drive
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Andy Giles <illumos@ang.homedns.org>
Approved by: Robert Mustacchi <rm@joyent.com>

Also available in: Atom