Bug #2911

blkdev-related panic on DEBUG kernels

Added by Alexey Zaytsev over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: driver - device drivers
Start date: 2012-06-22
Due date:
% Done: 100%
Estimated time:
Difficulty: Hard
Tags:
Gerrit CR:

Description

When running an illumos build on a DEBUG kernel, I get this panic a few hours into the build:

panic[cpu0]/thread=ffffff01f0dc2880: assertion failed: rc == -1, file: ../../common/os/kmem.c, line: 2598

ffffff0008ad6280 genunix:assfail+73 ()
ffffff0008ad62e0 genunix:kmem_cache_alloc+324 ()
ffffff0008ad6360 blkdev:bd_xfer_alloc+45 ()
ffffff0008ad63b0 blkdev:bd_strategy+140 ()
ffffff0008ad63e0 genunix:bdev_strategy+75 ()
ffffff0008ad6410 genunix:ldi_strategy+59 ()
ffffff0008ad6450 zfs:vdev_disk_io_start+d0 ()
ffffff0008ad64a0 zfs:zio_vdev_io_start+255 ()
ffffff0008ad64e0 zfs:zio_execute+ca ()
ffffff0008ad6500 zfs:zio_nowait+4d () 
ffffff0008ad6540 zfs:vdev_mirror_io_start+153 ()
ffffff0008ad6590 zfs:zio_vdev_io_start+29f ()
ffffff0008ad65d0 zfs:zio_execute+ca ()
ffffff0008ad65f0 zfs:zio_nowait+4d () 
ffffff0008ad6690 zfs:arc_read_nolock+91f ()
ffffff0008ad6730 zfs:arc_read+ae ()   
ffffff0008ad6790 zfs:dsl_read+33 ()   
ffffff0008ad6820 zfs:dbuf_prefetch+14a ()
ffffff0008ad6870 zfs:dmu_zfetch_fetch+62 ()
ffffff0008ad68b0 zfs:dmu_zfetch_dofetch+b8 ()
ffffff0008ad6910 zfs:dmu_zfetch_find+436 ()
ffffff0008ad69d0 zfs:dmu_zfetch+ac () 
ffffff0008ad6a30 zfs:dbuf_read+141 () 
ffffff0008ad6af0 zfs:dmu_buf_hold_array_by_dnode+1d9 ()
ffffff0008ad6b80 zfs:dmu_buf_hold_array+6e ()
ffffff0008ad6c30 zfs:dmu_read_uio+4e ()
ffffff0008ad6cd0 zfs:zfs_read+269 ()  
ffffff0008ad6d40 genunix:fop_read+6b ()
ffffff0008ad6e80 genunix:read+2b8 ()  
ffffff0008ad6eb0 genunix:read32+22 () 
ffffff0008ad6f00 unix:brand_sys_sysenter+2b7 ()

panic: entering debugger (continue to save dump)

It seems to be failing in this code:

kmem.c:kmem_cache_alloc()
[...]
        if (cp->cache_flags & KMF_BUFTAG) {
                /*
                 * Make kmem_cache_alloc_debug() apply the constructor for us.
                 */
                int rc = kmem_cache_alloc_debug(cp, buf, kmflag, 1, caller());
                if (rc != 0) {
                        if (kmflag & KM_NOSLEEP)
                                return (NULL);
                        /*
                         * kmem_cache_alloc_debug() detected corruption
                         * but didn't panic (kmem_panic <= 0). We should not be
                         * here because the constructor failed (indicated by a
                         * return code of 1). Try again.
                         */
                        ASSERT(rc == -1);
                        return (kmem_cache_alloc(cp, kmflag));

As the comment indicates, the assertion is probably triggered by a constructor failure:

blkdev.c:bd_xfer_ctor()
[....]
        if (bd->d_use_dma) {
                if (ddi_dma_alloc_handle(bd->d_dip, &bd->d_dma, dcb, NULL,
                    &xi->i_dmah) != DDI_SUCCESS) {
                        return (-1);
                }
        }

Is anyone up for looking into this?

I've tried running ::findleaks, just in case, but it fails due to #2907.

#1

Updated by Garrett D'Amore over 8 years ago

Reading the code, it appears to be ill-equipped to deal with constructor failure here. That's a problem. We can indeed reasonably fail the constructor, and in this case the problem is a failure to allocate a DMA handle.

However, reading bd_xfer_ctor indicates that the DMA allocation should be performed with DDI_DMA_SLEEP. So why is the DMA handle allocation failing? It never should. Perhaps you have corrupted DMA attributes somewhere? Or perhaps all DMA resources on the system are exhausted?

#2

Updated by Alexey Zaytsev over 8 years ago

Hmm, I don't think you are right.

bd_xfer_alloc() is called from bd_strategy():

        xi = bd_xfer_alloc(bd, bp, func, KM_NOSLEEP);
        if (xi == NULL) {
                xi = bd_xfer_alloc(bd, bp, func, KM_PUSHPAGE);
        }

And bd_xfer_ctor() says:

        if (kmflag == KM_SLEEP) {
                dcb = DDI_DMA_SLEEP;
        } else {
                dcb = DDI_DMA_DONTWAIT;
        }

So ddi_dma_alloc_handle is called with dcb == DDI_DMA_DONTWAIT. Right?

#3

Updated by Alexey Zaytsev over 8 years ago

So, how should we handle this? Is it possible that a KM_SLEEP allocation would cause some pages to be swapped out, thus possibly causing more requests to the same device?

In Linux, they solve the problem using a wider range of GFP flags (analogous to KM flags):
GFP_NOWAIT - Don't sleep.
GFP_NOIO - Can sleep, but don't do any IO on the block layer. Useful in block layer and driver code.
GFP_NOFS - Can sleep and do block IO, but not fs access (can swap, unless the swap is backed by a file). Useful in fs code.
GFP_KERNEL - Can do anything, the default most code should use.
As well as a few others.

We don't have such an abundance in illumos, so I'm not sure how to solve the problem without causing any deadlocks, assuming KM_SLEEP could cause new IO.

#4

Updated by Garrett D'Amore over 8 years ago

I think the problem is that bd_xfer_ctor() needs to consider KM_PUSHPAGE as well as KM_SLEEP. (KM_PUSHPAGE has semantics similar to KM_SLEEP, in that it should trigger a sleeping DDI allocation as well.)
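To make the suggestion concrete, here is a minimal, self-contained sketch comparing the mapping quoted from bd_xfer_ctor() with one that treats every flag set lacking KM_NOSLEEP as allowed to sleep. It is not the committed fix. The flag values are stand-ins for the real kernel definitions, and dma_wait_t, dma_wait_original(), dma_wait_suggested() and check_dma_wait_mapping() are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Stand-in values for illustration only; the real definitions come from
 * the kernel headers and are not included here.
 */
#define	KM_SLEEP	0x0000
#define	KM_NOSLEEP	0x0001
#define	KM_PUSHPAGE	0x0004

/* Hypothetical stand-in for DDI_DMA_SLEEP / DDI_DMA_DONTWAIT. */
typedef enum { DMA_DONTWAIT, DMA_SLEEP } dma_wait_t;

/* Mapping as quoted in the thread: only an exact KM_SLEEP sleeps. */
static dma_wait_t
dma_wait_original(int kmflag)
{
	return (kmflag == KM_SLEEP ? DMA_SLEEP : DMA_DONTWAIT);
}

/* Sketch of the suggestion: anything without KM_NOSLEEP may sleep. */
static dma_wait_t
dma_wait_suggested(int kmflag)
{
	return ((kmflag & KM_NOSLEEP) ? DMA_DONTWAIT : DMA_SLEEP);
}

/* Small usage example: the two mappings differ only for KM_PUSHPAGE. */
static void
check_dma_wait_mapping(void)
{
	assert(dma_wait_original(KM_PUSHPAGE) == DMA_DONTWAIT);
	assert(dma_wait_suggested(KM_PUSHPAGE) == DMA_SLEEP);
	assert(dma_wait_original(KM_NOSLEEP) ==
	    dma_wait_suggested(KM_NOSLEEP));
	assert(dma_wait_original(KM_SLEEP) == dma_wait_suggested(KM_SLEEP));
}
```

Testing for the KM_NOSLEEP bit rather than comparing against individual flag values is one possible design choice; an explicit check for KM_PUSHPAGE alongside KM_SLEEP would behave the same for the three flags discussed here.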

#5

Updated by Alexey Zaytsev over 8 years ago

The ctor would fail from the first allocation that is using KM_NOSLEEP:

        xi = bd_xfer_alloc(bd, bp, func, KM_NOSLEEP);
        if (xi == NULL) {
                xi = bd_xfer_alloc(bd, bp, func, KM_PUSHPAGE);
        }

We could leave only the KM_PUSHPAGE allocation and map KM_PUSHPAGE to DDI_DMA_SLEEP, but could this cause more IO to the same device?

#6

Updated by Garrett D'Amore over 8 years ago

No, you misunderstand. KM_NOSLEEP should still be fine. And it is. The problem is that we used KM_PUSHPAGE instead of KM_SLEEP, but the underlying DMA logic doesn't notice that KM_PUSHPAGE was used and fails to apply the correct DDI_DMA_SLEEP setting, I think.

#7

Updated by Alexey Zaytsev over 8 years ago

The first call is still made with KM_NOSLEEP, so the ctor would still fail on the first call, and cause the assert in kmem_cache_alloc() to fail, right?

#8

Updated by Garrett D'Amore over 8 years ago

No. KM_NOSLEEP allocations won't be in that code path. Or rather, the rc will be -1, indicating a KM_NOSLEEP failed allocation, not a failure of the constructor. (So the assert won't fire if KM_NOSLEEP fails.)
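The branch quoted from kmem_cache_alloc() in the description can be modeled as a small truth function, which may make it easier to see which combinations of flags and return codes reach the ASSERT. This is only a sketch of the quoted snippet, not the kernel code: the flag values are stand-ins, and assert_would_fire() and check_assert_model() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for illustration only. */
#define	KM_SLEEP	0x0000
#define	KM_NOSLEEP	0x0001
#define	KM_PUSHPAGE	0x0004

/*
 * Model of the quoted branch: returns true when ASSERT(rc == -1) would
 * fire for the given allocation flags and the return code of
 * kmem_cache_alloc_debug().
 */
static bool
assert_would_fire(int kmflag, int rc)
{
	if (rc == 0)
		return (false);	/* allocation and constructor succeeded */
	if (kmflag & KM_NOSLEEP)
		return (false);	/* caller simply gets NULL back */
	return (rc != -1);	/* e.g. rc == 1: the constructor failed */
}

/* Small usage example covering the cases discussed in the thread. */
static void
check_assert_model(void)
{
	/* A failing KM_NOSLEEP allocation never reaches the assertion. */
	assert(!assert_would_fire(KM_NOSLEEP, 1));
	/* A constructor failure under KM_PUSHPAGE trips it. */
	assert(assert_would_fire(KM_PUSHPAGE, 1));
	/* Detected corruption (rc == -1) is retried instead. */
	assert(!assert_would_fire(KM_PUSHPAGE, -1));
}
```

In this model the only way to reach the assertion is a non-zero, non -1 return code on an allocation that is allowed to sleep, which matches the KM_PUSHPAGE retry in bd_strategy().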

#9

Updated by Alexey Zaytsev over 8 years ago

Got it. kmem_cache_alloc expects the constructor to always succeed only when KM_NOSLEEP is not set. Let me fix blkdev.

#10

Updated by Alexey Zaytsev over 8 years ago

  • Subject changed from blkdev-related panic on DEBUG kernels. to blkdev-related panic on DEBUG kernels

#11

Updated by Garrett D'Amore about 8 years ago

  • Category changed from kernel to driver - device drivers
  • Status changed from New to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in

commit 679ac1565a070e343ccb5d6dcff1231cc6011ce4
Author: Alexey Zaytsev
Date: Tue Oct 16 03:33:44 2012 -0700

2911 blkdev-related panic on DEBUG kernels
Reviewed by: Richard Lowe
Reviewed by: Garrett D'Amore
Approved by: Garrett D'Amore
