Bug #2911
closed
blkdev-related panic on DEBUG kernels
100%
Description
When trying to do an illumos build on a DEBUG kernel, I'm getting this a few hours into the build:
    panic[cpu0]/thread=ffffff01f0dc2880: assertion failed: rc == -1, file: ../../common/os/kmem.c, line: 2598

    ffffff0008ad6280 genunix:assfail+73 ()
    ffffff0008ad62e0 genunix:kmem_cache_alloc+324 ()
    ffffff0008ad6360 blkdev:bd_xfer_alloc+45 ()
    ffffff0008ad63b0 blkdev:bd_strategy+140 ()
    ffffff0008ad63e0 genunix:bdev_strategy+75 ()
    ffffff0008ad6410 genunix:ldi_strategy+59 ()
    ffffff0008ad6450 zfs:vdev_disk_io_start+d0 ()
    ffffff0008ad64a0 zfs:zio_vdev_io_start+255 ()
    ffffff0008ad64e0 zfs:zio_execute+ca ()
    ffffff0008ad6500 zfs:zio_nowait+4d ()
    ffffff0008ad6540 zfs:vdev_mirror_io_start+153 ()
    ffffff0008ad6590 zfs:zio_vdev_io_start+29f ()
    ffffff0008ad65d0 zfs:zio_execute+ca ()
    ffffff0008ad65f0 zfs:zio_nowait+4d ()
    ffffff0008ad6690 zfs:arc_read_nolock+91f ()
    ffffff0008ad6730 zfs:arc_read+ae ()
    ffffff0008ad6790 zfs:dsl_read+33 ()
    ffffff0008ad6820 zfs:dbuf_prefetch+14a ()
    ffffff0008ad6870 zfs:dmu_zfetch_fetch+62 ()
    ffffff0008ad68b0 zfs:dmu_zfetch_dofetch+b8 ()
    ffffff0008ad6910 zfs:dmu_zfetch_find+436 ()
    ffffff0008ad69d0 zfs:dmu_zfetch+ac ()
    ffffff0008ad6a30 zfs:dbuf_read+141 ()
    ffffff0008ad6af0 zfs:dmu_buf_hold_array_by_dnode+1d9 ()
    ffffff0008ad6b80 zfs:dmu_buf_hold_array+6e ()
    ffffff0008ad6c30 zfs:dmu_read_uio+4e ()
    ffffff0008ad6cd0 zfs:zfs_read+269 ()
    ffffff0008ad6d40 genunix:fop_read+6b ()
    ffffff0008ad6e80 genunix:read+2b8 ()
    ffffff0008ad6eb0 genunix:read32+22 ()
    ffffff0008ad6f00 unix:brand_sys_sysenter+2b7 ()

    panic: entering debugger (continue to save dump)
It seems to be failing in this code:
kmem.c:kmem_cache_alloc()

    [...]
    if (cp->cache_flags & KMF_BUFTAG) {
        /*
         * Make kmem_cache_alloc_debug() apply the constructor for us.
         */
        int rc = kmem_cache_alloc_debug(cp, buf, kmflag, 1, caller());
        if (rc != 0) {
            if (kmflag & KM_NOSLEEP)
                return (NULL);
            /*
             * kmem_cache_alloc_debug() detected corruption
             * but didn't panic (kmem_panic <= 0). We should not be
             * here because the constructor failed (indicated by a
             * return code of 1). Try again.
             */
            ASSERT(rc == -1);
            return (kmem_cache_alloc(cp, kmflag));
As the comments indicate, it's probably caused by the constructor failure:
blkdev.c:bd_xfer_ctor()

    [....]
    if (bd->d_use_dma) {
        if (ddi_dma_alloc_handle(bd->d_dip, &bd->d_dma, dcb, NULL,
            &xi->i_dmah) != DDI_SUCCESS) {
            return (-1);
        }
    }
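For readers following along, the interaction between the two excerpts above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `model_debug_alloc()`, `result_t`, and the `MOCK_KM_*` values are hypothetical stand-ins and are not taken from kmem.c. The return codes (0 for success, -1 for detected corruption, 1 for constructor failure) follow the quoted comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the KM_* flags; the real ones are in <sys/kmem.h>. */
enum { MOCK_KM_SLEEP = 0x0, MOCK_KM_NOSLEEP = 0x1, MOCK_KM_PUSHPAGE = 0x4 };

/* Possible outcomes of the modelled debug allocation path. */
typedef enum {
	RESULT_BUF,		/* allocation succeeded */
	RESULT_NULL,		/* KM_NOSLEEP caller gets NULL back */
	RESULT_RETRY,		/* corruption detected, allocation retried */
	RESULT_ASSERT_FAIL	/* ASSERT(rc == -1) would fire on DEBUG */
} result_t;

/*
 * rc models the return value of kmem_cache_alloc_debug() as described in
 * the quoted comment: 0 = success, -1 = corruption, 1 = constructor failed.
 */
static result_t
model_debug_alloc(int rc, int kmflag)
{
	if (rc == 0)
		return (RESULT_BUF);
	if (kmflag & MOCK_KM_NOSLEEP)
		return (RESULT_NULL);
	if (rc != -1)
		return (RESULT_ASSERT_FAIL);
	return (RESULT_RETRY);
}
```

In this model a constructor failure is harmless for a KM_NOSLEEP caller, but trips the assertion for any caller that did not pass KM_NOSLEEP, which matches the stack trace in the description.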
Anyone up to looking into this?
I've tried running ::findleaks, just in case, but it fails due to #2907.
Updated by Garrett D'Amore almost 10 years ago
Reading the code, it appears that it is ill equipped to deal with constructor failure here. That's a problem. We can indeed reasonably fail the constructor, and in this case the problem is a failure to alloc a DMA handle.
However, reading bd_xfer_ctor indicates that the DMA allocation should be performed with DDI_DMA_SLEEP. So why is the DMA handle allocation failing? It never should. Perhaps you have corrupted DMA attributes somewhere? Or perhaps all DMA resources on the system are exhausted?
Updated by Alexey Zaytsev almost 10 years ago
Hmm, I don't think you are right.
bd_xfer_alloc() is called from bd_strategy():
    xi = bd_xfer_alloc(bd, bp, func, KM_NOSLEEP);
    if (xi == NULL) {
        xi = bd_xfer_alloc(bd, bp, func, KM_PUSHPAGE);
    }
And bd_xfer_ctor() says:
    if (kmflag == KM_SLEEP) {
        dcb = DDI_DMA_SLEEP;
    } else {
        dcb = DDI_DMA_DONTWAIT;
    }
So ddi_dma_alloc_handle is called with dcb == DDI_DMA_DONTWAIT. Right?
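To make the point concrete, here is a minimal user-space sketch of the mapping in the quoted bd_xfer_ctor() excerpt. The `MOCK_*` names and `quoted_ctor_dcb()` are hypothetical stand-ins introduced for illustration; only the shape of the comparison is taken from the excerpt.

```c
#include <assert.h>

/* Hypothetical stand-ins for the KM_* flags; the real ones are in <sys/kmem.h>. */
enum { MOCK_KM_SLEEP = 0x0, MOCK_KM_NOSLEEP = 0x1, MOCK_KM_PUSHPAGE = 0x4 };

/* Hypothetical stand-ins for DDI_DMA_DONTWAIT and DDI_DMA_SLEEP. */
typedef enum { MOCK_DMA_DONTWAIT, MOCK_DMA_SLEEP } mock_dcb_t;

/*
 * Same shape as the quoted excerpt: only an exact match on KM_SLEEP
 * selects the sleeping DMA callback; everything else does not wait.
 */
static mock_dcb_t
quoted_ctor_dcb(int kmflag)
{
	if (kmflag == MOCK_KM_SLEEP)
		return (MOCK_DMA_SLEEP);
	return (MOCK_DMA_DONTWAIT);
}
```

Under these assumptions both calls made by bd_strategy(), the KM_NOSLEEP one and the KM_PUSHPAGE fallback, end up on the non-waiting path.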
Updated by Alexey Zaytsev almost 10 years ago
So, how should we handle this? Is it possible that a KM_SLEEP allocation would cause some pages to be swapped out, thus possibly causing more requests to the same device?
In Linux, they solve the problem using a wider range of GFP flags (analogous to KM flags):
GFP_NOWAIT - Don't sleep.
GFP_NOIO - Can sleep, but don't do any IO on the block layer. Useful in block layer and driver code.
GFP_NOFS - Can sleep and do block IO, but not fs access (can swap, unless the swap is backed by a file). Useful in fs code.
GFP_KERNEL - Can do anything, the default most code should use.
As well as a few others.
We don't have such an abundance in illumos, so I'm not sure how to solve the problem without causing any deadlocks, assuming KM_SLEEP could cause new IO.
Updated by Garrett D'Amore almost 10 years ago
I think the problem is that bd_xfer_ctor() needs to consider KM_PUSHPAGE as well as KM_SLEEP. (KM_PUSHPAGE has semantics similar to KM_SLEEP, in that it should trigger a sleeping DDI allocation as well.)
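One way this suggestion could look, sketched in user-space C. This is an illustration of the idea only and is not claimed to be the text of the committed fix; `suggested_ctor_dcb()` and the `MOCK_*` names are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical stand-ins for the KM_* flags; the real ones are in <sys/kmem.h>. */
enum { MOCK_KM_SLEEP = 0x0, MOCK_KM_NOSLEEP = 0x1, MOCK_KM_PUSHPAGE = 0x4 };

/* Hypothetical stand-ins for DDI_DMA_DONTWAIT and DDI_DMA_SLEEP. */
typedef enum { MOCK_DMA_DONTWAIT, MOCK_DMA_SLEEP } mock_dcb_t;

/*
 * Treat KM_PUSHPAGE like KM_SLEEP so that the fallback allocation in
 * bd_strategy() waits for a DMA handle instead of failing the constructor.
 */
static mock_dcb_t
suggested_ctor_dcb(int kmflag)
{
	if (kmflag == MOCK_KM_SLEEP || kmflag == MOCK_KM_PUSHPAGE)
		return (MOCK_DMA_SLEEP);
	return (MOCK_DMA_DONTWAIT);
}
```

With this mapping the first, KM_NOSLEEP attempt may still fail and return NULL, while the KM_PUSHPAGE retry asks for a sleeping DMA handle allocation.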
Updated by Alexey Zaytsev almost 10 years ago
The ctor would fail from the first allocation that is using KM_NOSLEEP:
    xi = bd_xfer_alloc(bd, bp, func, KM_NOSLEEP);
    if (xi == NULL) {
        xi = bd_xfer_alloc(bd, bp, func, KM_PUSHPAGE);
    }
We could only leave the KM_PUSHPAGE allocation, and map KM_PUSHPAGE to DDI_DMA_SLEEP, but could this cause more IO to the same device?
Updated by Garrett D'Amore almost 10 years ago
No, you misunderstand. KM_NOSLEEP should still be fine. And it is. The problem is that we used KM_PUSHPAGE instead of KM_SLEEP, but the underlying DMA logic doesn't notice that KM_PUSHPAGE was used and fails to apply the correct DDI_DMA_SLEEP setting, I think.
Updated by Alexey Zaytsev almost 10 years ago
The first call is still made with KM_NOSLEEP, so the ctor would still fail on the first call, and cause the assert in kmem_cache_alloc() to fail, right?
Updated by Garrett D'Amore almost 10 years ago
No. KM_NOSLEEP allocations won't be in that code path. Or rather, the rc will be -1, indicating a KM_NOSLEEP failed allocation, not a failure of the constructor. (So the assert won't fire if KM_NOSLEEP fails.)
Updated by Alexey Zaytsev almost 10 years ago
Got it. kmem_cache_alloc expects the constructor to always succeed only when KM_NOSLEEP is not set. Let me fix blkdev.
Updated by Alexey Zaytsev almost 10 years ago
- Subject changed from blkdev-related panic on DEBUG kernels. to blkdev-related panic on DEBUG kernels
Updated by Garrett D'Amore over 9 years ago
- Category changed from kernel to driver - device drivers
- Status changed from New to Resolved
- % Done changed from 0 to 100
- Tags deleted (needs-triage)
Resolved in
commit 679ac1565a070e343ccb5d6dcff1231cc6011ce4
Author: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Date: Tue Oct 16 03:33:44 2012 -0700
2911 blkdev-related panic on DEBUG kernels
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Garrett D'Amore <garrett@damore.org>