Bug #5924

open

System deadlocked in htable_steal_active

Added by Alexander Kolbasov about 7 years ago.

Status: New
Priority: Normal
Assignee: -
Category: kernel
Start date: 2015-05-13
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

A VM with 2 GB of memory became unresponsive and eventually panicked when the ZFS deadman timer fired. It panicked again while the core dump was being extracted. Both core dumps show a similar pattern. The system is low on memory:

lgrpinfo -m
lgroup 0 (root):
        Memory: installed 2.0G, allocated 2.0G, free 48M
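For scale, assuming lgrpinfo's M and G suffixes are binary units (so the rounded 2.0G is about 2048M), the free memory is only a couple of percent of what is installed:

```latex
\frac{\text{free}}{\text{installed}}
  = \frac{48\,\text{MiB}}{2.0\,\text{GiB}}
  \approx \frac{48}{2048}
  \approx 0.023 \approx 2.3\,\%
```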

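I think that both CPUs are trying to get a page (both are allocating a page table through htable_alloc() and htable_steal()) and can't. This may be valid behavior, or it may be a bug in the HAT layer; either way the system hangs, and eventually the deadman timer fires and kills us.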

First dump:

> ::cpuinfo
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc4a900  1b   31    0 161   no    no t-2    ffffff0003ec5c40 sched
  1 ffffff014e986580  1f   31    0  99   no    no t-0    ffffff00052cdc40 zpool-cordpool

> ffffff0003ec5c40::findstack -v
stack pointer for thread ffffff0003ec5c40: ffffff0003ec5950
  ffffff0003ec59d0 0xfffffffffbd8bb58()
  ffffff0003ec5a20 vdev_deadman+0x10b(ffffff014ca38a80)
  ffffff0003ec5a70 vdev_deadman+0x4a(ffffff014ca436c0)
  ffffff0003ec5aa0 spa_deadman+0xad(ffffff0170ac8000)
  ffffff0003ec5b90 cyclic_softint+0x209(fffffffffbc40500, 0)
  ffffff0003ec5ba0 cbe_low_level+0x14()
  ffffff0003ec5bf0 av_dispatch_softvect+0x88(2)
  ffffff0003ec5c20 dispatch_softint+0x39(0, 0)
  ffffff000474d040 switch_sp_and_call+0x13()
  ffffff000474d080 dosoftint+0x44(ffffff000474d0f0)
  ffffff000474d0e0 do_interrupt+0x10d(ffffff000474d0f0, fffffffffbc1ea88)
  ffffff000474d0f0 _interrupt+0x1e9()
  ffffff000474d210 tsc_read+3()
  ffffff000474d280 mutex_vector_enter+0xc5(fffffffffbc91998)
  ffffff000474d320 htable_steal_active+0x81(ffffff016fc391a0, 1, 180, ffffff000474d34c, ffffff000474d340)
  ffffff000474d390 htable_steal+0x218(1, 0)
  ffffff000474d420 htable_alloc+0xf9(ffffff014b4c28a0, 8000000, 0, 0)
  ffffff000474d4e0 htable_create+0x20a(ffffff014b4c28a0, 8046000, 0, 0)
  ffffff000474d570 hati_load_common+0x268(ffffff014b4c28a0, 8046000, ffffff0001bdbee0, 40b, 0, 0, 39dc6)
  ffffff000474d600 hat_memload+0x132(ffffff014b4c28a0, 8046000, ffffff0001bdbee0, b, 0)
  ffffff000474d640 hat_memload_region+0x25(ffffff014b4c28a0, 8046000, ffffff0001bdbee0, b, 0, ffffffffffffffff)
  ffffff000474d770 segvn_faultpage+0x5ff(ffffff014b4c28a0, ffffff019df50528, 8046000, fffffffffffff000, ffffff01a1cdc7b8, ffffff000474d850, ffffff010000000f, 0, 2, 1)
  ffffff000474d950 segvn_fault+0xcc6(ffffff014b4c28a0, ffffff019df50528, 8046000, 1000, 0, 2)
  ffffff000474da80 as_fault+0x3a2(ffffff014b4c28a0, ffffff014e905800, 8046f50, 1, 0, 2)
  ffffff000474db10 pagefault+0x96(8046f50, 0, 2, 0)
  ffffff000474dc20 trap+0x1331(ffffff000474dc30, 8046f50, 0)
  ffffff000474dc30 0xfffffffffb8002ba()
  ffffff000474dd40 bcopy_altentry+0x1a9()
  ffffff000474de00 poll_common+0x58f(8046f50, 2, ffffff000474de20, 0)
  ffffff000474dea0 pollsys+0xe7(8046f50, 2, 8047028, 0)
  ffffff000474df00 _sys_sysenter_post_swapgs+0x241()
> ffffff00052cdc40::findstack -v
stack pointer for thread ffffff00052cdc40: ffffff00052cc600
  ffffff00052cc620 apic_send_EOI+0x1d(0)
  ffffff00052cc680 hilevel_intr_prolog+0x3d(ffffff014e986580, f, 0, ffffff00052cc6f0)
  ffffff00052cc6e0 do_interrupt+0x1b0(ffffff00052cc6f0, ffffff014e996508)
  ffffff00052cc6f0 _interrupt+0x1e9()
  ffffff00052cc880 mutex_enter+0x10()
  ffffff00052cc8f0 htable_steal+0x218(1, 0)
  ffffff00052cc980 htable_alloc+0xf9(ffffff014b4c2ee0, ffffff01e9c00000, 0, 0)
  ffffff00052cca40 htable_create+0x20a(ffffff014b4c2ee0, ffffff01e9c17000, 0, 0)
  ffffff00052ccad0 hati_load_common+0x268(ffffff014b4c2ee0, ffffff01e9c17000, ffffff0001b3e488, 427, 1, 0, 388c1)
  ffffff00052ccb60 hat_memload+0x132(ffffff014b4c2ee0, ffffff01e9c17000, ffffff0001b3e488, 27, 1)
  ffffff00052ccc00 segkmem_xalloc+0x10f(ffffff0146607000, 0, 1000, 1, 0, fffffffffb89e050, fffffffffc0039e0)
  ffffff00052ccc70 segkmem_alloc_vn+0x5e(ffffff0146607000, 1000, 1, fffffffffc0039e0)
  ffffff00052ccca0 segkmem_alloc+0x20(ffffff0146607000, 1000, 1)
  ffffff00052ccdf0 vmem_xalloc+0x629(ffffff0146608000, 1000, 1000, 0, 0, 0, 0, ffffff0100000001)
  ffffff00052cce70 vmem_alloc+0x145(ffffff0146608000, 1000, 1)
  ffffff00052ccf00 kmem_slab_create+0xaf(ffffff014d6782c8, 1)
  ffffff00052ccf80 kmem_slab_alloc+0x18d(ffffff014d6782c8, 1)
  ffffff00052ccfe0 kmem_cache_alloc+0x283(ffffff014d6782c8, 1)
  ffffff00052cd060 rootnex_coredma_allochdl+0x61(ffffff014b9ddb88, ffffff014b9d8dd0, ffffff00052cd170, 0, 0, ffffff01721b9f00)
  ffffff00052cd0e0 rootnex_dma_allochdl+0xdc(ffffff014b9ddb88, ffffff014b9d8dd0, ffffff00052cd170, 0, 0, ffffff01721b9f00)
  ffffff00052cd160 ddi_dma_allochdl+0x6e(ffffff014b9d8dd0, ffffff014b9d8dd0, ffffff00052cd170, 0, 0, ffffff01721b9f00)
  ffffff00052cd200 ddi_dma_alloc_handle+0x7f(ffffff014b9d8dd0, ffffff00052cd320, 0, 0, ffffff01721b9f00)
  ffffff00052cd2d0 sata_dma_buf_setup+0x426(ffffff01721b9ee8, 40000, 0, 0, ffffff00052cd320)
  ffffff00052cd3d0 sata_scsi_init_pkt+0x15b(ffffff014cb6cbc8, 0, ffffff01e44fcc80, a, 20, 0, ffffff0100040000, 0, 0)
  ffffff00052cd480 scsi_init_pkt+0xaf(ffffff014cb6cbc8, 0, ffffff01e44fcc80, a, 20, 0, ffffff0100040000, fffffffff7e92ca0, ffffff016f80d000)
  ffffff00052cd520 sd_setup_rw_pkt+0xcc(ffffff016f80d000, ffffff00052cd558, ffffff01e44fcc80, 40000, fffffffff7e92ca0, ffffff016f80d000, e9feeb, ffffff0100000060)
  ffffff00052cd5a0 sd_initpkt_for_buf+0x10e(ffffff01e44fcc80, ffffff00052cd5d8)
  ffffff00052cd620 sd_start_cmds+0x139(ffffff016f80d000, 0)
  ffffff00052cd670 sd_core_iostart+0xf7(4, ffffff016f80d000, ffffff01e44fcc80)
  ffffff00052cd700 sd_mapblockaddr_iostart+0x383(3, ffffff016f80d000, ffffff01e44fcc80)
  ffffff00052cd750 sd_xbuf_strategy+0x93(ffffff01e44fcc80, ffffff01e44fa640, ffffff016f80d000)
  ffffff00052cd7b0 xbuf_iostart+0x1a7(ffffff016f661840)
  ffffff00052cd7f0 ddi_xbuf_qstrategy+0xb1(ffffff01e44fcc80, ffffff016f661840)
  ffffff00052cd830 sdstrategy+0xfa(ffffff01e44fcc80)
  ffffff00052cd860 bdev_strategy+0x61(ffffff01e44fcc80)
  ffffff00052cd890 ldi_strategy+0x53(ffffff0170a4fdc0, ffffff01e44fcc80)
  ffffff00052cd8d0 vdev_disk_io_start+0xfa(ffffff01daf0e3f8)
  ffffff00052cd930 zio_vdev_io_start+0xe9(ffffff01daf0e3f8)
  ffffff00052cd970 zio_execute+0xd5(ffffff01daf0e3f8)
  ffffff00052cd9a0 zio_nowait+0x43(ffffff01daf0e3f8)
  ffffff00052cd9e0 vdev_queue_io_done+0xad(ffffff017034d058)
  ffffff00052cda20 zio_vdev_io_done+0xa8(ffffff017034d058)
  ffffff00052cda60 zio_execute+0xd5(ffffff017034d058)
  ffffff00052cdb20 taskq_thread+0x318(ffffff016f7ac288)
  ffffff00052cdb30 thread_start+8()

Second core:

> ::cpuinfo
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc40500  1f   34    0  99   no    no t-1    ffffff00050e2c40 zpool-cordpool
  1 fffffffffbc4a900  1b   34    0 161   no    no t-1    ffffff0004036c40 sched
> ffffff00050e2c40::findstack -v
stack pointer for thread ffffff00050e2c40: ffffff00050e14c0
  ffffff00050e1560 cpus()
  ffffff00050e1690 fakesoftint_return()
  ffffff00050e16f0 do_interrupt+0x1b0(ffffff00050e1700, fffffffffbc16658)
  ffffff00050e1700 _interrupt+0x1e9()
  ffffff00050e1880 htable_steal_active+0x6b(ffffff014fc8a650, 1, 200, ffffff00050e18ac, ffffff00050e18a0)
  ffffff00050e18f0 htable_steal+0x218(1, 0)
  ffffff00050e1980 htable_alloc+0xf9(ffffff014b4c2ee0, ffffff06ffe00000, 0, 0)
  ffffff00050e1a40 htable_create+0x20a(ffffff014b4c2ee0, ffffff06ffe04000, 0, 0)
  ffffff00050e1ad0 hati_load_common+0x268(ffffff014b4c2ee0, ffffff06ffe04000, ffffff00006a53a0, 427, 1, 0, c0ae)
  ffffff00050e1b60 hat_memload+0x132(ffffff014b4c2ee0, ffffff06ffe04000, ffffff00006a53a0, 27, 1)
  ffffff00050e1c00 segkmem_xalloc+0x10f(ffffff0146607000, 0, 1000, 1, 0, fffffffffb89e050, fffffffffc0039e0)
  ffffff00050e1c70 segkmem_alloc_vn+0x5e(ffffff0146607000, 1000, 1, fffffffffc0039e0)
  ffffff00050e1ca0 segkmem_alloc+0x20(ffffff0146607000, 1000, 1)
  ffffff00050e1df0 vmem_xalloc+0x629(ffffff0146608000, 1000, 1000, 0, 0, 0, 0, ffffff0100000001)
  ffffff00050e1e70 vmem_alloc+0x145(ffffff0146608000, 1000, 1)
  ffffff00050e1f00 kmem_slab_create+0xaf(ffffff014d6782c8, 1)
  ffffff00050e1f80 kmem_slab_alloc+0x18d(ffffff014d6782c8, 1)
  ffffff00050e1fe0 kmem_cache_alloc+0x283(ffffff014d6782c8, 1)
  ffffff00050e2060 rootnex_coredma_allochdl+0x61(ffffff014b9ddb88, ffffff014b9d8dd0, ffffff00050e2170, 0, 0, ffffff014cc967c0)
  ffffff00050e20e0 rootnex_dma_allochdl+0xdc(ffffff014b9ddb88, ffffff014b9d8dd0, ffffff00050e2170, 0, 0, ffffff014cc967c0)
  ffffff00050e2160 ddi_dma_allochdl+0x6e(ffffff014b9d8dd0, ffffff014b9d8dd0, ffffff00050e2170, 0, 0, ffffff014cc967c0)
  ffffff00050e2200 ddi_dma_alloc_handle+0x7f(ffffff014b9d8dd0, ffffff00050e2320, 0, 0, ffffff014cc967c0)
  ffffff00050e22d0 sata_dma_buf_setup+0x426(ffffff014cc967a8, 40000, 0, 0, ffffff00050e2320)
  ffffff00050e23d0 sata_scsi_init_pkt+0x15b(ffffff014cb6c970, 0, ffffff03e983cb80, a, 20, 0, ffffff0000040000, 0, 0)
  ffffff00050e2480 scsi_init_pkt+0xaf(ffffff014cb6c970, 0, ffffff03e983cb80, a, 20, 0, ffffff0000040000, fffffffff7e80ca0, ffffff015c045000)
  ffffff00050e2520 sd_setup_rw_pkt+0xcc(ffffff015c045000, ffffff00050e2558, ffffff03e983cb80, 40000, fffffffff7e80ca0, ffffff015c045000, f92677, d6)
  ffffff00050e25a0 sd_initpkt_for_buf+0x10e(ffffff03e983cb80, ffffff00050e25d8)
  ffffff00050e2620 sd_start_cmds+0x139(ffffff015c045000, 0)
  ffffff00050e2670 sd_core_iostart+0xf7(4, ffffff015c045000, ffffff03e983cb80)
  ffffff00050e2700 sd_mapblockaddr_iostart+0x383(3, ffffff015c045000, ffffff03e983cb80)
  ffffff00050e2750 sd_xbuf_strategy+0x93(ffffff03e983cb80, ffffff019f7b8900, ffffff015c045000)
  ffffff00050e27b0 xbuf_iostart+0x1a7(ffffff014d550300)
  ffffff00050e27f0 ddi_xbuf_qstrategy+0xb1(ffffff03e983cb80, ffffff014d550300)
  ffffff00050e2830 sdstrategy+0xfa(ffffff03e983cb80)
  ffffff00050e2860 bdev_strategy+0x61(ffffff03e983cb80)
  ffffff00050e2890 ldi_strategy+0x53(ffffff016fa90d20, ffffff03e983cb80)
  ffffff00050e28d0 vdev_disk_io_start+0xfa(ffffff0404ae0058)
  ffffff00050e2930 zio_vdev_io_start+0xe9(ffffff0404ae0058)
  ffffff00050e2970 zio_execute+0xd5(ffffff0404ae0058)
  ffffff00050e29a0 zio_nowait+0x43(ffffff0404ae0058)
  ffffff00050e29e0 vdev_queue_io_done+0xad(ffffff0404c30c30)
  ffffff00050e2a20 zio_vdev_io_done+0xa8(ffffff0404c30c30)
  ffffff00050e2a60 zio_execute+0xd5(ffffff0404c30c30)
  ffffff00050e2b20 taskq_thread+0x318(ffffff016fbcf9b0)
  ffffff00050e2b30 thread_start+8()
> ffffff0004036c40::findstack -v
stack pointer for thread ffffff0004036c40: ffffff0004036950
  ffffff00040369d0 0xfffffffffbd8bb58()
  ffffff0004036a20 vdev_deadman+0x10b(ffffff014ca38a80)
  ffffff0004036a70 vdev_deadman+0x4a(ffffff014ca43040)
  ffffff0004036aa0 spa_deadman+0xad(ffffff0170844000)
  ffffff0004036b90 cyclic_softint+0x209(ffffff014e986580, 0)
  ffffff0004036ba0 cbe_low_level+0x14()
  ffffff0004036bf0 av_dispatch_softvect+0x88(2)
  ffffff0004036c20 dispatch_softint+0x39(0, 0)
  ffffff0005063db0 switch_sp_and_call+0x13()
  ffffff0005063df0 dosoftint+0x44(ffffff0005063e60)
  ffffff0005063e50 do_interrupt+0x10d(ffffff0005063e60, ffffff014e997c88)
  ffffff0005063e60 _interrupt+0x1e9()
  ffffff0005063f80 tsc_read+3()
  ffffff0005063ff0 mutex_vector_enter+0xc5(fffffffffbc91830)
  ffffff0005064090 htable_steal_active+0x81(ffffff019ef51658, 1, c0, ffffff00050640bc, ffffff00050640b0)
  ffffff0005064100 htable_steal+0x218(1, 0)
  ffffff0005064190 htable_alloc+0xf9(ffffff014b4c2ee0, ffffff06ffe00000, 0, 0)
  ffffff0005064250 htable_create+0x20a(ffffff014b4c2ee0, ffffff06ffe00000, 0, 0)
  ffffff00050642e0 hati_load_common+0x268(ffffff014b4c2ee0, ffffff06ffe00000, ffffff0002e62dc0, 427, 1, 0, 6162a)
  ffffff0005064370 hat_memload+0x132(ffffff014b4c2ee0, ffffff06ffe00000, ffffff0002e62dc0, 27, 1)
  ffffff0005064410 segkmem_xalloc+0x10f(ffffff0146607000, 0, 4000, 1, 0, fffffffffb89e050, fffffffffc0039e0)
  ffffff0005064480 segkmem_alloc_vn+0x5e(ffffff0146607000, 4000, 1, fffffffffc0039e0)
  ffffff00050644b0 segkmem_alloc+0x20(ffffff0146607000, 4000, 1)
  ffffff0005064600 vmem_xalloc+0x629(ffffff0146608000, 4000, 1000, 0, 0, 0, 0, 1)
  ffffff0005064680 vmem_alloc+0x145(ffffff0146608000, 4000, 1)
  ffffff0005064710 kmem_slab_create+0xaf(ffffff01466302c8, 1)
  ffffff0005064790 kmem_slab_alloc+0x18d(ffffff01466302c8, 1)
  ffffff00050647f0 kmem_cache_alloc+0x283(ffffff01466302c8, 1)
  ffffff0005064830 kmem_alloc+0x4b(4000, 1)
  ffffff0005064870 kmem_zalloc+0xed(4000, 1)
  ffffff00050648e0 real_LZ4_compress+0x47(ffffff0085970000, ffffff03fde70004, 1ca00, 190bc)
  ffffff0005064950 lz4_compress+0x49(ffffff0085970000, ffffff03fde70000, 1ca00, 190c0, 0)
  ffffff00050649b0 zio_compress_data+0xc2(f, ffffff0085970000, ffffff03fde70000, 1ca00)
  ffffff0005064a20 zio_write_bp_init+0x28c(ffffff06fec9f768)
  ffffff0005064a60 zio_execute+0xd5(ffffff06fec9f768)
  ffffff0005064b20 taskq_thread+0x318(ffffff016d5af288)
  ffffff0005064b30 thread_start+8()

Related issues

Related to illumos gate - Bug #4110: panic while reaping htable_cache (Closed, Boris Protopopov, 2013-09-04)

Related to illumos gate - Bug #5923: Panic on ht->ht_lock_cnt == 0 assertion in htable_release (New, 2015-05-13)

#1

Updated by Alexander Kolbasov about 7 years ago

  • Related to Bug #4110: panic while reaping htable_cache added
#2

Updated by Alexander Kolbasov about 7 years ago

  • Related to Bug #5923: Panic on ht->ht_lock_cnt == 0 assertion in htable_release added