Bug #12076
reproducible panic in zfs:abd_free_linear
Description
I've been trying to reproduce https://smartos.org/bugview/OS-7920 on OmniOS with my build of illumos-gate at approximately a63fed2a0384be5aa3f2ff7a38aac1153c549e87. My changes only affect the dump path, which works flawlessly when I trigger the bug described below.
With a little patience, I can reproduce this panic:
> ::status
debugging crash dump vmcore.4 (64-bit) from omni-2
operating system: 5.11 omni-physio-0-g59a9ce0e70 (i86pc)
build version: gfx-drm - heads/master-0-gbdc58b1-dirty
image uuid: 54f6f5b5-06a8-64ec-f6c8-b3967a1f8418
panic message: assertion failed: rc->rc_count == number, file: ../../common/fs/zfs/refcount.c, line: 90
dump content: kernel pages only
> $C
ffffff00085e9810 vpanic()
ffffff00085e9860 0xfffffffffbe2a425()
ffffff00085e98a0 zfs_refcount_destroy_many+0x30(ffffff026f6f3190, 0)
ffffff00085e98c0 zfs_refcount_destroy+0x10(ffffff026f6f3190)
ffffff00085e98f0 abd_free_linear+0x2d(ffffff026f6f3180)
ffffff00085e9930 abd_free+0x1a5(ffffff026f6f3180)
ffffff00085e9970 vdev_queue_agg_io_done+0x30(ffffff028c33c1d8)
ffffff00085e9a10 zio_done+0x69a(ffffff028c33c1d8)
ffffff00085e9a50 zio_execute+0xf5(ffffff028c33c1d8)
ffffff00085e9b00 taskq_thread+0x315(ffffff0272866640)
ffffff00085e9b10 thread_start+0xb()
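For reference, the session above comes from loading the saved crash dump into mdb. The commands below are a sketch that assumes the default savecore layout (unix.N/vmcore.N under /var/crash/<hostname>); adjust the directory and suffix to match your system:

# Assumed paths, based on the hostname (omni-2) and dump number (4) shown above.
cd /var/crash/omni-2
mdb unix.4 vmcore.4
# At the mdb prompt, ::status and $C produce the output shown above.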
I have a pool that looks like this:
  pool: zz_test
 state: ONLINE
  scan: resilvered 9.15M in 0 days 00:00:01 with 0 errors on Fri Dec 13 16:43:28 2019
config:

        NAME                                       STATE     READ WRITE CKSUM
        zz_test                                    ONLINE       0     0     0
          mirror-0                                 ONLINE       0     0     0
            c0t600144F048A3AB7200005DF420570001d0  ONLINE       0     0     0
            c0t600144F048A3AB7200005DF420570002d0  ONLINE       0     0     0
        spares
          c0t600144F048A3AB7200005DF420570003d0    AVAIL

errors: No known data errors
The disks are iSCSI LUs backed by some virtio disks, with the target on the same machine. This OmniOS instance is a guest running under KVM on Fedora 30. After installing the iscsi/target and iscsi/initiator packages, setup is performed with this script:
#!/bin/bash

pool=zz_test
disks=($@)

if (( ${#disks[@]} != 3 )); then
    echo "$0: 3 disks required (p0 node probably needed)" 1>&2
    exit 1
fi

function get_lus {
    stmfadm list-lu | awk '{print $NF}'
}

function get_iscsi_disks {
    stmfadm list-lu | awk '{printf("c0t%sd0\n", $NF)}'
}

set -xeuo pipefail

svcadm enable stmf
svcadm enable iscsi/target
svcadm enable iscsi/initiator

if [[ -z "$(itadm list-target)" ]]; then
    itadm create-target
fi
iscsiadm modify discovery -t enable
iscsiadm add discovery-address 127.0.0.1

if zpool import "$pool" >/dev/null 2>&1 || zpool list "$pool" >/dev/null 2>&1; then
    zpool destroy "$pool"
fi

# Get whole disk partitions so that d0 node can appear
zpool create "$pool" $(echo "${disks[@]}" | sed 's/p0//g')
zpool destroy "$pool"

for lu in $(get_lus); do
    stmfadm delete-lu $lu
done

for disk in "${disks[@]}"; do
    stmfadm create-lu -p serial="$pool.$disk" /dev/dsk/$disk
done

for lu in $(get_lus); do
    stmfadm online-lu $lu
    stmfadm add-view $lu
done

devfsadm -C -c disk
devfsadm

idisks=( $(get_iscsi_disks) )

zpool create -o cachefile=none $pool mirror "${idisks[0]}" "${idisks[1]}" spare "${idisks[2]}"
zpool status $pool
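For concreteness, a hypothetical invocation of the setup script (the script name and disk names below are placeholders; pass the p0 nodes of the three virtio disks that will back the LUs):

# Example only -- substitute the p0 nodes of your own backing disks.
./iscsi-mirror-setup.sh c1t0d0p0 c2t0d0p0 c3t0d0p0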
I then run the following two scripts simultaneously (one way to launch them together is sketched after the second script):
#! /bin/bash

while dd if=/dev/zero of=/zz_test/file bs=1024k count=100; do
    :
done
and
#! /bin/bash

lu=$(stmfadm list-lu | awk 'NR == 2 { print $NF; exit }')
if [[ -z $lu ]]; then
    echo "$0: cannot find lu" 1>&2
    exit 1
fi

echo "Abusing LU $lu"

while true; do
    echo $(date) offline
    stmfadm offline-lu "$lu"
    sleep 15
    echo $(date) online
    stmfadm online-lu "$lu"
    sleep 45
done
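One way to run the two loops side by side (the script names are placeholders; running each in its own terminal works just as well):

# Placeholder names for the dd loop and the offline/online loop above.
./dd-loop.sh &
./lu-bounce.sh &
wait    # let them run until the box panics (or doesn't)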
I've seen it reproduce after the sixth offline-lu cycle, and I've also seen the system survive for hours.
This may very well be the same bug as OS-7920, but I've not been able to determine that conclusively. With OS-7920, it's always the innocent thread that has its stack splatted on the console.
Related issues
Updated by Mike Gerdts over 3 years ago
- Related to Bug #4454: ldi notifications trigger vdev_disk_free() without spa_config_lock() added
Updated by Mike Gerdts over 3 years ago
- Category set to zfs - Zettabyte File System