Bug #12076

reproducible panic in zfs:abd_free_linear

Added by Mike Gerdts 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:

Description

I've been trying to reproduce https://smartos.org/bugview/OS-7920 on OmniOS with my build of illumos-gate at approximately a63fed2a0384be5aa3f2ff7a38aac1153c549e87. My changes only affect the dump path, which works flawlessly when I trigger the bug described below.

With a little patience, I can reproduce this panic:

> ::status
debugging crash dump vmcore.4 (64-bit) from omni-2
operating system: 5.11 omni-physio-0-g59a9ce0e70 (i86pc)
build version: gfx-drm - heads/master-0-gbdc58b1-dirty
image uuid: 54f6f5b5-06a8-64ec-f6c8-b3967a1f8418
panic message: assertion failed: rc->rc_count == number, file: ../../common/fs/zfs/refcount.c, line: 90
dump content: kernel pages only
> $C
ffffff00085e9810 vpanic()
ffffff00085e9860 0xfffffffffbe2a425()
ffffff00085e98a0 zfs_refcount_destroy_many+0x30(ffffff026f6f3190, 0)
ffffff00085e98c0 zfs_refcount_destroy+0x10(ffffff026f6f3190)
ffffff00085e98f0 abd_free_linear+0x2d(ffffff026f6f3180)
ffffff00085e9930 abd_free+0x1a5(ffffff026f6f3180)
ffffff00085e9970 vdev_queue_agg_io_done+0x30(ffffff028c33c1d8)
ffffff00085e9a10 zio_done+0x69a(ffffff028c33c1d8)
ffffff00085e9a50 zio_execute+0xf5(ffffff028c33c1d8)
ffffff00085e9b00 taskq_thread+0x315(ffffff0272866640)
ffffff00085e9b10 thread_start+0xb()
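
The refcount being torn down is 0x10 into the abd_t (ffffff026f6f3190 vs. ffffff026f6f3180), which lines up with the abd_children member of abd_t in abd.h, so it looks like something still held a child reference when the linear abd was freed. A minimal sketch of how I poke at this in the dump, assuming the default savecore directory and the zfs mdb module's ::refcount dcmd (holder tracking is only there on DEBUG bits):

cd /var/crash/omni-2    # assumed dump directory; see dumpadm(1M)
mdb unix.4 vmcore.4 <<'EOF'
ffffff026f6f3180::print abd_t abd_flags abd_size abd_children.rc_count
ffffff026f6f3190::refcount
EOF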

I have a pool that looks like this:

  pool: zz_test
 state: ONLINE
  scan: resilvered 9.15M in 0 days 00:00:01 with 0 errors on Fri Dec 13 16:43:28 2019
config:

        NAME                                       STATE     READ WRITE CKSUM
        zz_test                                    ONLINE       0     0     0
          mirror-0                                 ONLINE       0     0     0
            c0t600144F048A3AB7200005DF420570001d0  ONLINE       0     0     0
            c0t600144F048A3AB7200005DF420570002d0  ONLINE       0     0     0
        spares
          c0t600144F048A3AB7200005DF420570003d0    AVAIL

errors: No known data errors

The disks are iSCSI LUs backed by some virtio disks, with the target on the same machine. This OmniOS instance is a guest running under KVM on Fedora 30. After installing the iscsi/target and iscsi/initiator packages, setup is performed with this script:

#!/bin/bash

pool=zz_test
disks=("$@")
if (( ${#disks[@]} != 3 )); then
    echo "$0: 3 disks required (p0 node probably needed)" 1>&2
    exit 1
fi

function get_lus {
    stmfadm list-lu | awk '{print $NF}'
}

function get_iscsi_disks {
    stmfadm list-lu | awk '{printf("c0t%sd0\n", $NF)}'
}

set -xeuo pipefail

svcadm enable stmf
svcadm enable iscsi/target
svcadm enable iscsi/initiator
if [[ -z "$(itadm list-target)" ]]; then
    itadm create-target
fi

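# Enable sendtargets discovery against the local target on loopback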
iscsiadm modify discovery -t enable
iscsiadm add discovery-address 127.0.0.1

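# Start clean: import (if needed) and destroy any existing instance of the pool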
if zpool import "$pool" >/dev/null 2>&1 || zpool list "$pool" >/dev/null 2>&1; then
    zpool destroy "$pool" 
fi

# Get whole disk partitions so that d0 node can appear
zpool create "$pool" $(echo "${disks[@]}" | sed 's/p0//g')
zpool destroy "$pool" 

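# Remove any LUs left over from a previous run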
for lu in $(get_lus); do
    stmfadm delete-lu $lu
done

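# Back one LU with each whole-disk p0 node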
for disk in "${disks[@]}"; do
    stmfadm create-lu -p serial="$pool.$disk" /dev/dsk/$disk
done

for lu in $(get_lus); do
    stmfadm online-lu $lu
    stmfadm add-view $lu
done

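# Rebuild the /dev/dsk namespace so the new iSCSI LUs get c0t<GUID>d0 nodes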
devfsadm -C -c disk
devfsadm

idisks=( $(get_iscsi_disks) )
zpool create -o cachefile=none $pool mirror "${idisks[0]}" "${idisks[1]}" spare "${idisks[2]}" 
zpool status $pool
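
For reference, I invoke the script with the three p0 nodes of the backing virtio disks, along these lines (the script name and device names are placeholders):

./setup-iscsi-pool.sh c1t0d0p0 c1t1d0p0 c1t2d0p0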

I then run the following simultaneously:

#! /bin/bash

while dd if=/dev/zero of=/zz_test/file bs=1024k count=100; do
    :
done

and

#! /bin/bash

lu=$(stmfadm list-lu | awk 'NR == 2 { print $NF; exit }')
if [[ -z $lu ]]; then
    echo "$0: cannot find lu" 1>&2
    exit 1
fi
echo "Abusing LU $lu" 

while true; do
    echo $(date) offline
    stmfadm offline-lu "$lu" 
    sleep 15
    echo $(date) online
    stmfadm online-lu "$lu" 
    sleep 45
done
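
Concretely, with the two loops saved as (hypothetically named) write-loop.sh and bounce-lu.sh, running them simultaneously amounts to:

./write-loop.sh &
./bounce-lu.sh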

I've seen it reproduce as early as the sixth offline-lu, and I've also seen it survive for hours.

This may very well be the same bug as OS-7920, but I haven't been able to determine that conclusively. With OS-7920, it's always an innocent thread that has its stack splatted on the console.


Related issues

Related to illumos gate - Bug #4454: ldi notifications trigger vdev_disk_free() without spa_config_lock() (Closed, 2014-01-10)

History

#1

Updated by Mike Gerdts 2 months ago

  • Related to Bug #4454: ldi notifications trigger vdev_disk_free() without spa_config_lock() added
#2

Updated by Mike Gerdts 2 months ago

  • Category set to zfs - Zettabyte File System
