Bug #6226

panic upon zpool scrub -s when disk timeouts are occurring

Added by Rich Murphey about 4 years ago. Updated about 4 years ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2015-09-12
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

I observe a panic each time I try to stop a scrub on a pool whose disks are experiencing timeouts.

Multiple disks are timing out when this occurs.
I have the crash dumps if needed.
Please feel free to ask me for further data.

I have reproduced this issue as follows:
1. Create the array shown in 'zpool.status.txt'.
   - The pool configuration is shown in 'zpool.get.all.txt'.
   - The filesystem configuration is shown in 'zfs.get.all.txt'.
2. Store approx. 3 TB of data in the filesystem.
3. Invoke "zpool scrub".
4. Invoke "zpool scrub -s".
   - The panic shown below occurs.
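
The scrub start/stop part of the steps above can be sketched as a short script. This is only an illustration, not necessarily the exact commands used: the pool name "a" is taken from the panic message, and the run helper and DRY_RUN switch are hypothetical additions so that the sequence is printed rather than executed by default.

```shell
#!/bin/sh
# Sketch of the scrub start/stop sequence from the steps above.
# Assumptions: pool name "a" comes from the panic message; DRY_RUN and run()
# are hypothetical helpers for illustration, not part of zpool.

DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repro_scrub_stop() {
    pool="$1"
    run zpool scrub "$pool"       # start the scrub
    run zpool status "$pool"      # optional: confirm the scrub is in progress
    run zpool scrub -s "$pool"    # stop the scrub; the panic was reported at this step
}

repro_scrub_stop a
```

Setting DRY_RUN=0 would execute the commands for real, which on an affected system is expected to panic the machine.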

The disks that are timing out are Western Digital VelociRaptor WD3000BLFS 300 GB SATA drives.
The zpool is composed of 50 drives in two HP D2700 enclosures attached via an H221 controller.
The zpool is configured as ten raidz1 vdevs of five drives each.
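
To make the layout concrete, here is a rough sketch that prints (but does not run) a zpool create command of the same shape: ten raidz1 groups of five drives. The device names disk0 through disk49 are hypothetical placeholders, since the real names are only in 'zpool.status.txt', and the pool name "a" is taken from the panic message.

```shell
#!/bin/sh
# Prints a zpool create command for N raidz1 groups of M drives each.
# Nothing is executed. Device names (disk0, disk1, ...) are hypothetical
# placeholders; the real device names are in zpool.status.txt.

build_create_cmd() {
    pool="$1"; groups="$2"; per_group="$3"
    cmd="zpool create $pool"
    g=0
    n=0
    while [ "$g" -lt "$groups" ]; do
        cmd="$cmd raidz1"            # each group starts with the raidz1 keyword
        d=0
        while [ "$d" -lt "$per_group" ]; do
            cmd="$cmd disk$n"
            n=$((n + 1))
            d=$((d + 1))
        done
        g=$((g + 1))
    done
    echo "$cmd"
}

build_create_cmd a 10 5
```

With one drive's worth of parity per raidz1 group, this layout leaves roughly 40 of the 50 drives' capacity for data.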

The machine is a ProLiant DL380p with 96 GB RAM running:
uname -a: SunOS g 5.11 illumos-4cfd297 i86pc i386 i86pc
/etc/release: OpenIndiana Development oi_151.1.8 X86 (powered by illumos)

The tail of the dmesg is shown below.
Thanks,
Rich

panic[cpu0]/thread=ffffff00b80cbc40:
I/O to pool 'a' appears to be hung.

ffffff00b80cba20 zfs:vdev_deadman+10b ()
ffffff00b80cba70 zfs:vdev_deadman+4a ()
ffffff00b80cbac0 zfs:vdev_deadman+4a ()
ffffff00b80cbaf0 zfs:spa_deadman+ad ()
ffffff00b80cbb90 genunix:cyclic_softint+fd ()
ffffff00b80cbba0 unix:cbe_low_level+14 ()
ffffff00b80cbbf0 unix:av_dispatch_softvect+78 ()
ffffff00b80cbc20 apix:apix_dispatch_softint+35 ()
ffffff00b8005990 unix:switch_sp_and_call+13 ()
ffffff00b80059e0 apix:apix_do_softint+6c ()
ffffff00b8005a40 apix:apix_do_interrupt+34a ()
ffffff00b8005a50 unix:cmnint+ba ()
ffffff00b8005bc0 unix:acpi_cpu_cstate+11b ()
ffffff00b8005bf0 unix:cpu_acpi_idle+8d ()
ffffff00b8005c00 unix:cpu_idle_adaptive+13 ()
ffffff00b8005c20 unix:idle+a7 ()
ffffff00b8005c30 unix:thread_start+8 ()

syncing file systems...
done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel


Files

zfs.get.all.txt (3.75 KB), zfs filesystem config, Rich Murphey, 2015-09-12 05:23 PM
dmesg.0.txt (13.9 KB), dmesg gathered from dump, Rich Murphey, 2015-09-12 05:23 PM
zpool.get.all.txt (2.59 KB), zpool config, Rich Murphey, 2015-09-12 05:23 PM
zpool.status.txt (3.64 KB), zpool status, Rich Murphey, 2015-09-12 05:23 PM

History

#1

Updated by Rich Murphey about 4 years ago

The dmesg above was garbled.
Here is a clean copy of the tail of the dmesg.

panic[cpu0]/thread=ffffff00b80cbc40:
I/O to pool 'a' appears to be hung.

ffffff00b80cba20 zfs:vdev_deadman+10b ()
ffffff00b80cba70 zfs:vdev_deadman+4a ()
ffffff00b80cbac0 zfs:vdev_deadman+4a ()
ffffff00b80cbaf0 zfs:spa_deadman+ad ()
ffffff00b80cbb90 genunix:cyclic_softint+fd ()
ffffff00b80cbba0 unix:cbe_low_level+14 ()
ffffff00b80cbbf0 unix:av_dispatch_softvect+78 ()
ffffff00b80cbc20 apix:apix_dispatch_softint+35 ()
ffffff00b8005990 unix:switch_sp_and_call+13 ()
ffffff00b80059e0 apix:apix_do_softint+6c ()
ffffff00b8005a40 apix:apix_do_interrupt+34a ()
ffffff00b8005a50 unix:cmnint+ba ()
ffffff00b8005bc0 unix:acpi_cpu_cstate+11b ()
ffffff00b8005bf0 unix:cpu_acpi_idle+8d ()
ffffff00b8005c00 unix:cpu_idle_adaptive+13 ()
ffffff00b8005c20 unix:idle+a7 ()
ffffff00b8005c30 unix:thread_start+8 ()

syncing file systems...
done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
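
For anyone who wants to pull the same panic message and stack out of a saved crash dump, a minimal sketch follows. It assumes the dump has already been expanded into unix.N and vmcore.N (for example by running savecore -f on the compressed vmdump file) in a directory such as /var/crash/<hostname>. The function names mdb_cmds and inspect_dump are hypothetical helpers, not existing tools.

```shell
#!/bin/sh
# Sketch: print the panic summary, message buffer and stack from a saved dump.
# Assumes unix.N and vmcore.N already exist in the given directory.
# mdb_cmds and inspect_dump are hypothetical helper names for illustration.

mdb_cmds() {
    # ::status shows the panic message, ::msgbuf the kernel message buffer,
    # and $C the stack backtrace of the panicking thread.
    printf '%s\n' '::status' '::msgbuf' '$C'
}

inspect_dump() {
    dir="$1"
    n="${2:-0}"
    if [ -f "$dir/unix.$n" ] && [ -f "$dir/vmcore.$n" ] && command -v mdb >/dev/null 2>&1; then
        mdb_cmds | mdb "$dir/unix.$n" "$dir/vmcore.$n"
    else
        echo "unix.$n / vmcore.$n not found in $dir, or mdb is not installed" >&2
        return 1
    fi
}

# Example (not run here): inspect_dump "/var/crash/$(hostname)" 0
```

The helper only reads the dump files, so it can be run on any machine that has mdb and a copy of the dump.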
