Bug #3950

closed

ztest: deadman fires when we're doing a scan

Added by Christopher Siden over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-08-01
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Analysis by George Wilson:

After running ztest for about 12 hours I hit a deadman timer fire. Looking at
the stacks it's clear that the pool is still making progress:

> ::zio_state -r
ADDRESS         TYPE  STAGE            WAITER           TIME_ELAPSED
8568bc8         NULL  OPEN             -                -            
 8cb2bc0        READ  VDEV_IO_START    -                -            
  c84f630       READ  VDEV_IO_START    -                -            
   d0d5980      READ  VDEV_IO_START    -                -            
    d09a5f0     READ  VDEV_IO_DONE     -                -            
     d675cc8    READ  VDEV_IO_START    -                -            
    d2bc2f0     READ  VDEV_IO_DONE     -                -            
     bec3c90    READ  VDEV_IO_START    -                -            
    b5f7be0     READ  VDEV_IO_DONE     -                -            
     d25fbf0    READ  VDEV_IO_START    -                -            
   b528970      READ  VDEV_IO_START    -                -            
    d134910     READ  VDEV_IO_DONE     -                -            
     8568008    READ  VDEV_IO_START    -                -            
    d134c00     READ  VDEV_IO_DONE     -                -            
     8723cf0    READ  VDEV_IO_START    -                -            
 cc902f0        READ  VDEV_IO_START    -                -            
  d33c948       READ  VDEV_IO_START    -                -            
   b6d86c0      READ  VDEV_IO_START    -                -            
    c18c3b8     READ  VDEV_IO_DONE     -                -            
     bec3c90    READ  VDEV_IO_START    -                -            
    d2c1300     READ  VDEV_IO_DONE     -                -            
     d25fbf0    READ  VDEV_IO_START    -                -            
   8ec3380      READ  VDEV_IO_START    -                -            
    8966a20     READ  VDEV_IO_DONE     -                -            

> ::stacks -c spa_sync
THREAD   STATE    SOBJ        COUNT
16b      UNPARKED <NONE>          1
         libc.so.1`cond_wait_queue+0x60
         libc.so.1`__cond_wait+0x86
         libc.so.1`cond_wait+0x24
         libzpool.so.1`cv_wait+0x40
         libzpool.so.1`dsl_scan_sync+0x562
         libzpool.so.1`spa_sync+0x4cd
         libzpool.so.1`txg_sync_thread+0x3bd
         libc.so.1`_thrp_setup+0x9b
         libc.so.1`_lwp_start

It appears that the deadman timer fired because the dsl_scan is taking a long
time. The deadman timer should check whether we're making progress before
killing off the process.

The deadman timer in ztest does not take into account anything else that might
be happening on the pool. We can make the deadman timer smarter by leveraging
spa_suspended() and the vdev_deadman() logic.
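
A minimal sketch of the kind of progress check described above. It does not call
the real spa_suspended() or vdev_deadman(); toy_pool_t, toy_deadman_check() and
the verdict names are hypothetical stand-ins, and the policy shown (abort when
the pool is suspended or when the oldest outstanding I/O exceeds a limit,
otherwise keep running) is an assumption about how such a check could behave,
not the committed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pool state; not the real spa_t. */
typedef struct toy_pool {
	bool     suspended;           /* what spa_suspended() would report */
	bool     has_outstanding_io;  /* is any I/O still in flight? */
	uint64_t oldest_io_start_ms;  /* start time of the oldest such I/O */
} toy_pool_t;

typedef enum {
	DEADMAN_KEEP_RUNNING,      /* pool is still making progress */
	DEADMAN_ABORT_SUSPENDED,   /* pool is suspended, cannot progress */
	DEADMAN_ABORT_STALLED_IO   /* an I/O has been pending too long */
} deadman_verdict_t;

/*
 * Decide what to do when the test's deadline expires. Rather than
 * aborting purely on elapsed wall-clock time, look at the pool:
 * a long-running scan with I/Os that keep completing is not a hang.
 */
deadman_verdict_t
toy_deadman_check(const toy_pool_t *pool, uint64_t now_ms, uint64_t limit_ms)
{
	assert(pool != NULL);

	if (pool->suspended)
		return (DEADMAN_ABORT_SUSPENDED);

	/* Assumed stall rule: the oldest pending I/O is older than the limit. */
	if (pool->has_outstanding_io &&
	    now_ms >= pool->oldest_io_start_ms &&
	    now_ms - pool->oldest_io_start_ms > limit_ms)
		return (DEADMAN_ABORT_STALLED_IO);

	return (DEADMAN_KEEP_RUNNING);
}
```

With this shape, a pool in the state shown in the dumps above (spa_sync waiting
in dsl_scan_sync while reads continue to move through the pipeline) would yield
DEADMAN_KEEP_RUNNING as long as no single I/O stays outstanding past the limit.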

Actions #1

Updated by Christopher Siden over 8 years ago

  • Status changed from In Progress to Closed
commit 2c1e2b4
Author: George Wilson <george.wilson@delphix.com>
Date:   Wed Aug 7 11:24:34 2013

    3949 ztest fault injection should avoid resilvering devices
    3950 ztest: deadman fires when we're doing a scan
    3951 ztest hang when running dedup test
    3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>