Bug #3950
Closed
ztest: deadman fires when we're doing a scan
Start date:
2013-08-01
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:
Description
Analysis by George Wilson:
After running ztest for about 12 hours I hit a deadman timer fire. Looking at the stacks it's clear that the pool is still making progress:

    > ::zio_state -r
    ADDRESS  TYPE  STAGE          WAITER  TIME_ELAPSED
    8568bc8  NULL  OPEN           -       -
    8cb2bc0  READ  VDEV_IO_START  -       -
    c84f630  READ  VDEV_IO_START  -       -
    d0d5980  READ  VDEV_IO_START  -       -
    d09a5f0  READ  VDEV_IO_DONE   -       -
    d675cc8  READ  VDEV_IO_START  -       -
    d2bc2f0  READ  VDEV_IO_DONE   -       -
    bec3c90  READ  VDEV_IO_START  -       -
    b5f7be0  READ  VDEV_IO_DONE   -       -
    d25fbf0  READ  VDEV_IO_START  -       -
    b528970  READ  VDEV_IO_START  -       -
    d134910  READ  VDEV_IO_DONE   -       -
    8568008  READ  VDEV_IO_START  -       -
    d134c00  READ  VDEV_IO_DONE   -       -
    8723cf0  READ  VDEV_IO_START  -       -
    cc902f0  READ  VDEV_IO_START  -       -
    d33c948  READ  VDEV_IO_START  -       -
    b6d86c0  READ  VDEV_IO_START  -       -
    c18c3b8  READ  VDEV_IO_DONE   -       -
    bec3c90  READ  VDEV_IO_START  -       -
    d2c1300  READ  VDEV_IO_DONE   -       -
    d25fbf0  READ  VDEV_IO_START  -       -
    8ec3380  READ  VDEV_IO_START  -       -
    8966a20  READ  VDEV_IO_DONE   -       -

    > ::stacks -c spa_sync
    THREAD  STATE     SOBJ    COUNT
    16b     UNPARKED  <NONE>  1
            libc.so.1`cond_wait_queue+0x60
            libc.so.1`__cond_wait+0x86
            libc.so.1`cond_wait+0x24
            libzpool.so.1`cv_wait+0x40
            libzpool.so.1`dsl_scan_sync+0x562
            libzpool.so.1`spa_sync+0x4cd
            libzpool.so.1`txg_sync_thread+0x3bd
            libc.so.1`_thrp_setup+0x9b
            libc.so.1`_lwp_start

It appears that the deadman timer fired because the dsl_scan is taking a long time. The deadman timer should check whether we're making progress before killing off the process.
The deadman timer in ztest does not take into account anything else that might be happening on the pool. We can make the deadman timer smarter by leveraging spa_suspended() and the vdev_deadman() logic.
Updated by Christopher Siden over 9 years ago
- Status changed from In Progress to Closed
commit 2c1e2b4
Author: George Wilson <george.wilson@delphix.com>
Date: Wed Aug 7 11:24:34 2013

    3949 ztest fault injection should avoid resilvering devices
    3950 ztest: deadman fires when we're doing a scan
    3951 ztest hang when running dedup test
    3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>