Bug #6292

exporting a pool while an async destroy is running can leave entries in the deferred tree

Added by Matthew Ahrens almost 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2015-10-07
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Analysis by George Wilson:

While running some stress tests I found that during 'zpool destroy' the system would panic with an assertion in range_tree_destroy():
panic[cpu6]/thread=ffffff077bc5b840:
assertion failed: rt->rt_space == 0 (0x9a00 == 0x0), file: ../../common/fs/zfs/range_tree.c, line: 150

ffffff00303e7970 genunix:strlog+0 ()
ffffff00303e79a0 zfs:range_tree_destroy+72 ()
ffffff00303e79e0 zfs:metaslab_fini+c1 ()
ffffff00303e7a20 zfs:vdev_metaslab_fini+5e ()
ffffff00303e7a60 zfs:vdev_free+11d ()
ffffff00303e7aa0 zfs:vdev_free+bb ()
ffffff00303e7ae0 zfs:spa_unload+de ()
ffffff00303e7b60 zfs:spa_export_common+115 ()
ffffff00303e7b80 zfs:spa_destroy+1d ()
ffffff00303e7bc0 zfs:zfs_ioc_pool_destroy+20 ()
ffffff00303e7c70 zfs:zfsdev_ioctl+4ff ()
ffffff00303e7cb0 genunix:cdev_ioctl+39 ()
ffffff00303e7d00 specfs:spec_ioctl+60 ()
ffffff00303e7d90 genunix:fop_ioctl+55 ()
ffffff00303e7eb0 genunix:ioctl+9b ()
ffffff00303e7f00 unix:brand_sys_sysenter+2b7 ()
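
For context, the assertion that fires is the space check at the top of range_tree_destroy(); a sketch from the range_tree.c of that era (the exact code may differ slightly):

void
range_tree_destroy(range_tree_t *rt)
{
        /* No space may remain accounted when the tree is torn down. */
        VERIFY0(rt->rt_space);

        if (rt->rt_ops != NULL)
                rt->rt_ops->rtop_destroy(rt, rt->rt_arg);

        avl_destroy(&rt->rt_root);
        kmem_free(rt, sizeof (*rt));
}

Here rt_space is still 0x9a00 bytes when the tree is destroyed, so the VERIFY0 trips.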
We still have space in the deferred tree, which indicates that some frees were still being processed while the pool was trying to export. ::zfs_dbgmsg gives a clue about what is happening:
command: zpool destroy -f perfpool
bptree index 0: traversing from min_txg=1 bookmark -1/10/0/132004
bptree index 0: traversing from min_txg=1 bookmark -1/10/0/132004
bptree index 0: traversing from min_txg=1 bookmark -1/10/0/132004
bptree index 0: traversing from min_txg=1 bookmark -1/10/0/132004
bptree index 0: traversing from min_txg=1 bookmark -1/10/0/132004
bptree index 0: traversing from min_txg=1 bookmark -1/10/0/132004
The same bookmark repeats on every pass, so the traversal keeps restarting without making progress. We should not be processing any scans while we're shutting down the pool:

ffffff32ff4d8000::print spa_t spa_dsl_pool->dp_scan[]

spa_dsl_pool->dp_scan = {
    spa_dsl_pool->dp_scan->scn_dp = 0xffffff08133a7a00
    spa_dsl_pool->dp_scan->scn_pausing = 0 (0)
    spa_dsl_pool->dp_scan->scn_restart_txg = 0
    spa_dsl_pool->dp_scan->scn_done_txg = 0
    spa_dsl_pool->dp_scan->scn_sync_start_time = 0x1924ae13b1f70
    spa_dsl_pool->dp_scan->scn_zio_root = 0xffffff0af0cec790
    spa_dsl_pool->dp_scan->scn_is_bptree = 0x1 (B_TRUE)
    spa_dsl_pool->dp_scan->scn_async_destroying = 0x1 (B_TRUE)
    spa_dsl_pool->dp_scan->scn_async_stalled = 0x1 (B_TRUE)
    spa_dsl_pool->dp_scan->scn_visited_this_txg = 0
    spa_dsl_pool->dp_scan->scn_phys = {
        scn_func = 0
        scn_state = 0
        scn_queue_obj = 0
        scn_min_txg = 0
        scn_max_txg = 0
        scn_cur_min_txg = 0
        scn_cur_max_txg = 0
        scn_start_time = 0
        scn_end_time = 0
        scn_to_examine = 0
        scn_examined = 0
        scn_to_process = 0
        scn_processed = 0
        scn_errors = 0
        scn_ddt_class_max = 0
        scn_ddt_bookmark = {
            ddb_class = 0
            ddb_type = 0
            ddb_checksum = 0
            ddb_cursor = 0
        }
        scn_bookmark = {
            zb_objset = 0
            zb_object = 0
            zb_level = 0
            zb_blkid = 0
        }
        scn_flags = 0
    }
}
Note that scn_async_destroying and scn_async_stalled are both B_TRUE even though scn_phys is entirely zeroed. We should have returned from dsl_scan_sync(), because dsl_scan_active() checks whether the spa is shutting down:
/*
 * If the scan is inactive due to a stalled async destroy, try again.
 */
if ((!scn->scn_async_stalled && !dsl_scan_active(scn)) ||
    spa_sync_pass(dp->dp_spa) > 1)
        return;
Unfortunately, scn_async_stalled was set to B_TRUE, so we never called dsl_scan_active(). But how can this happen?
This can happen if we were in the middle of a txg when spa_export_common() called spa_async_suspend().
As a result, the syncing thread would have paused the scan here:
static boolean_t
dsl_scan_free_should_pause(dsl_scan_t *scn)
{
        uint64_t elapsed_nanosecs;

        if (zfs_recover)
                return (B_FALSE);
        if (scn->scn_visited_this_txg >= zfs_free_max_blocks)
                return (B_TRUE);
        elapsed_nanosecs = gethrtime() - scn->scn_sync_start_time;
        return (elapsed_nanosecs / NANOSEC > zfs_txg_timeout ||
            (NSEC2MSEC(elapsed_nanosecs) > zfs_free_min_time_ms &&
            txg_sync_waiting(scn->scn_dp)) ||
            spa_shutting_down(scn->scn_dp->dp_spa));
}
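Note that spa_shutting_down() becomes true as soon as spa_export_common() calls spa_async_suspend(); a sketch of the accessor as I recall it from spa_misc.c (the exact form may differ):

boolean_t
spa_shutting_down(spa_t *spa)
{
        /* Nonzero once spa_async_suspend() has been called. */
        return (spa->spa_async_suspended != 0);
}

So a txg that is already syncing when the export begins will pause the free scan immediately.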
If this happens before we free any blocks, then scn_visited_this_txg would still be 0 and we would set scn_async_stalled to B_TRUE:
} else {
        /*
         * If we didn't make progress, mark the async
         * destroy as stalled, so that we will not initiate
         * a spa_sync() on its behalf. Note that we only
         * check this if we are not finished, because if the
         * bptree had no blocks for us to visit, we can
         * finish without "making progress".
         */
        scn->scn_async_stalled =
            (scn->scn_visited_this_txg == 0);
}
Since we called bptree_iterate(), we would have dirtied the bonus buffer, resulting in frees. We should check for spa_shutting_down() explicitly in dsl_scan_sync(), as sketched below.
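
A minimal sketch of the kind of guard the analysis calls for, placed at the top of dsl_scan_sync() before any bptree traversal is started (variable names follow the snippets above; the committed fix may differ in detail):

/*
 * Only process scans in sync pass 1.
 */
if (spa_sync_pass(dp->dp_spa) > 1)
        return;

/*
 * If the spa is shutting down, then stop scanning. This will
 * ensure that the scan does not dirty any new data during the
 * shutdown phase.
 */
if (spa_shutting_down(dp->dp_spa))
        return;

/*
 * If the scan is inactive due to a stalled async destroy, try again.
 */
if (!scn->scn_async_stalled && !dsl_scan_active(scn))
        return;

With the shutdown check hoisted out on its own, a stalled async destroy can no longer cause dsl_scan_sync() to call bptree_iterate() and dirty the bonus buffer while the pool is exporting.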

Related issues

Related to illumos gate - Bug #5792: Kernel panic while exporting ZFS (New, 2015-04-03)

History

#1

Updated by Electric Monk almost 4 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit a443cc80c742af740aa82130db840f02b4389365

commit  a443cc80c742af740aa82130db840f02b4389365
Author: George Wilson <george.wilson@delphix.com>
Date:   2015-10-23T15:44:41.000Z

    6292 exporting a pool while an async destroy is running can leave entries in the deferred tree
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Andriy Gapon <avg@FreeBSD.org>
    Reviewed by: Fabian Keil <fk@fabiankeil.de>
    Approved by: Gordon Ross <gordon.ross@nexenta.com>

#2

Updated by Alek Pinchuk almost 4 years ago

  • Related to Bug #5792: Kernel panic while exporting ZFS added
