Bug #3642

dsl_scan_active() should not issue I/O to determine if async destroying is active

Added by Christopher Siden over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-03-21
% Done: 100%
Difficulty: Medium
Tags: needs-triage

Description

From George Wilson's analysis of a ZFS hang we were seeing at Delphix:

The deadlock looks like this:

- the ZIL interrupt I/O threads are blocked in txg_rele_to_sync() trying to get
the tc_lock
- the tc_lock is held by a thread in txg_delay() that is trying to get the
tx_sync_lock
- the tx_sync_lock is held by the txg_sync_thread, which has called
spa_feature_is_active()
- the call to spa_feature_is_active() has to issue a read I/O to read the ZAP
- the read is queued but will never complete because the ZFS I/O pipeline is
waiting for another I/O to complete before issuing the next I/O

There are actually two issues here:

1) txg_delay() should not hold the tc_lock
2) we should cache the state of the async destroying feature and avoid any I/O
while holding the tx_sync_lock

This bug focuses on issue 2; issue 1 is tracked as Bug #3643.
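
The caching idea behind issue 2 can be sketched as follows. This is a minimal illustration, not the code from the actual commit: fake_pool_t, fake_zap_read(), pool_load_cache(), pool_set_async_destroying() and scan_active_cached() are all hypothetical names, and the "disk" is simulated with a counter. The point it shows is that the feature state is read once outside the locked path, kept up to date when it changes, and then answered from memory without issuing any I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pool; not a real ZFS structure. */
typedef struct fake_pool {
	uint64_t on_disk_refcount;  /* feature refcount as stored "on disk" */
	unsigned io_count;          /* number of simulated reads issued */
	bool     async_destroying;  /* in-memory cached feature state */
} fake_pool_t;

/* Simulated ZAP read: every call counts as one I/O. */
static uint64_t
fake_zap_read(fake_pool_t *p)
{
	assert(p != NULL);
	p->io_count++;
	return (p->on_disk_refcount);
}

/* Populate the cache once, e.g. at pool open, with no sync lock held. */
static void
pool_load_cache(fake_pool_t *p)
{
	p->async_destroying = (fake_zap_read(p) != 0);
}

/* Update the cache together with the stored state whenever it changes. */
static void
pool_set_async_destroying(fake_pool_t *p, bool active)
{
	assert(p != NULL);
	p->on_disk_refcount = active ? 1 : 0;
	p->async_destroying = active;
}

/* The check used while a lock is held: reads memory only, never I/O. */
static bool
scan_active_cached(const fake_pool_t *p)
{
	assert(p != NULL);
	return (p->async_destroying);
}
```

With this shape, a caller holding a lock such as the tx_sync_lock only ever calls scan_active_cached(), so the check cannot queue a read behind the I/O pipeline.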


Related issues

Related to illumos gate - Bug #3643: txg_delay should not hold the tc_lock (Closed, 2013-03-21)

History

#1

Updated by Christopher Siden over 6 years ago

  • Status changed from In Progress to Closed
commit 4a92375
Author: George Wilson <george.wilson@delphix.com>
Date:   Tue Apr 23 10:31:42 2013

    3642 dsl_scan_active() should not issue I/O to determine if async destroying is active
    3643 txg_delay should not hold the tc_lock
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Approved by: Gordon Ross <gwr@nexenta.com>
