Bug #3642
Closed
dsl_scan_active() should not issue I/O to determine if async destroying is active
Start date:
2013-03-21
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
Description
From George Wilson's analysis of a ZFS hang we were seeing at Delphix:
The deadlock looks like this:
- the ZIL interrupt I/O threads are blocked in txg_rele_to_sync() trying to get the tc_lock
- the tc_lock is held by a thread in txg_delay() that is trying to get the tx_sync_lock
- the tx_sync_lock is held by the txg_sync_thread, which has called spa_feature_is_active()
- the call to spa_feature_is_active() has to issue a read to read the ZAP
- the read is queued but will never complete, because the ZFS I/O pipeline is waiting for another I/O to complete before issuing the next I/O

There are actually two issues here:
1. txg_delay() should not hold the tc_lock
2. we should cache the async destroying feature and avoid any I/O while holding the tx_sync_lock

This bug focuses on the second issue.
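The caching idea in the second issue can be sketched as follows. This is only an illustration, not the code from the actual fix: pool_t, pool_init(), pool_set_async_destroy(), pool_scan_active(), and the simulated disk read are all hypothetical names, and a pthread mutex stands in for the tx_sync_lock. The point it shows is that the blocking lookup happens once, outside the lock, and later queries are answered from memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pool; none of these names come from ZFS. */
typedef struct pool {
    pthread_mutex_t sync_lock;   /* plays the role of tx_sync_lock */
    bool async_destroy_active;   /* in-memory copy of the feature state */
    bool on_disk_active;         /* simulated on-disk feature state */
    int disk_reads;              /* number of simulated blocking reads */
} pool_t;

/* Simulates the blocking read (the ZAP lookup described in the bug). */
static bool
pool_read_feature_from_disk(pool_t *p)
{
    p->disk_reads++;
    return (p->on_disk_active);
}

/* Do the I/O once at load time, before any lock is taken. */
void
pool_init(pool_t *p, bool on_disk_active)
{
    pthread_mutex_init(&p->sync_lock, NULL);
    p->disk_reads = 0;
    p->on_disk_active = on_disk_active;
    p->async_destroy_active = pool_read_feature_from_disk(p);
}

/* When the feature state changes, update disk state and cache together. */
void
pool_set_async_destroy(pool_t *p, bool active)
{
    pthread_mutex_lock(&p->sync_lock);
    p->on_disk_active = active;
    p->async_destroy_active = active;
    pthread_mutex_unlock(&p->sync_lock);
}

/* Rough analogue of dsl_scan_active(): no I/O while the lock is held. */
bool
pool_scan_active(pool_t *p)
{
    bool active;

    pthread_mutex_lock(&p->sync_lock);
    active = p->async_destroy_active;   /* memory read only */
    pthread_mutex_unlock(&p->sync_lock);
    return (active);
}
```

With this shape, a thread holding the lock never waits on the I/O pipeline, so the chain of blocked threads described above cannot start at this call.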
Updated by Christopher Siden about 9 years ago
- Status changed from In Progress to Closed
commit 4a92375
Author: George Wilson <george.wilson@delphix.com>
Date: Tue Apr 23 10:31:42 2013

    3642 dsl_scan_active() should not issue I/O to determine if async destroying is active
    3643 txg_delay should not hold the tc_lock
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Approved by: Gordon Ross <gwr@nexenta.com>