Project

General

Profile

Bug #5911

ZFS "hangs" while deleting file

Added by Matthew Ahrens over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Category:
zfs - Zettabyte File System
Start date:
2015-05-04
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

Sometimes ZFS appears to hang while deleting a file. It is actually making slow progress at the file deletion, but other operations (administrative and writes via the data path) "hang" until the file removal completes, which can take a long time if the file has many blocks. The deletion (or most of it) happens in a single txg, and the sync thread spends most of its time reading indirect blocks via this stack trace:

swtch+0x141()
cv_wait+0x70()
zio_wait+0x5b()
dbuf_read+0x2c0()
free_children+0x50()
free_children+0x12a()
free_children+0x12a()
free_children+0x12a()
dnode_sync_free_range_impl+0xdf()
dnode_sync_free_range+0x52()
range_tree_vacate+0x65()
dnode_sync+0x1d8()
dmu_objset_sync_dnodes+0x77()
dmu_objset_sync+0x19f()
dsl_dataset_sync+0x51()
dsl_pool_sync+0x9a()
spa_sync+0x2ff()
txg_sync_thread+0x21f()
thread_start+8()

One way to reproduce the problem is if we are over the arc_meta_limit, e.g. because lots of indirect blocks are pinned because we have L0 dbufs under them. It could be that most of the L1 indirects are cached, in which case when dmu_free_long_range_impl() calls dmu_tx_hold_free(), it will complete very quickly. This allows dmu_free_long_range_impl() to put many (perhaps all of its) transactions in the same TXG. However, dmu_free_long_range_impl() calls dnode_evict_dbufs (and dnode_free_range()), which removes the L0 dbufs, thus reducing the hold count on the L1 indirect blocks above it, allowing them to be evicted. Because we are over the arc_meta_limit(), these L1 blocks will be evicted ASAP. Thus when we get to syncing context, the L1 indirects are no longer cached and must be read in.


Related issues

Related to illumos gate - Bug #6569: large file delete can starve out write opsClosed2016-01-20

Actions

History

#1

Updated by Electric Monk over 4 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 46e1baa6cf6d5432f5fd231bb588df8f9570c858

commit  46e1baa6cf6d5432f5fd231bb588df8f9570c858
Author: Matthew Ahrens <mahrens@delphix.com>
Date:   2015-06-13T03:24:39.000Z

    5911 ZFS "hangs" while deleting file
    Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
    Reviewed by: Alek Pinchuk <alek@nexenta.com>
    Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

#2

Updated by Alek Pinchuk almost 4 years ago

  • Related to Bug #6569: large file delete can starve out write ops added

Also available in: Atom PDF