Bug #9465

ARC check for 'anon_size > arc_c/2' can stall the system

Added by Brad Lewis over 1 year ago. Updated 12 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Start date: 2018-04-12
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Seen in a test suite run of checkpoint_big_rewind which uses a nested pool. Does not appear to reproduce easily.

Current theory is that the upper pool’s dirty/anon data is preventing the lower pool from adding writes into its open context.
The upper pool cannot make progress (and clear its dirty data) until the write I/O it sent to the lower pool completes.
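The circular wait in this theory can be illustrated with a toy model. This is only a sketch under stated assumptions, not the illumos ARC code: `anon_throttled` and `simulate` are hypothetical names, the check is reduced to the expression in the bug title, and the sizes are made up.

```python
# Toy model of the suspected circular wait. All names and sizes here are
# illustrative assumptions; this is not the actual ARC implementation.

MIB = 1024 * 1024


def anon_throttled(reserve, anon_size, arc_c):
    """Return True if a new write reservation would be told to retry.

    Models a global check of the form 'anon_size > arc_c/2': the decision
    depends on the ARC-wide anon (dirty) total, not on which pool owns it.
    """
    return reserve + anon_size > arc_c // 2


def simulate(upper_dirty, arc_c, write_size, attempts):
    """Retry lower-pool writes and return (accepted_writes, remaining_upper_dirty).

    The upper pool's dirty data only drains when the lower pool accepts
    one of the upper pool's vdev writes.
    """
    accepted = 0
    for _ in range(attempts):
        if upper_dirty == 0:
            break  # nothing left for the upper pool to flush
        # In this model the lower pool holds no dirty data of its own,
        # so the global anon total is just the upper pool's dirty data.
        anon_size = upper_dirty
        if anon_throttled(write_size, anon_size, arc_c):
            continue  # lower pool waits for the next txg and retries
        accepted += 1
        upper_dirty = max(0, upper_dirty - write_size)
    return accepted, upper_dirty


if __name__ == "__main__":
    # Upper pool above half of arc_c: no write is ever accepted.
    print(simulate(600 * MIB, 1024 * MIB, 1 * MIB, 1000))
    # Upper pool below the limit: writes are accepted and dirty data drains.
    print(simulate(100 * MIB, 1024 * MIB, 1 * MIB, 1000))
```

In the first case every retry is rejected and the upper pool's dirty data never shrinks, which matches the shape of the stall described above: the lower pool keeps opening new TXGs while its writers keep waiting.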

nestedpool (upper)
waiting for I/O to a vdev in testpool
lots of dirty/anon data but syncing is stalled
spa_syncing_txg is 162
vdev io is getting throttled
Stack:

swtch+0x141()
cv_wait+0x70(ffffff03385afb78, ffffff03385afb70)
zio_wait+0xbb(ffffff03385af800)
dsl_pool_sync+0xf9(ffffff033df2cb00, a2)
spa_sync+0x456(ffffff0342c18000, a2)
txg_sync_thread+0x260(ffffff033df2cb00)
thread_start+8()

testpool (lower)
stalled in dmu_tx_assign
waiting for anon_size to shrink

spa_syncing_txg is 11,800,157 (in less than 30 minutes!)
typically a run of this test completes in under 200 TXGs
Stacks (one per file vdev used by the upper pool):
swtch+0x141()
cv_wait+0x70(ffffff035368549e, ffffff0353685458)
txg_wait_open+0xcb(ffffff0353685280, b40e5f)
dmu_tx_wait+0x1d8(ffffff0328bc87c0)
dmu_tx_assign+0x8a(ffffff0328bc87c0, 1)
zfs_write+0x561(ffffff0527861980, ffffff000cdf1a80, 0, ffffff0310e98db0, 0)
fop_write+0x5b(ffffff0527861980, ffffff000cdf1a80, 0, ffffff0310e98db0, 0)
vn_rdwr+0x27a(1, ffffff0527861980, ffffff032dfdc000, 800, ec0000, 1)
vdev_file_io_strategy+0x65(ffffff033fd71380)
taskq_d_thread+0xb7(ffffff0322000568)
thread_start+8()
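The hexadecimal stack arguments can be cross-checked against the TXG numbers quoted above. The sketch below assumes that the second argument of `spa_sync` and of `txg_wait_open` in these stacks is the TXG, and that the lower pool started near TXG 0; `txg_from_arg` is a hypothetical helper name.

```python
# Decode the hex TXG arguments from the two stacks and estimate how fast
# the lower pool was cycling through TXGs.

def txg_from_arg(hex_arg):
    """Convert a hexadecimal stack argument (as printed in the stack) to an int."""
    return int(hex_arg, 16)


upper_syncing = txg_from_arg("a2")      # spa_sync(..., a2) in the nestedpool stack
lower_waiting = txg_from_arg("b40e5f")  # txg_wait_open(..., b40e5f) in the testpool stack
lower_syncing = 11_800_157              # spa_syncing_txg reported for testpool

# "less than 30 minutes" gives a lower bound on the TXG rate.
min_txgs_per_second = lower_syncing / (30 * 60)

if __name__ == "__main__":
    print("upper pool syncing txg:", upper_syncing)
    print("lower pool writers waiting for txg:", lower_waiting)
    print("txgs ahead of syncing:", lower_waiting - lower_syncing)
    print("minimum txgs per second:", round(min_txgs_per_second))
```

Under those assumptions, the upper pool value decodes to 162 as reported, and the lower pool's writers are waiting on a TXG two ahead of the one being synced, while the pool turns over thousands of TXGs per second without accepting their writes.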

History

#1

Updated by Electric Monk 12 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit abe1fd01ce5a83718c5a840daeab4abdaec1c104

commit  abe1fd01ce5a83718c5a840daeab4abdaec1c104
Author: Don Brady <don.brady@delphix.com>
Date:   2018-07-23T13:44:58.000Z

    9465 ARC check for 'anon_size > arc_c/2' can stall the system
    Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
    Reviewed by: Matt Ahrens <matt@delphix.com>
    Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
