Bug #5376
arc_kmem_reap_now() should not result in clearing arc_no_grow
Status: Closed
% Done: 100%
Description
I observed a machine writing data at 200-500 MB/s.
mpstat showed that periodically one CPU would become 100% busy for a few
seconds, with heavy crosscall activity. Sometimes the other CPUs would become
idle during this time. I discovered that this was due primarily to
arc_kmem_reap_now()'s calls to kmem_cache_reap_now(), and resulting TLB
shootdowns to remove mappings for freed slabs. There are many opportunities to
improve the kmem/vmem/hat code. However, upon reproducing the issue, I found that ZFS's
ARC management can also be improved to reduce the frequency and duration of
these xcall storms. The primary observation is that we should not clear
arc_no_grow after calling arc_kmem_reap_now(). One way to prevent this would
be to have an idea of when we are getting low on memory before we actually have
to call arc_kmem_reap_now(), to not allow arc_no_grow to be cleared when we are
low on memory, and to ensure that arc_shrink() will keep us in the "low on
memory" region in the absence of other interactions. Implementing this fix
resulted in much less frequent calls to arc_kmem_reap_now() under a synthetic
workload.
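
The approach described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from the actual commit: the type `arc_model_t`, the function `arc_model_tick()`, and the constants `NO_GROW_SHIFT`, `SHRINK_SHIFT`, and `GROW_RETRY_TICKS` are hypothetical names chosen for illustration, and the expensive reap is represented by a simple counter. It shows three regions: memory is short (reap and shrink), memory is low but not short (keep arc_no_grow set, do not reap), and memory is comfortable (arc_no_grow may be cleared once the retry time has passed).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of ARC state; not the illumos structures. */
typedef struct {
	int64_t target_size;	/* current ARC target size, in bytes */
	bool	no_grow;	/* models arc_no_grow */
	int64_t	grow_time;	/* earliest tick at which growth may resume */
	int	reap_calls;	/* counts simulated arc_kmem_reap_now() calls */
} arc_model_t;

/* Illustrative tunables; the values are assumptions, not taken from the bug. */
enum { NO_GROW_SHIFT = 5, SHRINK_SHIFT = 7, GROW_RETRY_TICKS = 60 };

/*
 * One pass of a simplified reclaim loop.
 * free_memory is the number of bytes above the reclaim threshold;
 * a negative value means memory must be reclaimed now.
 */
static void
arc_model_tick(arc_model_t *arc, int64_t free_memory, int64_t now)
{
	if (free_memory < 0) {
		/* Short on memory: stop growing, reap, and shrink. */
		arc->no_grow = true;
		arc->grow_time = now + GROW_RETRY_TICKS;
		arc->reap_calls++;	/* stands in for the expensive reap */

		/*
		 * Shrink by the deficit plus a margin, so that the next
		 * pass lands in the "low on memory" region rather than
		 * immediately allowing growth again.
		 */
		int64_t to_free =
		    (arc->target_size >> SHRINK_SHIFT) - free_memory;
		if (to_free > 0) {
			arc->target_size -= to_free;
			if (arc->target_size < 0)
				arc->target_size = 0;
		}
	} else if (free_memory < (arc->target_size >> NO_GROW_SHIFT)) {
		/* Low on memory, but no reap needed: keep no_grow set. */
		arc->no_grow = true;
	} else if (now >= arc->grow_time) {
		/* Comfortable margin and retry time elapsed: allow growth. */
		arc->no_grow = false;
	}
}
```

In this model, a pass with a small positive `free_memory` leaves `no_grow` set even after `grow_time` has passed, which is the property the description asks for: reaping alone never causes arc_no_grow to be cleared.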
Updated by Electric Monk over 7 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 2ec99e3e987d8aa273f1e9ba2b983557d058198c
commit 2ec99e3e987d8aa273f1e9ba2b983557d058198c
Author: Matthew Ahrens <mahrens@delphix.com>
Date: 2014-12-19T19:57:45.000Z

    5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
    Reviewed by: Christopher Siden <christopher.siden@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Steven Hartland <killing@multiplay.co.uk>
    Reviewed by: Richard Elling <richard.elling@richardelling.com>
    Approved by: Dan McDonald <danmcd@omniti.com>