Bug #1974

pages_pp_maximum/lotsfree discrepancy leads to mlock() failure

Added by Bryan Cantrill over 7 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Category: kernel
Start date: 2012-01-10
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

The number of pages that cannot be locked down (that is, the amount of
availrmem that the kernel keeps for itself) is tracked in a variable
called pages_pp_maximum. (Odd nomenclature, even by VM tunable
standards!) This variable defaults to 4% of DRAM. However, the ARC --
itself a major consumer of availrmem -- keys its need to reclaim not based
on availrmem relative to pages_pp_maximum, but rather based on freemem
relative to lotsfree. The problem is that lotsfree is set to 1/64th of DRAM
(~1.56%) -- a lower value. This means that if availrmem is below
pages_pp_maximum but freemem is above lotsfree, mlock() will fail with
EAGAIN -- but nothing in the system will actually force the ARC to reclaim.
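
To make the gap concrete, here is a minimal sketch of the arithmetic in Python. It is a toy model, not kernel code: the 4 KiB page size, the helper names default_thresholds() and in_blind_spot(), and the simplified comparisons are all assumptions made for illustration, and the 256 GiB figure is borrowed from the machine described below.

```python
PAGE_SIZE = 4096  # assumed 4 KiB base page size
GIB = 1 << 30


def default_thresholds(dram_bytes):
    """Return (pages_pp_maximum, lotsfree) in pages for the defaults above."""
    total_pages = dram_bytes // PAGE_SIZE
    pages_pp_maximum = total_pages * 4 // 100  # 4% of DRAM
    lotsfree = total_pages // 64               # 1/64th of DRAM
    return pages_pp_maximum, lotsfree


def in_blind_spot(availrmem, freemem, pages_pp_maximum, lotsfree):
    """True when locking is refused but the ARC sees no reason to reclaim.

    Simplified: the real checks involve additional terms.
    """
    lock_refused = availrmem < pages_pp_maximum
    arc_reclaims = freemem < lotsfree
    return lock_refused and not arc_reclaims


pp_max, lotsfree = default_thresholds(256 * GIB)
print("pages_pp_maximum: %.2f GiB" % (pp_max * PAGE_SIZE / GIB))
print("lotsfree:         %.2f GiB" % (lotsfree * PAGE_SIZE / GIB))
```

Under these assumptions, any state with availrmem below roughly 10 GiB and freemem above 4 GiB falls into the blind spot.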

This was seen on SmartOS on a 256GB machine which had 156GB of ARC, but
could not start a measly 4GB KVM guest because availrmem was at roughly 10GB
but freemem was above the 4GB at which arc_reclaim_needed() would have
indicated an ARC reclaim. This was worked around by forcing freemem to drop
below lotsfree with this simple Perl script (due to Brendan Gregg):

  /usr/perl5/bin/perl -e '$a = "A" x 1_000_000_000; sleep 600' &

That will consume 2GB at a go; running three of these forced freemem below
lotsfree, inducing ARC reclaim and allowing the KVM guest to lock down its
4GB and start.

In terms of working around this, one can either generate traditional DRAM
pressure (e.g., via the script above) or safely (and dynamically, via
mdb -kw) set pages_pp_maximum to a much lower value -- like 1% of DRAM.

In terms of fixing this, in addition to its other conditions,
arc_reclaim_needed() should return non-zero if availrmem is below
pages_pp_maximum.
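
The shape of that proposed check can be sketched as a toy Python model. This is not the kernel's C code and not necessarily what the resolving commit does: the function signature is an assumption, the "other conditions" of the real arc_reclaim_needed() are omitted, and the sample values (in GB) are only illustrative approximations of the incident above.

```python
def arc_reclaim_needed(freemem, lotsfree, availrmem, pages_pp_maximum):
    """Toy model: return 1 if the ARC should reclaim, else 0."""
    if freemem < lotsfree:
        return 1  # existing condition described in this report
    if availrmem < pages_pp_maximum:
        return 1  # proposed additional condition
    return 0


# Illustrative values in GB: freemem above lotsfree, availrmem just
# under pages_pp_maximum. Without the second check this would return 0.
print(arc_reclaim_needed(freemem=6, lotsfree=4,
                         availrmem=10, pages_pp_maximum=10.24))
```

With the extra condition, the state that blocked the KVM guest would signal a reclaim without needing artificial memory pressure.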

History

#1

Updated by Robert Mustacchi about 6 years ago

  • Tags deleted (needs-triage)
  • % Done changed from 0 to 100
  • Status changed from New to Resolved

Resolved in cf746768a898264fa491791a8252865e5338dc67.
