Bug #13097 (Closed): improve VM tunables for modern systems

Added by Joshua M. Clulow 11 months ago. Updated 6 months ago.

Status: Closed
Priority: Normal
Category: kernel
% Done: 100%
Difficulty: Hard
Gerrit CR: 890

Description

Modern systems tend to be configured to swap to a ZFS zvol device. ZFS pool I/O requires far more (mostly transient) allocations than more traditional swap targets like UFS-backed files or even raw disk devices. In addition, modern systems are quite fast at allocating and consuming memory. It seems that the only hard stop on the provision of pages from the free and cache lists to users is throttlefree, and the only hard stop on allocation to the rest of the kernel outside of the pageout() machinery is pageout_reserve.

It is remarkably easy for a current illumos system to end up deadlocked trying to page out, with even a single Python thread allocating as fast as possible. In a 1GB system, pageout_reserve is a little under 2MB, and throttlefree is a little under 4MB. ZFS requires occasional larger bursts of allocation (e.g., when loading metaslabs, adjusting range trees, compressing blocks, assembling I/O buffers for the trip to disk, etc.). 2MB is not enough slack to ensure that pageout() and ZFS can work together to steadily evict memory to disk in these low-memory situations.

Even on a larger 8GB system, where pageout_reserve is 16MB today (and throttlefree is 32MB), we can hit the deadlock condition relatively easily, especially when we get unlucky and things like kmem reaping and ARC reduction don't happen in the correct sequence.

Part of the issue is that while lotsfree seems large enough, the actual harder stops sit only at the lower thresholds -- which are halved and halved and halved again from there. Instead of making each threshold half of the one above it, we could use a shallower ramp; e.g., some of the thresholds could be 3/4 of the size of the threshold above. Indeed, we could also mandate a larger minimum lotsfree of something like 16MB, so that even 1GB systems will have around 4MB of reserve pool for pageout.

Finally, for larger systems we probably set aside too much memory! Once pageout_reserve is high enough to make forward progress, the requirement for memory would seem to be tied more tightly to the rate at which pageout can or must occur, rather than to the overall size of memory. On a 512GB system, fully 2GB sits below throttlefree -- which seems like a lot! We should consider capping lotsfree, or at least greatly detuning its growth, above a certain physical memory size.

Proposed Change

  • lotsfree remains 1/64 of physical memory, but is clamped in the range 16MB through 2048MB
  • minfree (and thus also throttlefree) changes to be 3/4 of desfree
  • pageout_reserve changes to be 3/4 of throttlefree
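
A rough sketch of the proposed derivation (simplified and hypothetical: the real logic lives in the kernel's startup path and must also respect operator overrides of each tunable; the MB-to-page conversion here assumes 4KB pages):

#include <sys/types.h>

#define	MB_TO_PAGES(mb)	((pgcnt_t)(mb) * 1024 * 1024 / 4096)

pgcnt_t lotsfree, desfree, minfree, throttlefree, pageout_reserve;

static void
setup_thresholds(pgcnt_t physmem)
{
	/* lotsfree: 1/64 of physmem, clamped to the range [16MB, 2048MB] */
	lotsfree = physmem / 64;
	if (lotsfree < MB_TO_PAGES(16))
		lotsfree = MB_TO_PAGES(16);
	if (lotsfree > MB_TO_PAGES(2048))
		lotsfree = MB_TO_PAGES(2048);

	desfree = lotsfree / 2;			/* unchanged: 1/2 of lotsfree */
	minfree = desfree - (desfree / 4);	/* new: 3/4 of desfree */
	throttlefree = minfree;			/* same value by default */
	pageout_reserve = throttlefree - (throttlefree / 4); /* new: 3/4 */
}

Applying this to a 1GB system (262144 pages) reproduces the first column of the table below: lotsfree 4096 pages, desfree 2048, minfree and throttlefree 1536, pageout_reserve 1152.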

A comparison of original sizes and proposed new sizes for these values with a range of system memory sizes appears below:

======================================================================
physmem          |  131072 p   512 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |    4096 p    16 MB ( 3.12%) |    2048 p     8 MB ( 1.56%)
desfree          |    2048 p     8 MB ( 1.56%) |    1024 p     4 MB ( 0.78%)
minfree          |    1536 p     6 MB ( 1.17%) |     512 p     2 MB ( 0.39%)
throttlefree     |    1536 p     6 MB ( 1.17%) |     512 p     2 MB ( 0.39%)
pageout_reserve  |    1152 p     4 MB ( 0.88%) |     256 p     1 MB ( 0.20%)

======================================================================
physmem          |  262144 p  1024 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |    4096 p    16 MB ( 1.56%) |    4096 p    16 MB ( 1.56%)
desfree          |    2048 p     8 MB ( 0.78%) |    2048 p     8 MB ( 0.78%)
minfree          |    1536 p     6 MB ( 0.59%) |    1024 p     4 MB ( 0.39%)
throttlefree     |    1536 p     6 MB ( 0.59%) |    1024 p     4 MB ( 0.39%)
pageout_reserve  |    1152 p     4 MB ( 0.44%) |     512 p     2 MB ( 0.20%)

======================================================================
physmem          |  524288 p  2048 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |    8192 p    32 MB ( 1.56%) |    8192 p    32 MB ( 1.56%)
desfree          |    4096 p    16 MB ( 0.78%) |    4096 p    16 MB ( 0.78%)
minfree          |    3072 p    12 MB ( 0.59%) |    2048 p     8 MB ( 0.39%)
throttlefree     |    3072 p    12 MB ( 0.59%) |    2048 p     8 MB ( 0.39%)
pageout_reserve  |    2304 p     9 MB ( 0.44%) |    1024 p     4 MB ( 0.20%)

======================================================================
physmem          | 1048576 p  4096 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |   16384 p    64 MB ( 1.56%) |   16384 p    64 MB ( 1.56%)
desfree          |    8192 p    32 MB ( 0.78%) |    8192 p    32 MB ( 0.78%)
minfree          |    6144 p    24 MB ( 0.59%) |    4096 p    16 MB ( 0.39%)
throttlefree     |    6144 p    24 MB ( 0.59%) |    4096 p    16 MB ( 0.39%)
pageout_reserve  |    4608 p    18 MB ( 0.44%) |    2048 p     8 MB ( 0.20%)

======================================================================
physmem          | 2097152 p  8192 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |   32768 p   128 MB ( 1.56%) |   32768 p   128 MB ( 1.56%)
desfree          |   16384 p    64 MB ( 0.78%) |   16384 p    64 MB ( 0.78%)
minfree          |   12288 p    48 MB ( 0.59%) |    8192 p    32 MB ( 0.39%)
throttlefree     |   12288 p    48 MB ( 0.59%) |    8192 p    32 MB ( 0.39%)
pageout_reserve  |    9216 p    36 MB ( 0.44%) |    4096 p    16 MB ( 0.20%)

======================================================================
physmem          | 4194304 p 16384 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |   65536 p   256 MB ( 1.56%) |   65536 p   256 MB ( 1.56%)
desfree          |   32768 p   128 MB ( 0.78%) |   32768 p   128 MB ( 0.78%)
minfree          |   24576 p    96 MB ( 0.59%) |   16384 p    64 MB ( 0.39%)
throttlefree     |   24576 p    96 MB ( 0.59%) |   16384 p    64 MB ( 0.39%)
pageout_reserve  |   18432 p    72 MB ( 0.44%) |    8192 p    32 MB ( 0.20%)

======================================================================
physmem          | 8388608 p 32768 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |  131072 p   512 MB ( 1.56%) |  131072 p   512 MB ( 1.56%)
desfree          |   65536 p   256 MB ( 0.78%) |   65536 p   256 MB ( 0.78%)
minfree          |   49152 p   192 MB ( 0.59%) |   32768 p   128 MB ( 0.39%)
throttlefree     |   49152 p   192 MB ( 0.59%) |   32768 p   128 MB ( 0.39%)
pageout_reserve  |   36864 p   144 MB ( 0.44%) |   16384 p    64 MB ( 0.20%)

======================================================================
physmem          | 16777216 p 65536 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |  262144 p  1024 MB ( 1.56%) |  262144 p  1024 MB ( 1.56%)
desfree          |  131072 p   512 MB ( 0.78%) |  131072 p   512 MB ( 0.78%)
minfree          |   98304 p   384 MB ( 0.59%) |   65536 p   256 MB ( 0.39%)
throttlefree     |   98304 p   384 MB ( 0.59%) |   65536 p   256 MB ( 0.39%)
pageout_reserve  |   73728 p   288 MB ( 0.44%) |   32768 p   128 MB ( 0.20%)

======================================================================
physmem          | 33554432 p 131072 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |  524288 p  2048 MB ( 1.56%) |  524288 p  2048 MB ( 1.56%)
desfree          |  262144 p  1024 MB ( 0.78%) |  262144 p  1024 MB ( 0.78%)
minfree          |  196608 p   768 MB ( 0.59%) |  131072 p   512 MB ( 0.39%)
throttlefree     |  196608 p   768 MB ( 0.59%) |  131072 p   512 MB ( 0.39%)
pageout_reserve  |  147456 p   576 MB ( 0.44%) |   65536 p   256 MB ( 0.20%)

======================================================================
physmem          | 67108864 p 262144 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |  524288 p  2048 MB ( 0.78%) | 1048576 p  4096 MB ( 1.56%)
desfree          |  262144 p  1024 MB ( 0.39%) |  524288 p  2048 MB ( 0.78%)
minfree          |  196608 p   768 MB ( 0.29%) |  262144 p  1024 MB ( 0.39%)
throttlefree     |  196608 p   768 MB ( 0.29%) |  262144 p  1024 MB ( 0.39%)
pageout_reserve  |  147456 p   576 MB ( 0.22%) |  131072 p   512 MB ( 0.20%)

======================================================================
physmem          | 134217728 p 524288 MB
                 | NEW VALUES                  | ORIGINAL VALUES
lotsfree         |  524288 p  2048 MB ( 0.39%) | 2097152 p  8192 MB ( 1.56%)
desfree          |  262144 p  1024 MB ( 0.20%) | 1048576 p  4096 MB ( 0.78%)
minfree          |  196608 p   768 MB ( 0.15%) |  524288 p  2048 MB ( 0.39%)
throttlefree     |  196608 p   768 MB ( 0.15%) |  524288 p  2048 MB ( 0.39%)
pageout_reserve  |  147456 p   576 MB ( 0.11%) |  262144 p  1024 MB ( 0.20%)

Files

collect_new.sh (1.08 KB) collect_new.sh Joshua M. Clulow, 2021-02-01 12:29 AM
collect_old.sh (976 Bytes) collect_old.sh Joshua M. Clulow, 2021-02-01 12:29 AM
watch_pageout.d (596 Bytes) watch_pageout.d Joshua M. Clulow, 2021-02-01 12:29 AM

Related issues

Related to illumos gate - Bug #13107: pageout should use hrtime instead of lbolt (Closed) - Joshua M. Clulow
Related to illumos gate - Bug #13499: System paging parameters no longer calculated at boot after 13097 (Closed) - Andy Fiddaman
Actions #1

Updated by Joshua M. Clulow 11 months ago

Note that my current plan is to play with the values until I cannot deadlock the system just by allocating anymore -- especially with the smaller memory virtual machines that are so easy to topple over today.

Actions #2

Updated by Joshua M. Clulow 11 months ago

Notes from a block comment I intend to integrate with this change:

/*
 * FREE MEMORY MANAGEMENT
 *
 * Management of the pool of free pages is a tricky business.  There are
 * several critical threshold values which constrain our allocation of new
 * pages and inform the rate of paging out of memory to swap.  These threshold
 * values, and the behaviour they induce, are described below in descending
 * order of size -- and thus increasing order of severity!
 *
 *   +---------------------------------------------------- physmem (all memory)
 *   |
 *   | Ordinarily there are no particular constraints placed on page
 *   v allocation.  The page scanner is not running and page_create_va()
 *   | will effectively grant all page requests (whether from the kernel
 *   | or from user processes) without artificial delay.
 *   |
 *   +------------------------------- lotsfree (1.56% of physmem, minimum 32MB)
 *   |
 *   | When we have less than "lotsfree" pages, pageout_scanner() is
 *   v signalled by schedpaging() to begin looking for pages that can
 *   | be evicted to disk to bring us back above lotsfree.  At this
 *   | stage there is still no constraint on allocation of free pages.
 *   |
 *   +------------------ desfree (1/2 of lotsfree, 0.78% of physmem, min. 16MB)
 *   |
 *   | When we drop below desfree, a number of kernel facilities will
 *   v wait before allocating more memory, under the assumption that
 *   | pageout or reaping will make progress and free up some memory.
 *   | This behaviour is not especially coordinated; look for comparisons
 *   | of desfree and freemem.
 *   |
 *   | In addition to various attempts at advisory caution, clock()
 *   | will wake up the thread that is ordinarily parked in sched().
 *   | This routine is responsible for the heavy-handed swapping out
 *   v of entire processes in an attempt to arrest the slide of free
 *   | memory.  See comments in sched.c for more details.
 *   |
 *   +---- minfree & throttlefree (3/4 of desfree, 0.59% of physmem, min. 12MB)
 *   |
 *   | These two separate tunables have, by default, the same value.
 *   v Various parts of the kernel use minfree to signal the need for
 *   | more aggressive reclamation of memory, and sched() is more
 *   | aggressive at swapping processes out.
 *   |
 *   | If free memory falls below throttlefree, page_create_va() will
 *   | use page_create_throttle() to begin holding most requests for
 *   | new pages while pageout and reaping free up memory.  Sleeping
 *   v allocations (e.g., KM_SLEEP) are held here while we wait for
 *   | more memory.  Non-sleeping allocations are generally allowed to
 *   | proceed.
 *   |
 *   +------- pageout_reserve (3/4 of throttlefree, 0.44% of physmem, min. 9MB)
 *   |
 *   | When we hit throttlefree, the situation is already dire.  The
 *   v system is generally paging out memory and swapping out entire
 *   | processes in order to free up memory for continued operation.
 *   |
 *   | Unfortunately, evicting memory to disk generally requires short
 *   | term use of additional memory; e.g., allocation of buffers for
 *   | storage drivers, updating maps of free and used blocks, etc.
 *   | As such, pageout_reserve is the number of pages that we keep in
 *   | special reserve for use by pageout() and sched() and by any
 *   v other parts of the kernel that need to be working for those to
 *   | make forward progress such as the ZFS I/O pipeline.
 *   |
 *   | When we are below pageout_reserve, we fail or hold any allocation
 *   | that has not explicitly requested access to the reserve pool.
 *   | Access to the reserve is generally granted via the KM_PUSHPAGE
 *   | flag, or by marking a thread T_PUSHPAGE such that all allocations
 *   | can implicitly tap the reserve.  For more details, see the
 *   v NOMEMWAIT() macro, the T_PUSHPAGE thread flag, the KM_PUSHPAGE
 *   | and VM_PUSHPAGE allocation flags, and page_create_throttle().
 *   |
 *   +---------------------------------------------------------- no free memory
 *   |
 *   | If we have arrived here, things are very bad indeed.  It is
 *   v surprisingly difficult to tell if this condition is even fatal,
 *   | as enough memory may have been granted to pageout() and to the
 *   | ZFS I/O pipeline that requests for eviction that have already been
 *   | made will complete and free up memory some time soon.
 *   |
 *   | If free memory does not materialise, the system generally remains
 *   | deadlocked.  The pageout_deadman() below is run once per second
 *   | from clock(), seeking to limit the amount of time a single request
 *   v to page out can be blocked before the system panics to get a crash
 *   | dump and return to service.
 *   |
 *   +-------------------------------------------------------------------------
 */
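
To illustrate the reserve-pool access described in the final tiers above: KM_PUSHPAGE is the existing kmem flag that grants access to pageout_reserve, while an ordinary KM_SLEEP request would be held in page_create_throttle() once freemem drops below throttlefree. A hypothetical allocation on the eviction path:

#include <sys/kmem.h>

/*
 * Hypothetical example: a transient allocation made while evicting
 * memory to disk.  KM_PUSHPAGE lets the request dip into the
 * pageout_reserve pool so that eviction can make forward progress
 * even when freemem is below pageout_reserve.
 */
void *
evict_path_alloc(size_t sz)
{
	return (kmem_alloc(sz, KM_PUSHPAGE));
}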
Actions #3

Updated by Peter Tribble 11 months ago

So, a couple of thoughts:

How many people are using illumos with ZFS on a 512M system? I can imagine that would be somewhat painful. And if ZFS is problematic, IPS is doubly so. For Tribblix, I know that users down at that end of the spectrum often use UFS. Are the new settings for a 512M system excessive (it seems like an extremely large change to me)?

Is there significant use of systems with no swap device at all? I've occasionally seen this concept promoted. It doesn't apply to me (in the Java world, you want to have significant swap for anonymous reservations even if you never intend to actually write to it).

Actions #4

Updated by Joshua M. Clulow 11 months ago

Those are fair questions. I'm not completely convinced that ZFS isn't doable in 512MB. There are some things that stand in the way of that today, like pageout deadlocks and some of the other issues I've just filed, but I don't yet believe it's impossible. It's pretty cheap for me to look at it as I'm testing this anyway. I have been focused on the 8GB VM I've been using and have been able to deadlock with surprising (to me) ease -- but I'm going to go run the stress tests on smaller VMs too.

IPS isn't inherently impossible, but you do have to keep the repository free of old versions. It helps a lot to be able to swap without the system falling over as well.

When you say you see issues with ZFS on smaller systems, what are they exactly? Hangs? Just slow performance? Not much free memory left for user processes? Something else?

I appreciate the feedback -- I know you've worked pretty extensively on smaller memory footprints with Tribblix in the past.

Actions #5

Updated by Joshua M. Clulow 11 months ago

Oh, and yes, I think a swap file of at least some size is pretty critical in a world without overcommitted memory -- at least if you want to be able to fork!

Actions #6

Updated by Joshua M. Clulow 11 months ago

  • Description updated (diff)
Actions #7

Updated by Joshua M. Clulow 11 months ago

Peter, I have done some more testing in 512MB guests with this fix rebased on the end of the ZFS (and other) changes I have queued up in my pageout branch: https://github.com/illumos/illumos-gate/compare/master...jclulow:pageout

I think I can drop the lotsfree floor from 48MB to 16MB and still be relatively deadlock resistant. I tried 12MB, but had at least one deadlock during testing with that value.

Actions #8

Updated by Peter Tribble 11 months ago

Thanks.

As for small memory systems, I suspect the case of interest is something like AWS (or VirtualBox) where you can vary the size of the instance. I don't think you can install to ZFS on a 512M system (even with Tribblix, which is pretty light). So the use cases for such small systems are quite different - either an AWS [other providers may be available] nano instance with an existing installed AMI, or an old physical system. I'm slightly wary of optimizing the one case at the expense of the other.

Actions #9

Updated by Electric Monk 11 months ago

  • Gerrit CR set to 890
Actions #10

Updated by Rich Lowe 11 months ago

Peter Tribble wrote:

Thanks.

As for small memory systems, I suspect the case of interest is something like AWS (or VirtualBox) where you can vary the size of the instance. I don't think you can install to ZFS on a 512M system (even with Tribblix, which is pretty light). So the use cases for such small systems are quite different - either an AWS [other providers may be available] nano instance with an existing installed AMI, or an old physical system. I'm slightly wary of optimizing the one case at the expense of the other.

I don't think that's a risk, and I think with these changes you could install into 512M. The whole point here is that the reason things lock up in such cases is poor tuning and increased allocation in the pageout path. With an installer that configures swap for itself early (as would be wise), it has as much chance of success as anything.

The major difference would be an installer may be running in part out of a ramdisk, further increasing memory pressure. That wouldn't be affected by changes such as these (unless the ramdisk code is poor).

Actions #11

Updated by Joshua M. Clulow 6 months ago

Testing Notes

On a physical system with 32GB of memory, I compared tunables before and after:

--- newcastle.old.txt   Fri Jan 22 22:31:15 2021
+++ newcastle.new.txt   Fri Jan 22 22:38:59 2021
@@ -1,16 +1,16 @@
 hz                                     1000
 min_percent_cpu                           4
 max_percent_cpu                          80
-min_pageout_ticks                        10 ticks         10000000 nsec
-max_pageout_ticks                       200 ticks        200000000 nsec
-pageout_ticks                            10 ticks         10000000 nsec
-pageout_new_spread                  4000881
+min_pageout_nsec                                          10000000 nsec
+max_pageout_nsec                                         200000000 nsec
+pageout_nsec                                              10000000 nsec
+pageout_new_spread                  4128280
 pageout_sample_lim                        4
 pageout_sample_cnt                        4
 physmem                             8358159 pages  32649 megabytes
 lotsfree                             130596 pages    510 megabytes
 desfree                               65298 pages    255 megabytes
-minfree                               32649 pages    127 megabytes
-throttlefree                          32649 pages    127 megabytes
-pageout_reserve                       16324 pages     63 megabytes
-freemem                             2759062 pages  10777 megabytes
+minfree                               48973 pages    191 megabytes
+throttlefree                          48973 pages    191 megabytes
+pageout_reserve                       36729 pages    143 megabytes
+freemem                             7558810 pages  29526 megabytes

(NB: pageout_new_spread varies a little each boot, as it depends on what kind of background activity was occurring while we were doing the sample scans.)

On a virtual system with 4GB of memory:

--- oi0.4gb.old.txt    2021-01-31 14:13:04.231055718 -0800
+++ oi0.4gb.new.txt    2021-01-31 13:06:56.627164966 -0800
@@ -1,16 +1,16 @@
 hz                                     1000
 min_percent_cpu                           4
 max_percent_cpu                          80
-min_pageout_ticks                        10 ticks         10000000 nsec
-max_pageout_ticks                       200 ticks        200000000 nsec
-pageout_ticks                            10 ticks         10000000 nsec
-pageout_new_spread                   276630
+min_pageout_nsec                                          10000000 nsec
+max_pageout_nsec                                         200000000 nsec
+pageout_nsec                                              10000000 nsec
+pageout_new_spread                   311138
 pageout_sample_lim                        4
 pageout_sample_cnt                        4
 physmem                             1046423 pages   4087 megabytes
 lotsfree                              16350 pages     63 megabytes
 desfree                                8175 pages     31 megabytes
-minfree                                4087 pages     15 megabytes
-throttlefree                           4087 pages     15 megabytes
-pageout_reserve                        2043 pages      7 megabytes
-freemem                              834893 pages   3261 megabytes
+minfree                                6131 pages     23 megabytes
+throttlefree                           6131 pages     23 megabytes
+pageout_reserve                        4598 pages     17 megabytes
+freemem                              818660 pages   3197 megabytes

On a virtual system with 512MB of memory, noting that lotsfree is clamped to the new minimum of 16MB:

--- oi0.512m.old.txt    2021-01-31 14:22:15.888283688 -0800
+++ oi0.512m.new.txt    2021-01-31 14:25:11.650038652 -0800
@@ -1,16 +1,16 @@
 hz                                     1000
 min_percent_cpu                           4
 max_percent_cpu                          80
-min_pageout_ticks                        10 ticks         10000000 nsec
-max_pageout_ticks                       200 ticks        200000000 nsec
-pageout_ticks                            10 ticks         10000000 nsec
-pageout_new_spread                   269034
+min_pageout_nsec                                          10000000 nsec
+max_pageout_nsec                                         200000000 nsec
+pageout_nsec                                              10000000 nsec
+pageout_new_spread                   331429
 pageout_sample_lim                        4
 pageout_sample_cnt                        4
 physmem                              128919 pages    503 megabytes
-lotsfree                               2014 pages      7 megabytes
-desfree                                1007 pages      3 megabytes
-minfree                                 503 pages      1 megabytes
-throttlefree                            503 pages      1 megabytes
-pageout_reserve                         251 pages      0 megabytes
-freemem                               18685 pages     72 megabytes
+lotsfree                               4096 pages     16 megabytes
+desfree                                2048 pages      8 megabytes
+minfree                                1536 pages      6 megabytes
+throttlefree                           1536 pages      6 megabytes
+pageout_reserve                        1152 pages      4 megabytes
+freemem                                9592 pages     37 megabytes

I also wrote a script to monitor the page scanner. In the 4GB virtual machine, I ran a program that allocates a specified number of megabytes of memory and repeatedly reads and writes them in an attempt to keep them resident (a sketch of such a program appears after the trace). You can see freemem fall away, and then the scanner kicks in:

---------------- freemem 812257 (3172 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 623318 (2434 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 435509 (1701 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 245787 (960 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 50543 (197 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
cycle, 10536423 ns (5% of max) 0 laps 19688 scanned
~~~ SCHEDPAGING; desscan 19688 fastscan 134925 pageout_nsec 120839143
cycle, 17914686 ns (8% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 17057218 ns (8% of max) 0 laps 15703 scanned
---------------- freemem 8745 (34 MB)
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 17609093 ns (8% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 116927891 ns (58% of max) 0 laps 10238 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 37601707 ns (18% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 16310618 ns (8% of max) 0 laps 15703 scanned
---------------- freemem 8745 (34 MB)
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 63880666 ns (31% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 19270634 ns (9% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 96500577 ns (48% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 22501728 ns (11% of max) 0 laps 15700 scanned
---------------- freemem 8746 (34 MB)
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 99393584 ns (49% of max) 0 laps 10238 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 28422239 ns (14% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 19333560 ns (9% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 26420012 ns (13% of max) 0 laps 15700 scanned
---------------- freemem 9831 (38 MB)
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 13944024 ns (6% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 18586381 ns (9% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 85944003 ns (42% of max) 0 laps 13309 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 7476417 ns (3% of max) 0 laps 13464 scanned
---------------- freemem 9831 (38 MB)
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 104926380 ns (52% of max) 0 laps 8189 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 23908288 ns (11% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 23343399 ns (11% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 15711620 ns (7% of max) 0 laps 11744 scanned
---------------- freemem 10665 (41 MB)
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 94039807 ns (47% of max) 0 laps 11263 scanned
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 24910059 ns (12% of max) 0 laps 11744 scanned
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 15823486 ns (7% of max) 0 laps 11744 scanned
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 15632502 ns (7% of max) 0 laps 11705 scanned
---------------- freemem 10686 (41 MB)
~~~ SCHEDPAGING; desscan 11705 fastscan 134925 pageout_nsec 75843425
cycle, 65525953 ns (32% of max) 0 laps 11701 scanned
~~~ SCHEDPAGING; desscan 11701 fastscan 134925 pageout_nsec 75820183
cycle, 21883678 ns (10% of max) 0 laps 11699 scanned
~~~ SCHEDPAGING; desscan 11699 fastscan 134925 pageout_nsec 75808562
cycle, 15060435 ns (7% of max) 0 laps 11511 scanned
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 15139263 ns (7% of max) 0 laps 11511 scanned
---------------- freemem 10778 (42 MB)
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 87005340 ns (43% of max) 0 laps 5119 scanned
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 14926752 ns (7% of max) 0 laps 11511 scanned
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 78493933 ns (39% of max) 0 laps 5119 scanned
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 15051959 ns (7% of max) 0 laps 11505 scanned
---------------- freemem 10781 (42 MB)
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 15107000 ns (7% of max) 0 laps 11505 scanned
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 14875161 ns (7% of max) 0 laps 11505 scanned
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 84702829 ns (42% of max) 0 laps 6136 scanned
~~~ SCHEDPAGING; desscan 11503 fastscan 134925 pageout_nsec 74704587
cycle, 15585797 ns (7% of max) 0 laps 11503 scanned
---------------- freemem 10782 (42 MB)
~~~ SCHEDPAGING; desscan 11503 fastscan 134925 pageout_nsec 74704587
cycle, 86954202 ns (43% of max) 0 laps 9207 scanned
~~~ SCHEDPAGING; desscan 11503 fastscan 134925 pageout_nsec 74704587
cycle, 10146716 ns (5% of max) 0 laps 11499 scanned
~~~ SCHEDPAGING; desscan 11499 fastscan 134925 pageout_nsec 74681345
cycle, 36940001 ns (18% of max) 0 laps 11396 scanned
~~~ SCHEDPAGING; desscan 11396 fastscan 134925 pageout_nsec 74100305
cycle, 13753410 ns (6% of max) 0 laps 12080 scanned
---------------- freemem 10063 (39 MB)
~~~ SCHEDPAGING; desscan 12080 fastscan 134925 pageout_nsec 77958409
cycle, 47155417 ns (23% of max) 0 laps 12985 scanned
~~~ SCHEDPAGING; desscan 12985 fastscan 134925 pageout_nsec 83059938
cycle, 13984462 ns (6% of max) 0 laps 13616 scanned
~~~ SCHEDPAGING; desscan 13616 fastscan 134925 pageout_nsec 86615902
cycle, 13435323 ns (6% of max) 0 laps 14315 scanned
~~~ SCHEDPAGING; desscan 14315 fastscan 134925 pageout_nsec 90555351
cycle, 12988078 ns (6% of max) 0 laps 18137 scanned
---------------- freemem 7382 (28 MB)
~~~ SCHEDPAGING; desscan 18137 fastscan 134925 pageout_nsec 112100305
cycle, 12479328 ns (6% of max) 0 laps 18527 scanned
~~~ SCHEDPAGING; desscan 18527 fastscan 134925 pageout_nsec 114296636
cycle, 39049705 ns (19% of max) 0 laps 18714 scanned
~~~ SCHEDPAGING; desscan 18714 fastscan 134925 pageout_nsec 115354128
cycle, 13144154 ns (6% of max) 0 laps 10349 scanned
~~~ SCHEDPAGING; desscan 10349 fastscan 134925 pageout_nsec 68196941
cycle, 14842170 ns (7% of max) 0 laps 10357 scanned
---------------- freemem 11016 (43 MB)
~~~ SCHEDPAGING; desscan 10357 fastscan 134925 pageout_nsec 68243425
cycle, 69425172 ns (34% of max) 0 laps 11023 scanned
~~~ SCHEDPAGING; desscan 11023 fastscan 134925 pageout_nsec 71996941
cycle, 17262974 ns (8% of max) 0 laps 13734 scanned
~~~ SCHEDPAGING; desscan 13734 fastscan 134925 pageout_nsec 87278287
cycle, 86739555 ns (43% of max) 0 laps 15160 scanned
~~~ SCHEDPAGING; desscan 15160 fastscan 134925 pageout_nsec 95319877
cycle, 15065196 ns (7% of max) 0 laps 15892 scanned
---------------- freemem 8717 (34 MB)
~~~ SCHEDPAGING; desscan 15892 fastscan 134925 pageout_nsec 99445259
cycle, 97486181 ns (48% of max) 0 laps 15760 scanned
~~~ SCHEDPAGING; desscan 15760 fastscan 134925 pageout_nsec 98701529
cycle, 19923485 ns (9% of max) 0 laps 15946 scanned
~~~ SCHEDPAGING; desscan 15946 fastscan 134925 pageout_nsec 99747400
cycle, 17100867 ns (8% of max) 0 laps 16760 scanned
~~~ SCHEDPAGING; desscan 16760 fastscan 134925 pageout_nsec 104337614
cycle, 19420313 ns (9% of max) 0 laps 17457 scanned
---------------- freemem 7529 (29 MB)
~~~ SCHEDPAGING; desscan 17457 fastscan 134925 pageout_nsec 108265443
cycle, 102923784 ns (51% of max) 0 laps 18209 scanned
~~~ SCHEDPAGING; desscan 18209 fastscan 134925 pageout_nsec 112507033
cycle, 24365923 ns (12% of max) 0 laps 18418 scanned
~~~ SCHEDPAGING; desscan 18418 fastscan 134925 pageout_nsec 113680733
cycle, 16886613 ns (8% of max) 0 laps 18762 scanned
~~~ SCHEDPAGING; desscan 18762 fastscan 134925 pageout_nsec 115621406
cycle, 22114961 ns (11% of max) 0 laps 19228 scanned
...

The target duty cycle bounces around with freemem. I was not able to push the memory situation harder, because until at least #13092 and #13093 are integrated it's still very easy to cork the system when you have low freemem and high paging activity.
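
For reference, a minimal userland sketch of the kind of allocation stress program used above (hypothetical; the actual tool was not attached to this issue):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate the requested number of megabytes, then touch every page
 * in a tight loop to keep the whole range resident and keep the
 * page scanner busy.
 */
int
main(int argc, char *argv[])
{
	size_t mb = (argc > 1) ? strtoul(argv[1], NULL, 10) : 1024;
	size_t sz = mb * 1024 * 1024;
	long pagesize = sysconf(_SC_PAGESIZE);
	char *buf = malloc(sz);

	if (buf == NULL) {
		perror("malloc");
		return (1);
	}

	for (;;) {
		for (size_t off = 0; off < sz; off += (size_t)pagesize)
			buf[off]++;	/* read and write each page */
	}
}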

Actions #12

Updated by Joshua M. Clulow 6 months ago

I've attached the scripts I used to collect tunables before and after the change, and for basic monitoring of the scanner.

Actions #13

Updated by Joshua M. Clulow 6 months ago

  • Related to Bug #13107: pageout should use hrtime instead of lbolt added
Actions #14

Updated by Electric Monk 6 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 2d9166aee5315081107056f3d663e4adee4c1c2a

commit  2d9166aee5315081107056f3d663e4adee4c1c2a
Author: Joshua M. Clulow <josh@sysmgr.org>
Date:   2021-02-03T00:41:05.000Z

    13097 improve VM tunables for modern systems
    13107 pageout should use hrtime instead of lbolt
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Actions #15

Updated by Marcel Telka 6 months ago

  • Related to Bug #13499: System paging parameters no longer calculated at boot after 13097 added