Bug #13097
improve VM tunables for modern systems
Status: Closed
% Done: 100%
Description
Modern systems tend to be configured to swap to a ZFS zvol device. ZFS pool I/O requires a lot more (mostly transient) allocations when compared to more traditional swap targets like UFS-backed files or even raw disk devices. In addition to this, modern systems are quite fast at allocating and consuming memory. It seems that the only hard stop on the provision of pages from the free and cache lists to users is throttlefree, and the only hard stop on allocation to the rest of the kernel outside of the pageout() machinery is pageout_reserve.
It is remarkably easy for a current illumos system to end up deadlocked trying to page out, with even a single Python thread allocating as fast as possible. In a 1GB system, pageout_reserve is a little under 2MB, and throttlefree is a little under 4MB. ZFS requires occasional larger bursts of allocation (e.g., when loading metaslabs, adjusting range trees, compressing blocks, assembling I/O buffers for the trip to disk, etc.). 2MB is not enough slack to ensure that pageout() and ZFS can work together to steadily evict memory to disk in these low memory situations.
Even on a larger 8GB system, where pageout_reserve is 16MB today (and throttlefree is 32MB), we can relatively easily hit the deadlock condition, especially when we get unlucky and things like kmem reaping and ARC reduction don't happen in the correct sequence.
Part of the issue is that while lotsfree seems large enough, the actual harder stops are only at the lower thresholds -- which are halved and halved and halved again from there. Instead of making each subsequent threshold a half of the next threshold up, it seems we could have a shallower ramp; e.g., some of the thresholds could be 3/4 of the size of the threshold above. Indeed, we could also mandate a larger minimum lotsfree of something like 16MB so that even 1GB systems will have around 4MB of reserve pool for pageout.
Finally, for larger systems we probably set too much memory aside! Once pageout_reserve is high enough to make forward progress, the requirement for memory would seem to be more tightly related to the rate at which pageout can or must occur -- rather than the overall size of memory. On a 512GB system, fully 2GB is below throttlefree -- which seems like a lot! We should consider capping lotsfree, or at least greatly detuning its growth, above a certain physical memory size.
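To make the arithmetic above concrete, here is a small sketch (assuming 4KB pages and the existing halving scheme described in this issue, not the literal kernel code) that reproduces the numbers quoted for a 1GB system:

#include <stdio.h>

/*
 * Sketch of the existing cascade: each threshold is half the one above, and
 * throttlefree defaults to minfree.  With 4KB pages, a 1GB system ends up
 * with throttlefree of about 4MB and pageout_reserve of about 2MB ("a little
 * under" in practice, since a real system reports slightly less than the
 * full 1GB as physmem).
 */
int
main(void)
{
	long physmem = (1024L * 1024 * 1024) / 4096;	/* pages in 1GB */
	long lotsfree = physmem / 64;			/* 4096 p = 16 MB */
	long desfree = lotsfree / 2;			/* 2048 p =  8 MB */
	long minfree = desfree / 2;			/* 1024 p =  4 MB */
	long throttlefree = minfree;			/* 1024 p =  4 MB */
	long pageout_reserve = throttlefree / 2;	/*  512 p =  2 MB */

	printf("lotsfree %ld desfree %ld minfree %ld throttlefree %ld "
	    "pageout_reserve %ld (pages)\n", lotsfree, desfree, minfree,
	    throttlefree, pageout_reserve);
	return (0);
}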
Proposed Change
- lotsfree remains 1/64 of physical memory, but is clamped in the range 16MB through 2048MB
- minfree (and thus also throttlefree) changes to be 3/4 of desfree
- pageout_reserve changes to be 3/4 of throttlefree
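As an illustration of the proposed computation (a sketch based on my reading of the bullets above, assuming 4KB pages; the real change is made where the kernel derives these tunables at boot, not in user code):

#include <stdio.h>

#define	MB_TO_PAGES(mb)	((long)(mb) * 256)	/* 4KB pages per megabyte */

/*
 * Sketch of the proposed tunable computation for a given physmem (in pages).
 */
static void
proposed(long physmem)
{
	long lotsfree = physmem / 64;

	/* clamp lotsfree into the range 16MB .. 2048MB */
	if (lotsfree < MB_TO_PAGES(16))
		lotsfree = MB_TO_PAGES(16);
	if (lotsfree > MB_TO_PAGES(2048))
		lotsfree = MB_TO_PAGES(2048);

	long desfree = lotsfree / 2;			/* unchanged: 1/2 of lotsfree */
	long minfree = desfree * 3 / 4;			/* was 1/2 of desfree */
	long throttlefree = minfree;			/* still tracks minfree */
	long pageout_reserve = throttlefree * 3 / 4;	/* was 1/2 of throttlefree */

	printf("physmem %9ld p: lotsfree %7ld desfree %7ld minfree %7ld "
	    "throttlefree %7ld pageout_reserve %7ld\n", physmem,
	    lotsfree, desfree, minfree, throttlefree, pageout_reserve);
}

int
main(void)
{
	/* the same physmem sizes as the comparison table below */
	long sizes[] = { 131072, 262144, 524288, 1048576, 2097152, 4194304,
	    8388608, 16777216, 33554432, 67108864, 134217728 };

	for (unsigned i = 0; i < sizeof (sizes) / sizeof (sizes[0]); i++)
		proposed(sizes[i]);
	return (0);
}

Run over the same physmem sizes as the table, this should reproduce the NEW VALUES column below.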
A comparison of original sizes and proposed new sizes for these values with a range of system memory sizes appears below:
======================================================================
         physmem |   131072 p    512 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |     4096 p    16 MB ( 3.12%) |     2048 p     8 MB ( 1.56%)
         desfree |     2048 p     8 MB ( 1.56%) |     1024 p     4 MB ( 0.78%)
         minfree |     1536 p     6 MB ( 1.17%) |      512 p     2 MB ( 0.39%)
    throttlefree |     1536 p     6 MB ( 1.17%) |      512 p     2 MB ( 0.39%)
 pageout_reserve |     1152 p     4 MB ( 0.88%) |      256 p     1 MB ( 0.20%)
======================================================================
         physmem |   262144 p   1024 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |     4096 p    16 MB ( 1.56%) |     4096 p    16 MB ( 1.56%)
         desfree |     2048 p     8 MB ( 0.78%) |     2048 p     8 MB ( 0.78%)
         minfree |     1536 p     6 MB ( 0.59%) |     1024 p     4 MB ( 0.39%)
    throttlefree |     1536 p     6 MB ( 0.59%) |     1024 p     4 MB ( 0.39%)
 pageout_reserve |     1152 p     4 MB ( 0.44%) |      512 p     2 MB ( 0.20%)
======================================================================
         physmem |   524288 p   2048 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |     8192 p    32 MB ( 1.56%) |     8192 p    32 MB ( 1.56%)
         desfree |     4096 p    16 MB ( 0.78%) |     4096 p    16 MB ( 0.78%)
         minfree |     3072 p    12 MB ( 0.59%) |     2048 p     8 MB ( 0.39%)
    throttlefree |     3072 p    12 MB ( 0.59%) |     2048 p     8 MB ( 0.39%)
 pageout_reserve |     2304 p     9 MB ( 0.44%) |     1024 p     4 MB ( 0.20%)
======================================================================
         physmem |  1048576 p   4096 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |    16384 p    64 MB ( 1.56%) |    16384 p    64 MB ( 1.56%)
         desfree |     8192 p    32 MB ( 0.78%) |     8192 p    32 MB ( 0.78%)
         minfree |     6144 p    24 MB ( 0.59%) |     4096 p    16 MB ( 0.39%)
    throttlefree |     6144 p    24 MB ( 0.59%) |     4096 p    16 MB ( 0.39%)
 pageout_reserve |     4608 p    18 MB ( 0.44%) |     2048 p     8 MB ( 0.20%)
======================================================================
         physmem |  2097152 p   8192 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |    32768 p   128 MB ( 1.56%) |    32768 p   128 MB ( 1.56%)
         desfree |    16384 p    64 MB ( 0.78%) |    16384 p    64 MB ( 0.78%)
         minfree |    12288 p    48 MB ( 0.59%) |     8192 p    32 MB ( 0.39%)
    throttlefree |    12288 p    48 MB ( 0.59%) |     8192 p    32 MB ( 0.39%)
 pageout_reserve |     9216 p    36 MB ( 0.44%) |     4096 p    16 MB ( 0.20%)
======================================================================
         physmem |  4194304 p  16384 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |    65536 p   256 MB ( 1.56%) |    65536 p   256 MB ( 1.56%)
         desfree |    32768 p   128 MB ( 0.78%) |    32768 p   128 MB ( 0.78%)
         minfree |    24576 p    96 MB ( 0.59%) |    16384 p    64 MB ( 0.39%)
    throttlefree |    24576 p    96 MB ( 0.59%) |    16384 p    64 MB ( 0.39%)
 pageout_reserve |    18432 p    72 MB ( 0.44%) |     8192 p    32 MB ( 0.20%)
======================================================================
         physmem |  8388608 p  32768 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |   131072 p   512 MB ( 1.56%) |   131072 p   512 MB ( 1.56%)
         desfree |    65536 p   256 MB ( 0.78%) |    65536 p   256 MB ( 0.78%)
         minfree |    49152 p   192 MB ( 0.59%) |    32768 p   128 MB ( 0.39%)
    throttlefree |    49152 p   192 MB ( 0.59%) |    32768 p   128 MB ( 0.39%)
 pageout_reserve |    36864 p   144 MB ( 0.44%) |    16384 p    64 MB ( 0.20%)
======================================================================
         physmem | 16777216 p  65536 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |   262144 p  1024 MB ( 1.56%) |   262144 p  1024 MB ( 1.56%)
         desfree |   131072 p   512 MB ( 0.78%) |   131072 p   512 MB ( 0.78%)
         minfree |    98304 p   384 MB ( 0.59%) |    65536 p   256 MB ( 0.39%)
    throttlefree |    98304 p   384 MB ( 0.59%) |    65536 p   256 MB ( 0.39%)
 pageout_reserve |    73728 p   288 MB ( 0.44%) |    32768 p   128 MB ( 0.20%)
======================================================================
         physmem | 33554432 p 131072 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |   524288 p  2048 MB ( 1.56%) |   524288 p  2048 MB ( 1.56%)
         desfree |   262144 p  1024 MB ( 0.78%) |   262144 p  1024 MB ( 0.78%)
         minfree |   196608 p   768 MB ( 0.59%) |   131072 p   512 MB ( 0.39%)
    throttlefree |   196608 p   768 MB ( 0.59%) |   131072 p   512 MB ( 0.39%)
 pageout_reserve |   147456 p   576 MB ( 0.44%) |    65536 p   256 MB ( 0.20%)
======================================================================
         physmem | 67108864 p 262144 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |   524288 p  2048 MB ( 0.78%) |  1048576 p  4096 MB ( 1.56%)
         desfree |   262144 p  1024 MB ( 0.39%) |   524288 p  2048 MB ( 0.78%)
         minfree |   196608 p   768 MB ( 0.29%) |   262144 p  1024 MB ( 0.39%)
    throttlefree |   196608 p   768 MB ( 0.29%) |   262144 p  1024 MB ( 0.39%)
 pageout_reserve |   147456 p   576 MB ( 0.22%) |   131072 p   512 MB ( 0.20%)
======================================================================
         physmem | 134217728 p 524288 MB
                 |          NEW VALUES          |       ORIGINAL VALUES
        lotsfree |   524288 p  2048 MB ( 0.39%) |  2097152 p  8192 MB ( 1.56%)
         desfree |   262144 p  1024 MB ( 0.20%) |  1048576 p  4096 MB ( 0.78%)
         minfree |   196608 p   768 MB ( 0.15%) |   524288 p  2048 MB ( 0.39%)
    throttlefree |   196608 p   768 MB ( 0.15%) |   524288 p  2048 MB ( 0.39%)
 pageout_reserve |   147456 p   576 MB ( 0.11%) |   262144 p  1024 MB ( 0.20%)
Files
Related issues
Updated by Joshua M. Clulow over 2 years ago
Note that my current plan is to play with the values until I cannot deadlock the system just by allocating anymore -- especially with the smaller memory virtual machines that are so easy to topple over today.
Updated by Joshua M. Clulow over 2 years ago
Notes from a block comment I intend to integrate with this change:
/*
 * FREE MEMORY MANAGEMENT
 *
 * Management of the pool of free pages is a tricky business. There are
 * several critical threshold values which constrain our allocation of new
 * pages and inform the rate of paging out of memory to swap. These threshold
 * values, and the behaviour they induce, are described below in descending
 * order of size -- and thus increasing order of severity!
 *
 *   +---------------------------------------------------- physmem (all memory)
 *   |
 *   | Ordinarily there are no particular constraints placed on page
 *   v allocation. The page scanner is not running and page_create_va()
 *   | will effectively grant all page requests (whether from the kernel
 *   | or from user processes) without artificial delay.
 *   |
 *   +------------------------------- lotsfree (1.56% of physmem, minimum 32MB)
 *   |
 *   | When we have less than "lotsfree" pages, pageout_scanner() is
 *   v signalled by schedpaging() to begin looking for pages that can
 *   | be evicted to disk to bring us back above lotsfree. At this
 *   | stage there is still no constraint on allocation of free pages.
 *   |
 *   +------------------ desfree (1/2 of lotsfree, 0.78% of physmem, min. 16MB)
 *   |
 *   | When we drop below desfree, a number of kernel facilities will
 *   v wait before allocating more memory, under the assumption that
 *   | pageout or reaping will make progress and free up some memory.
 *   | This behaviour is not especially coordinated; look for comparisons
 *   | of desfree and freemem.
 *   |
 *   | In addition to various attempts at advisory caution, clock()
 *   | will wake up the thread that is ordinarily parked in sched().
 *   | This routine is responsible for the heavy-handed swapping out
 *   v of entire processes in an attempt to arrest the slide of free
 *   | memory. See comments in sched.c for more details.
 *   |
 *   +---- minfree & throttlefree (3/4 of desfree, 0.59% of physmem, min. 12MB)
 *   |
 *   | These two separate tunables have, by default, the same value.
 *   v Various parts of the kernel use minfree to signal the need for
 *   | more aggressive reclamation of memory, and sched() is more
 *   | aggressive at swapping processes out.
 *   |
 *   | If free memory falls below throttlefree, page_create_va() will
 *   | use page_create_throttle() to begin holding most requests for
 *   | new pages while pageout and reaping free up memory. Sleeping
 *   v allocations (e.g., KM_SLEEP) are held here while we wait for
 *   | more memory. Non-sleeping allocations are generally allowed to
 *   | proceed.
 *   |
 *   +------- pageout_reserve (3/4 of throttlefree, 0.44% of physmem, min. 9MB)
 *   |
 *   | When we hit throttlefree, the situation is already dire. The
 *   v system is generally paging out memory and swapping out entire
 *   | processes in order to free up memory for continued operation.
 *   |
 *   | Unfortunately, evicting memory to disk generally requires short
 *   | term use of additional memory; e.g., allocation of buffers for
 *   | storage drivers, updating maps of free and used blocks, etc.
 *   | As such, pageout_reserve is the number of pages that we keep in
 *   | special reserve for use by pageout() and sched() and by any
 *   v other parts of the kernel that need to be working for those to
 *   | make forward progress such as the ZFS I/O pipeline.
 *   |
 *   | When we are below pageout_reserve, we fail or hold any allocation
 *   | that has not explicitly requested access to the reserve pool.
 *   | Access to the reserve is generally granted via the KM_PUSHPAGE
 *   | flag, or by marking a thread T_PUSHPAGE such that all allocations
 *   | can implicitly tap the reserve. For more details, see the
 *   v NOMEMWAIT() macro, the T_PUSHPAGE thread flag, the KM_PUSHPAGE
 *   | and VM_PUSHPAGE allocation flags, and page_create_throttle().
 *   |
 *   +---------------------------------------------------------- no free memory
 *   |
 *   | If we have arrived here, things are very bad indeed. It is
 *   v surprisingly difficult to tell if this condition is even fatal,
 *   | as enough memory may have been granted to pageout() and to the
 *   | ZFS I/O pipeline that requests for eviction that have already been
 *   | made will complete and free up memory some time soon.
 *   |
 *   | If free memory does not materialise, the system generally remains
 *   | deadlocked. The pageout_deadman() below is run once per second
 *   | from clock(), seeking to limit the amount of time a single request
 *   v to page out can be blocked before the system panics to get a crash
 *   | dump and return to service.
 *   |
 *   +-------------------------------------------------------------------------
 */
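To illustrate the reserve-access mechanism described at the bottom of the comment, here is a hypothetical allocation site (not code from this change) that uses KM_PUSHPAGE so an allocation made on the pageout path can dip into pageout_reserve rather than blocking in page_create_throttle():

#include <sys/types.h>
#include <sys/kmem.h>

/*
 * Hypothetical example: a buffer needed while evicting pages to disk.
 * KM_SLEEP could deadlock once we are below throttlefree, because this
 * allocation may itself be required for pageout to make forward progress;
 * KM_PUSHPAGE is permitted to tap the reserve pool instead.
 */
static void *
pageout_path_alloc(size_t len)
{
	return (kmem_zalloc(len, KM_PUSHPAGE));
}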
Updated by Peter Tribble over 2 years ago
So, a couple of thoughts:
How many people are using illumos with ZFS on a 512M system? I can imagine that would be somewhat painful. And if ZFS is problematic, IPS is doubly so. For Tribblix, I know that users down at that end of the spectrum often use UFS. Are the new settings for a 512M system excessive (it seems like an extremely large change to me)?
Is there significant use of systems with no swap device at all? I've occasionally seen this concept promoted. It doesn't apply to me (in the Java world, you want to have significant swap for anonymous reservations even if you never intend to actually write to it).
Updated by Joshua M. Clulow over 2 years ago
Those are fair questions. I'm not completely convinced that ZFS isn't doable in 512MB. There are some things that stand in the way of that today, like pageout deadlocks and some of the other issues I've just filed, but I don't yet believe it's impossible. It's pretty cheap for me to look at it as I'm testing this anyway. I have been focused on the 8GB VM I've been using and have been able to deadlock with surprising (to me) ease -- but I'm going to go run the stress tests on smaller VMs too.
IPS isn't inherently impossible, but you do have to keep the repository free of old versions. It helps a lot to be able to swap without the system falling over as well.
When you say you see issues with ZFS on smaller systems, what are they exactly? Hangs? Just slow performance? Not much free memory left for user processes? Something else?
I appreciate the feedback -- I know you've worked pretty extensively on smaller memory footprints with Tribblix in the past.
Updated by Joshua M. Clulow over 2 years ago
Oh, and yes, I think a swap file of at least some size is pretty critical in a world without overcommitted memory -- at least if you want to be able to fork!
Updated by Joshua M. Clulow over 2 years ago
Peter, I have done some more testing in 512MB guests with this fix rebased on the end of the ZFS (and other) changes I have queued up in my pageout branch: https://github.com/illumos/illumos-gate/compare/master...jclulow:pageout
I think I can drop the lotsfree floor from 48MB to 16MB and still be relatively deadlock resistant. I tried 12MB, but had at least one deadlock during testing with that value.
Updated by Peter Tribble over 2 years ago
Thanks.
As for small memory systems, I suspect the case of interest is something like AWS (or VirtualBox) where you can vary the size of the instance. I don't think you can install to ZFS onto a 512M system (even with Tribblix, which is pretty light). So the use cases for such small systems are quite different - either an AWS [other providers may be available] nano instance with an existing installed AMI, or an old physical system. I'm slightly wary of optimizing the one case at the expense of the other.
Updated by Rich Lowe over 2 years ago
Peter Tribble wrote:
Thanks.
As for small memory systems, I suspect the case of interest is something like AWS (or VirtualBox) where you can vary the size of the instance. I don't think you can install to ZFS onto a 512M system (even with Tribblix, which is pretty light). So the use cases for such small systems are quite different - either an AWS [other providers may be available] nano instance with an existing installed AMI, or an old physical system. I'm slightly wary of optimizing the one case at the expense of the other.
I don't think that's a risk, and I think with these changes you could install into 512M. The whole point here is that the reason things lock up in such cases is poor tuning and increased allocation in the pageout path. With an installer that configured swap for itself early (as would be wise), it has as much chance of success as anything.
The major difference would be an installer may be running in part out of a ramdisk, further increasing memory pressure. That wouldn't be affected by changes such as these (unless the ramdisk code is poor).
Updated by Joshua M. Clulow about 2 years ago
Testing Notes
On a physical system with 32GB of memory, I compared tunables before and after:
--- newcastle.old.txt   Fri Jan 22 22:31:15 2021
+++ newcastle.new.txt   Fri Jan 22 22:38:59 2021
@@ -1,16 +1,16 @@
 hz                      1000
 min_percent_cpu         4
 max_percent_cpu         80
-min_pageout_ticks       10 ticks         10000000 nsec
-max_pageout_ticks       200 ticks        200000000 nsec
-pageout_ticks           10 ticks         10000000 nsec
-pageout_new_spread      4000881
+min_pageout_nsec        10000000 nsec
+max_pageout_nsec        200000000 nsec
+pageout_nsec            10000000 nsec
+pageout_new_spread      4128280
 pageout_sample_lim      4
 pageout_sample_cnt      4
 physmem                 8358159 pages    32649 megabytes
 lotsfree                130596 pages     510 megabytes
 desfree                 65298 pages      255 megabytes
-minfree                 32649 pages      127 megabytes
-throttlefree            32649 pages      127 megabytes
-pageout_reserve         16324 pages      63 megabytes
-freemem                 2759062 pages    10777 megabytes
+minfree                 48973 pages      191 megabytes
+throttlefree            48973 pages      191 megabytes
+pageout_reserve         36729 pages      143 megabytes
+freemem                 7558810 pages    29526 megabytes
(NB: pageout_new_spread varies a little each boot, as it depends on what kind of background activity was occurring while we were doing the sample scans)
On a virtual system with 4GB of memory:
--- oi0.4gb.old.txt     2021-01-31 14:13:04.231055718 -0800
+++ oi0.4gb.new.txt     2021-01-31 13:06:56.627164966 -0800
@@ -1,16 +1,16 @@
 hz                      1000
 min_percent_cpu         4
 max_percent_cpu         80
-min_pageout_ticks       10 ticks         10000000 nsec
-max_pageout_ticks       200 ticks        200000000 nsec
-pageout_ticks           10 ticks         10000000 nsec
-pageout_new_spread      276630
+min_pageout_nsec        10000000 nsec
+max_pageout_nsec        200000000 nsec
+pageout_nsec            10000000 nsec
+pageout_new_spread      311138
 pageout_sample_lim      4
 pageout_sample_cnt      4
 physmem                 1046423 pages    4087 megabytes
 lotsfree                16350 pages      63 megabytes
 desfree                 8175 pages       31 megabytes
-minfree                 4087 pages       15 megabytes
-throttlefree            4087 pages       15 megabytes
-pageout_reserve         2043 pages       7 megabytes
-freemem                 834893 pages     3261 megabytes
+minfree                 6131 pages       23 megabytes
+throttlefree            6131 pages       23 megabytes
+pageout_reserve         4598 pages       17 megabytes
+freemem                 818660 pages     3197 megabytes
On a virtual system with 512MB of memory, noting that lotsfree is clamped to the new minimum of 16MB:
--- oi0.512m.old.txt    2021-01-31 14:22:15.888283688 -0800
+++ oi0.512m.new.txt    2021-01-31 14:25:11.650038652 -0800
@@ -1,16 +1,16 @@
 hz                      1000
 min_percent_cpu         4
 max_percent_cpu         80
-min_pageout_ticks       10 ticks         10000000 nsec
-max_pageout_ticks       200 ticks        200000000 nsec
-pageout_ticks           10 ticks         10000000 nsec
-pageout_new_spread      269034
+min_pageout_nsec        10000000 nsec
+max_pageout_nsec        200000000 nsec
+pageout_nsec            10000000 nsec
+pageout_new_spread      331429
 pageout_sample_lim      4
 pageout_sample_cnt      4
 physmem                 128919 pages     503 megabytes
-lotsfree                2014 pages       7 megabytes
-desfree                 1007 pages       3 megabytes
-minfree                 503 pages        1 megabytes
-throttlefree            503 pages        1 megabytes
-pageout_reserve         251 pages        0 megabytes
-freemem                 18685 pages      72 megabytes
+lotsfree                4096 pages       16 megabytes
+desfree                 2048 pages       8 megabytes
+minfree                 1536 pages       6 megabytes
+throttlefree            1536 pages       6 megabytes
+pageout_reserve         1152 pages       4 megabytes
+freemem                 9592 pages       37 megabytes
I also wrote a script to monitor the page scanner. In the 4GB virtual machine, I ran a program that will allocate a specified number of megabytes of memory and repeatedly read and write them in an attempt to keep them resident. You can see freemem fall away and then the scanner kicks in:
---------------- freemem 812257 (3172 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 623318 (2434 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 435509 (1701 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 245787 (960 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
---------------- freemem 50543 (197 MB)
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
~~~ SCHEDPAGING; desscan 25 fastscan 134925 pageout_nsec 10000000
cycle, 10536423 ns (5% of max) 0 laps 19688 scanned
~~~ SCHEDPAGING; desscan 19688 fastscan 134925 pageout_nsec 120839143
cycle, 17914686 ns (8% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 17057218 ns (8% of max) 0 laps 15703 scanned
---------------- freemem 8745 (34 MB)
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 17609093 ns (8% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 116927891 ns (58% of max) 0 laps 10238 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 37601707 ns (18% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 16310618 ns (8% of max) 0 laps 15703 scanned
---------------- freemem 8745 (34 MB)
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 63880666 ns (31% of max) 0 laps 15703 scanned
~~~ SCHEDPAGING; desscan 15703 fastscan 134925 pageout_nsec 98376146
cycle, 19270634 ns (9% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 96500577 ns (48% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 22501728 ns (11% of max) 0 laps 15700 scanned
---------------- freemem 8746 (34 MB)
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 99393584 ns (49% of max) 0 laps 10238 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 28422239 ns (14% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 19333560 ns (9% of max) 0 laps 15700 scanned
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 26420012 ns (13% of max) 0 laps 15700 scanned
---------------- freemem 9831 (38 MB)
~~~ SCHEDPAGING; desscan 15700 fastscan 134925 pageout_nsec 98364525
cycle, 13944024 ns (6% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 18586381 ns (9% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 85944003 ns (42% of max) 0 laps 13309 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 7476417 ns (3% of max) 0 laps 13464 scanned
---------------- freemem 9831 (38 MB)
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 104926380 ns (52% of max) 0 laps 8189 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 23908288 ns (11% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 23343399 ns (11% of max) 0 laps 13464 scanned
~~~ SCHEDPAGING; desscan 13464 fastscan 134925 pageout_nsec 85755963
cycle, 15711620 ns (7% of max) 0 laps 11744 scanned
---------------- freemem 10665 (41 MB)
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 94039807 ns (47% of max) 0 laps 11263 scanned
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 24910059 ns (12% of max) 0 laps 11744 scanned
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 15823486 ns (7% of max) 0 laps 11744 scanned
~~~ SCHEDPAGING; desscan 11744 fastscan 134925 pageout_nsec 76064220
cycle, 15632502 ns (7% of max) 0 laps 11705 scanned
---------------- freemem 10686 (41 MB)
~~~ SCHEDPAGING; desscan 11705 fastscan 134925 pageout_nsec 75843425
cycle, 65525953 ns (32% of max) 0 laps 11701 scanned
~~~ SCHEDPAGING; desscan 11701 fastscan 134925 pageout_nsec 75820183
cycle, 21883678 ns (10% of max) 0 laps 11699 scanned
~~~ SCHEDPAGING; desscan 11699 fastscan 134925 pageout_nsec 75808562
cycle, 15060435 ns (7% of max) 0 laps 11511 scanned
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 15139263 ns (7% of max) 0 laps 11511 scanned
---------------- freemem 10778 (42 MB)
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 87005340 ns (43% of max) 0 laps 5119 scanned
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 14926752 ns (7% of max) 0 laps 11511 scanned
~~~ SCHEDPAGING; desscan 11511 fastscan 134925 pageout_nsec 74751070
cycle, 78493933 ns (39% of max) 0 laps 5119 scanned
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 15051959 ns (7% of max) 0 laps 11505 scanned
---------------- freemem 10781 (42 MB)
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 15107000 ns (7% of max) 0 laps 11505 scanned
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 14875161 ns (7% of max) 0 laps 11505 scanned
~~~ SCHEDPAGING; desscan 11505 fastscan 134925 pageout_nsec 74716207
cycle, 84702829 ns (42% of max) 0 laps 6136 scanned
~~~ SCHEDPAGING; desscan 11503 fastscan 134925 pageout_nsec 74704587
cycle, 15585797 ns (7% of max) 0 laps 11503 scanned
---------------- freemem 10782 (42 MB)
~~~ SCHEDPAGING; desscan 11503 fastscan 134925 pageout_nsec 74704587
cycle, 86954202 ns (43% of max) 0 laps 9207 scanned
~~~ SCHEDPAGING; desscan 11503 fastscan 134925 pageout_nsec 74704587
cycle, 10146716 ns (5% of max) 0 laps 11499 scanned
~~~ SCHEDPAGING; desscan 11499 fastscan 134925 pageout_nsec 74681345
cycle, 36940001 ns (18% of max) 0 laps 11396 scanned
~~~ SCHEDPAGING; desscan 11396 fastscan 134925 pageout_nsec 74100305
cycle, 13753410 ns (6% of max) 0 laps 12080 scanned
---------------- freemem 10063 (39 MB)
~~~ SCHEDPAGING; desscan 12080 fastscan 134925 pageout_nsec 77958409
cycle, 47155417 ns (23% of max) 0 laps 12985 scanned
~~~ SCHEDPAGING; desscan 12985 fastscan 134925 pageout_nsec 83059938
cycle, 13984462 ns (6% of max) 0 laps 13616 scanned
~~~ SCHEDPAGING; desscan 13616 fastscan 134925 pageout_nsec 86615902
cycle, 13435323 ns (6% of max) 0 laps 14315 scanned
~~~ SCHEDPAGING; desscan 14315 fastscan 134925 pageout_nsec 90555351
cycle, 12988078 ns (6% of max) 0 laps 18137 scanned
---------------- freemem 7382 (28 MB)
~~~ SCHEDPAGING; desscan 18137 fastscan 134925 pageout_nsec 112100305
cycle, 12479328 ns (6% of max) 0 laps 18527 scanned
~~~ SCHEDPAGING; desscan 18527 fastscan 134925 pageout_nsec 114296636
cycle, 39049705 ns (19% of max) 0 laps 18714 scanned
~~~ SCHEDPAGING; desscan 18714 fastscan 134925 pageout_nsec 115354128
cycle, 13144154 ns (6% of max) 0 laps 10349 scanned
~~~ SCHEDPAGING; desscan 10349 fastscan 134925 pageout_nsec 68196941
cycle, 14842170 ns (7% of max) 0 laps 10357 scanned
---------------- freemem 11016 (43 MB)
~~~ SCHEDPAGING; desscan 10357 fastscan 134925 pageout_nsec 68243425
cycle, 69425172 ns (34% of max) 0 laps 11023 scanned
~~~ SCHEDPAGING; desscan 11023 fastscan 134925 pageout_nsec 71996941
cycle, 17262974 ns (8% of max) 0 laps 13734 scanned
~~~ SCHEDPAGING; desscan 13734 fastscan 134925 pageout_nsec 87278287
cycle, 86739555 ns (43% of max) 0 laps 15160 scanned
~~~ SCHEDPAGING; desscan 15160 fastscan 134925 pageout_nsec 95319877
cycle, 15065196 ns (7% of max) 0 laps 15892 scanned
---------------- freemem 8717 (34 MB)
~~~ SCHEDPAGING; desscan 15892 fastscan 134925 pageout_nsec 99445259
cycle, 97486181 ns (48% of max) 0 laps 15760 scanned
~~~ SCHEDPAGING; desscan 15760 fastscan 134925 pageout_nsec 98701529
cycle, 19923485 ns (9% of max) 0 laps 15946 scanned
~~~ SCHEDPAGING; desscan 15946 fastscan 134925 pageout_nsec 99747400
cycle, 17100867 ns (8% of max) 0 laps 16760 scanned
~~~ SCHEDPAGING; desscan 16760 fastscan 134925 pageout_nsec 104337614
cycle, 19420313 ns (9% of max) 0 laps 17457 scanned
---------------- freemem 7529 (29 MB)
~~~ SCHEDPAGING; desscan 17457 fastscan 134925 pageout_nsec 108265443
cycle, 102923784 ns (51% of max) 0 laps 18209 scanned
~~~ SCHEDPAGING; desscan 18209 fastscan 134925 pageout_nsec 112507033
cycle, 24365923 ns (12% of max) 0 laps 18418 scanned
~~~ SCHEDPAGING; desscan 18418 fastscan 134925 pageout_nsec 113680733
cycle, 16886613 ns (8% of max) 0 laps 18762 scanned
~~~ SCHEDPAGING; desscan 18762 fastscan 134925 pageout_nsec 115621406
cycle, 22114961 ns (11% of max) 0 laps 19228 scanned
...
The target duty cycle bounces around with freemem. I was not able to push the memory situation harder, because until at least #13092 and #13093 are integrated it's still very easy to cork the system when you have low freemem and high paging activity.
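The allocation program itself isn't attached here; a minimal sketch of that style of workload (my approximation, not the actual test tool) would be:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rough sketch of the workload described above: allocate the requested
 * number of megabytes, then loop forever touching every page so that the
 * pages stay (or try to stay) resident and the scanner has work to do.
 */
int
main(int argc, char *argv[])
{
	if (argc != 2) {
		(void) fprintf(stderr, "usage: %s <megabytes>\n", argv[0]);
		return (1);
	}

	size_t len = strtoul(argv[1], NULL, 10) * 1024 * 1024;
	char *buf = malloc(len);
	if (buf == NULL) {
		perror("malloc");
		return (1);
	}
	(void) memset(buf, 0, len);

	for (;;) {
		for (size_t off = 0; off < len; off += 4096)
			buf[off]++;	/* read and write each page */
	}
}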
Updated by Joshua M. Clulow about 2 years ago
- File collect_new.sh added
- File collect_old.sh added
- File watch_pageout.d added
I've attached the scripts I used to collect tunables before and after the change, and for basic monitoring of the scanner.
Updated by Joshua M. Clulow about 2 years ago
- Related to Bug #13107: pageout should use hrtime instead of lbolt added
Updated by Electric Monk about 2 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 2d9166aee5315081107056f3d663e4adee4c1c2a
commit 2d9166aee5315081107056f3d663e4adee4c1c2a
Author: Joshua M. Clulow <josh@sysmgr.org>
Date:   2021-02-03T00:41:05.000Z

13097 improve VM tunables for modern systems
13107 pageout should use hrtime instead of lbolt
Reviewed by: Robert Mustacchi <rm@fingolfin.org>
Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Updated by Marcel Telka almost 2 years ago
- Related to Bug #13499: System paging parameters no longer calculated at boot after 13097 added