Bug #13567

maxpgio is too low for modern systems

Added by Joshua M. Clulow 19 days ago. Updated 19 days ago.

Status: New
Priority: Normal
Assignee: -
Category: kernel
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

The VM tuneable maxpgio is set, by default, to 40:

newcastle # mdb -ke 'maxpgio/D'
maxpgio:
maxpgio:        40

It is configured in setupclock() thus:

/*
 * Maxpgio thresholds how much paging is acceptable.
 * This figures that 2/3 busy on an arm is all that is
 * tolerable for paging.  We assume one operation per disk rev.
 *
 * XXX - Does not account for multiple swap devices.
 */
if (clockinit.ci_maxpgio == 0) {
        maxpgio = (DISKRPM * 2) / 3;
} else {
        maxpgio = clockinit.ci_maxpgio;
}

This platform-specific magic constant is defined thus:

/*
 * DISKRPM is used to estimate the number of paging i/o operations
 * which one can expect from a single disk controller.
 *
 * XXX - The system doesn't account for multiple swap devices.
 */
#define DISKRPM         60
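
Taken together, the two excerpts above account for the observed default of 40. The following is only a minimal sketch of that arithmetic: default_maxpgio() is a hypothetical helper introduced here for illustration, not a function in the kernel source.

```c
/* Value quoted from the platform header above. */
#define DISKRPM         60

/*
 * Hypothetical helper (not in the illumos source): mirrors the
 * expression setupclock() uses when clockinit.ci_maxpgio is zero.
 * Integer division, as in the original expression.
 */
static int
default_maxpgio(int diskrpm)
{
        return ((diskrpm * 2) / 3);
}
```

With DISKRPM at 60, this yields (60 * 2) / 3 = 40, matching the mdb output.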

The name of this constant is somewhat curious. If it means revolutions per minute, then 60 is implausibly low: according to various historical accounts on the Internet, the IBM 350 disk storage unit was already spinning at 1200 revolutions per minute (RPM) in the 1960s. If it instead means "requests per minute", then it is part of a dimensional error: pageout() uses it coarsely as the number of I/O requests per second to make to the swap file.

Setting aside the curious name of the constant, we must note that the maxpgio tuneable is unfortunately not just used within the pageout code but, as mentioned in #3725, also in some fairly random places in the scheduler.

Within pageout, it is used to limit the number of push requests we're willing to make per pageout() wakeup in order to "avoid saturating pageout devices." When pageout() is woken by pageout_scanner(), around 4 times per second under light to moderate memory pressure, it will make at most one quarter of maxpgio worth of push requests, or 10 requests. There are likely exceptions to this under severe memory pressure where pageout() may be woken repeatedly, but in general pageout is limited to about 40 requests per second -- which seems low for modern systems!
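
The budget described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: the helper names are hypothetical, the wakeup rate of 4 per second is taken from the description above, and the 4096-byte page size with one page per push request is an assumption made only to put the figure in bytes.

```c
/* Assumption from the text above: about 4 pageout() wakeups per second. */
#define WAKEUPS_PER_SEC         4
/* Assumption for illustration only: 4 KiB pages, one page per push. */
#define ASSUMED_PAGE_SIZE       4096UL

/* Hypothetical: push requests allowed in one pageout() wakeup. */
static unsigned int
pushes_per_wakeup(unsigned int maxpgio)
{
        return (maxpgio / 4);
}

/* Hypothetical: push requests per second at the assumed wakeup rate. */
static unsigned int
pushes_per_second(unsigned int maxpgio)
{
        return (pushes_per_wakeup(maxpgio) * WAKEUPS_PER_SEC);
}

/* Hypothetical: resulting pageout bandwidth under the assumptions above. */
static unsigned long
bytes_per_second(unsigned int maxpgio)
{
        return ((unsigned long)pushes_per_second(maxpgio) *
            ASSUMED_PAGE_SIZE);
}
```

For maxpgio = 40 this gives 10 pushes per wakeup, 40 per second and, under the single-page assumption, 163840 bytes (160 KiB) per second.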

It is not clear that a static value, and certainly not one so low as this, is even the right thing to do here. The intent appears to have been to avoid the system becoming permanently busy paging out to disk in what must have been an acutely small memory system by modern standards; handfuls of megabytes at best. That said, if pageout needs to happen to service other allocations and make forward progress, this simple rate limit is probably merely delaying the inevitable disk activity at the expense of the responsiveness of the system.


Related issues

Related to illumos gate - Bug #3725: Tunables shouldn't be used in MIN()/MAX()/nz() macros | In Progress | Marcel Telka | 2013-04-18

#1

Updated by Joshua M. Clulow 19 days ago

  • Related to Bug #3725: Tunables shouldn't be used in MIN()/MAX()/nz() macros added
#2

Updated by Joshua M. Clulow 19 days ago

Note that public documentation for a prominent proprietary fork suggests a default value of 400, rather than the 40 we see ourselves. If no other structural changes were made, that implies a tenfold increase in the value of DISKRPM.
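
As a rough check of that inference, the default formula can be inverted. The helper below is hypothetical, and it assumes the fork still computes maxpgio as (DISKRPM * 2) / 3 with integer division, which the note above does not confirm.

```c
/*
 * Hypothetical helper: smallest DISKRPM for which the integer
 * expression (DISKRPM * 2) / 3 equals the given maxpgio.
 * This is ceil(3 * maxpgio / 2) in integer arithmetic.
 */
static int
implied_diskrpm(int maxpgio)
{
        return ((3 * maxpgio + 1) / 2);
}
```

Under that assumption, a default of 400 corresponds to a DISKRPM of 600, against 60 for the default of 40.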

The list that pageout_scanner() and pageout() use to track submitted requests is also limited in size: async_list_size is fixed at 256, which could be a separate limit on pageout throughput.
