Bug #3068

vmstat swap/free wrong in local zones

Added by Henrik Johansson about 8 years ago. Updated about 8 years ago.

Status: In Progress
Priority: Normal
Category: lib - userland libraries
Start date: 2012-08-09
Due date: 2012-08-17
% Done: 50%
Estimated time: 4.00 h
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

vmstat sometimes displays zero for swap and free when run inside a local zone with a 1-second interval and a count:

root@oitest01:~# vmstat 1 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s0 s1 s3   in   sy   cs us sy id
 1 0 0 6085124 3000516 324 4440 0 0 0 0 26  1  1  4 23 1481 3826 1218  4  2 94
 0 0 0      0     0 3048 60544 0 0 0  0  0  0  0  2  4 2053 48958 1875 84 16 0
 6 0 0 4061440 979512 829 31250 0 0 0 0  0  0  0 102 7 1840 26619 2033 90 10 0
root@oitest01:~# vmstat 1 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s0 s1 s3   in   sy   cs us sy id
 1 0 0 6084940 3000328 325 4446 0 0 0 0 26  1  1  4 23 1481 3831 1218  4  2 94
 13 0 0 4024012 951772 1785 52162 0 0 0 0 0 0  0  2  4 1041 60424 712 85 15  0
 0 0 0      0     0 1322 43275 0 0 0  0  0  0 20 20  0 1324 38769 916 89 11  0

This happens when the time between the first and the second or third snapshot is too long. A timestamp is taken before the snapshot is acquired; if the acquisition takes too long, sleep_until computes a negative pause, which results in a sleep of only half the specified interval.

/illumos-gate/usr/src/cmd/stat/common/common.c:

    now = gethrtime();
    pause = *wakeup + interval - now;

    if (pause <= 0 || pause < (interval / 4))

This can cause two snapshots to be taken before the kstats have been updated, which happens once per second.

Since the memory counters are cumulative, the difference between the two snapshots is zero in this case, so vmstat shows no free swap or memory.
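To illustrate why two snapshots of the same kstat state yield zero, here is a minimal sketch of averaging over cumulative counters. The struct, its field names, and avg_swap are hypothetical and not taken from vmstat; the sketch only assumes that the displayed value is the change in a running sum divided by the change in an update count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of cumulative counters; names are illustrative. */
struct vm_snapshot {
	uint64_t updates;	/* how many times the kernel updated the sums */
	uint64_t swap_sum;	/* running sum of available swap */
};

/* Average available swap over the period between two snapshots. */
uint64_t
avg_swap(const struct vm_snapshot *prev, const struct vm_snapshot *cur)
{
	uint64_t d_updates, d_swap;

	assert(cur->updates >= prev->updates);
	assert(cur->swap_sum >= prev->swap_sum);

	d_updates = cur->updates - prev->updates;
	d_swap = cur->swap_sum - prev->swap_sum;

	/* Guard against dividing by zero when no update happened. */
	if (d_updates == 0)
		d_updates = 1;

	return (d_swap / d_updates);
}
```

With one update between the snapshots the function returns the amount added to the sum. If both snapshots are taken before the next update, both deltas are zero and the result is 0, which matches the zero swap and free columns in the output above.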

This is not noticeable with intervals of 2 s or longer, since half the interval is then at least 1 s and the kstats will have been updated. If no count is given, this logic is bypassed as well:

    if (pause <= 0 || pause < (interval / 4))
        if (forever || *caught_cont) {
            /* Reset our cadence (see comment below) */
            *wakeup = now + interval;
            pause = interval;
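The two excerpts above can be combined into a simplified model of the pause calculation. This is a sketch, not the code from common.c: compute_pause, hrtime_ns and NS_PER_SEC are hypothetical names, the half-interval branch is reconstructed from the description in this report, and the handling of *wakeup outside the quoted lines is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t hrtime_ns;	/* stand-in for hrtime_t, in nanoseconds */

#define	NS_PER_SEC	1000000000LL

/*
 * Return how long to sleep before the next snapshot.
 * "now" is passed in instead of calling gethrtime() so the model
 * can be exercised with fixed values.
 */
hrtime_ns
compute_pause(hrtime_ns *wakeup, hrtime_ns interval, hrtime_ns now,
    int forever, int caught_cont)
{
	hrtime_ns pause;

	assert(interval > 0);
	pause = *wakeup + interval - now;

	if (pause <= 0 || pause < (interval / 4)) {
		if (forever || caught_cont) {
			/* No count given: reset the cadence. */
			*wakeup = now + interval;
			pause = interval;
		} else {
			/* Count given: assumed half-interval catch-up. */
			pause = interval / 2;
			*wakeup += interval;
		}
	} else {
		/* On schedule: assumed normal advance of the wakeup time. */
		*wakeup += interval;
	}
	return (pause);
}
```

With *wakeup at 0, a 1-second interval and a snapshot that finishes at 1.2 s, the model returns half a second when a count is set and a full second when running forever. With a 2-second interval the half-interval pause is still one second, which is why the problem is not visible there.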

This only happens inside local zones because get_pretty_name calls libdevinfo'di_dim_init, which ends up in libdevinfo'devlink_create. That function tries to start devfsadmd, which cannot run inside a zone; it retries once and sleeps between the attempts.

/illumos-gate/usr/src/lib/libdevinfo/devinfo_devlink.c:

#define    MAX_DAEMON_ATTEMPTS 2
...
#define    DAEMON_STARTUP_TIME    1 /* 1 second. This may need to be adjusted */
...
    } while ((++i < MAX_DAEMON_ATTEMPTS) &&
        start_daemon(root, install) == 0);
...
static int
start_daemon(const char *root, int install)
...
    (void) sleep(DAEMON_STARTUP_TIME);

This is easy to fix but was a bit hard to find. I have prepared a patch that modifies devinfo_devlink.c to return without trying to start the daemon inside local zones.
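The effect of the retry loop, and of skipping it, can be modelled with stubs. This is only a sketch under stated assumptions and not the prepared patch: daemon_call_model, try_connect_stub, start_daemon_stub and the skip_daemon flag are hypothetical, and the loop body is a guess at the shape of the real code. On illumos such a flag could plausibly be derived from getzoneid() != GLOBAL_ZONEID (from <zone.h>), but how the actual patch detects a local zone is not stated here.

```c
#include <assert.h>

#define	MAX_DAEMON_ATTEMPTS	2
#define	DAEMON_STARTUP_TIME	1	/* seconds */

int slept_seconds;	/* total simulated sleep time of the last call */

/* Stub: connecting to the daemon always fails, as in a local zone. */
static int
try_connect_stub(void)
{
	return (-1);
}

/* Stub: "starts" the daemon and accounts for the startup sleep. */
static int
start_daemon_stub(void)
{
	slept_seconds += DAEMON_STARTUP_TIME;	/* stands in for sleep() */
	return (0);
}

/*
 * Model of the retry loop. skip_daemon is a hypothetical flag that
 * the caller would set when it knows it is running in a local zone.
 */
int
daemon_call_model(int skip_daemon)
{
	int i = 0;

	slept_seconds = 0;
	if (skip_daemon)
		return (-1);	/* give up at once, without sleeping */

	do {
		if (try_connect_stub() == 0)
			return (0);
	} while ((++i < MAX_DAEMON_ATTEMPTS) &&
	    start_daemon_stub() == 0);

	return (-1);
}
```

Calling daemon_call_model(0) fails after one simulated second of sleep, which is the delay that pushes sleep_until into its half-interval branch. Calling daemon_call_model(1) fails immediately with no sleep.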

History

#1

Updated by Henrik Johansson about 8 years ago

  • Status changed from New to In Progress
