
Bug #1810

System call bursts in 'umount2' on an idle system up to 4 million/minute

Added by Jon Strabala over 7 years ago. Updated over 7 years ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2011-11-22
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

While looking at some odd CPU use issues in 'fsflush' and 'zpool-rpool', I found that I occasionally have large bursts of system calls such as 'umount2' on a fully idle system.

I have three BEs as follows:

lab10# beadm list
BE                 Active Mountpoint Space Policy Created
oi_151a            -      -          33.5M static 2011-09-14 09:01
tsc-fix            -      -          826M  static 2011-10-28 15:08
nightly-2011-11-18 NR     /          64.3G static 2011-11-18 08:18

I run a script called 'cpu.d' which samples for 60 seconds and I do this 6 times.

BE oi_151a (a 148b upgrade to 151a) had 2 out of 6 cpu.d
runs showing 'umount2' system calls > 1.8M
refer to FILE chk_fflush.20111123.oi_151a.out:  
umount2                                                     1863075 
umount2                                                     1866598

BE tsc-fix (built by Rich Lowe) had 2 out of 6 cpu.d 
runs showing 'umount2' system calls > 2.6M
refer to FILE chk_fflush.20111122.deadline-tsc-0-gf0ee1a4.out:  
umount2                                                     2693302 
umount2                                                     2870902

BE nightly-2011-11-18 had 3 out of 6 cpu.d runs
showing 'umount2' system calls > 3.7M
refer to FILE chk_fflush.20111122.tsc-ga-20111118.out:  

umount2                                                     3946642  
umount2                                                     3764694
umount2                                                     3759205

Given the definition of 'umount2' from the man page below, I cannot for the life of me figure out why an idle server would make millions of requests per minute to unmount a previously mounted file system contained on a block special device or directory (as high as 3,946,642 in 60 seconds, or roughly 65,777/sec).

     The umount() function requests that a previously mounted
     file system contained on a block special device or directory
     be unmounted. The file argument is a pointer to the absolute
     pathname of the file system to be unmounted. After
     unmounting the file system, the directory upon which the
     file system was mounted reverts to its ordinary
     interpretation.

     The  umount2()   function is identical to umount(), with the
     additional capability of unmounting  file  systems  even  if
     there are open files active. The mflag argument must contain
     one of the following values:

The fact that this "idle" system occasionally executes the 'umount2' system call at a rate of 65,777 times a second (maybe higher, as this is averaged over a minute) seems rather high.

Here is the "cpu.d" script - note I run it six times (for every BE); sometimes I don't see 'umount2' at all, but at least 1/3 of the time I do. Sometimes I see millions, so the issue shows itself in occasional bursts (sometimes almost hitting 4 million system calls / minute).

#!/usr/sbin/dtrace -s
syscall:::entry {
        self->vstart = vtimestamp;
}

syscall:::return /self->vstart/ {
        this->oncpu = vtimestamp - self->vstart;
        @[probefunc] = avg(this->oncpu);
        self->vstart = 0;
}

tick-60s {
        exit(0);
}

Here is one example of 'cpu.d' output; refer to the attached files for extensive system info and other dtrace runs.

lab10# ./cpu.d
dtrace: script './cpu.d' matched 465 probes
CPU     ID                    FUNCTION:NAME
  1  65595                        :tick-60s

  p_online                                                         82
  getpid                                                          152
  sysconfig                                                       168
  sigaction                                                       363
  gtime                                                           398
  lwp_sigmask                                                     453
  fcntl                                                           553
  pset                                                            587
  clock_gettime                                                   742
  uucopy                                                          829
  lwp_continue                                                    960
  alarm                                                          1186
  ioctl                                                          1226
  c2audit                                                        1326
  lwp_kill                                                       1423
  setcontext                                                     1615
  brk                                                            2169
  getmsg                                                         2267
  mmap                                                           3140
  stat64                                                         3308
  doorfs                                                         3474
  read                                                           5213
  pread                                                          5999
  close                                                          7294
  write                                                          9020
  statvfs64                                                      9351
  lwp_park                                                      10086
  schedctl                                                      10100
  stat                                                          13764
  writev                                                        14845
  pollsys                                                       17994
  open                                                          25440
  lwp_create                                                    27340
  so_socket                                                     37327
  nanosleep                                                     57335
  putmsg                                                       297994
  umount2                                                     3946642

The attached file 'scripts.chk_fflush.tar' contains the dtrace scripts that I used to make the three attached *.out output files:

iotop diskhits iopending hotspot.d iofile.d iofileb.d 
seeksize.d bitesize.d fsflush.d fspaging.d psio10 iosnoop 
hotkernel_60 cpu.d zfssnoop.d ziosnoop.d zfsstack.d


Files

chk_fflush.20111123.oi_151a.out.gz (83.8 KB) - output for BE oi_151a - Jon Strabala, 2011-11-23 07:22 PM
chk_fflush.20111122.deadline-tsc-0-gf0ee1a4.out.gz (126 KB) - output for BE tsc-fix - Jon Strabala, 2011-11-23 07:22 PM
chk_fflush.20111122.tsc-ga-20111118.out.gz (121 KB) - output for BE nightly-2011-11-18 - Jon Strabala, 2011-11-23 07:22 PM
scripts.chk_fflush.tar (90 KB) - scripts used to make above output files - Jon Strabala, 2011-11-23 07:22 PM
chk_fflush.20111128.tsc-ga-20111118.out.gz (108 KB) - output for BE nightly-2011-11-18 after five days - Jon Strabala, 2011-11-28 08:08 PM
20111128_tcs_ga_three_dtrace_oneliners_at_180s_and_10ms.txt (920 KB) - Jon Strabala, 2011-11-29 05:24 PM
chk_fsflush_automountd_zpool-rpool.out.txt (366 KB) - Jon Strabala, 2011-11-29 05:26 PM

History

#1

Updated by Jon Strabala over 7 years ago

After five days of uptime on BE nightly-2011-11-18, I re-ran the same dtrace scripts that I ran on 11/23/2011.

BE nightly-2011-11-18 had 3 out of 6 cpu.d runs (60 second samples)
showing 'umount2' system calls > 3.5M
refer to FILE chk_fflush.20111128.tsc-ga-20111118.out:  

  umount2                                                     4587575
  umount2                                                     4607936
  umount2                                                     4583946

Also, BE nightly-2011-11-18 had 2 out of 6 cpu.d runs (60 second samples)
with a very high 'forksys' count (> 400K):

  forksys                                                      408747
  forksys                                                      471988

This "idle" system is still executing the system call 'umount2' at a rate up to 76,798 times a second (maybe even higher as this is averaged over a minute).

#2

Updated by Rich Lowe over 7 years ago

add execname to the aggregation too, and we'll see who's doing it

#3

Updated by Jon Strabala over 7 years ago

Rich,

I see two potential items. The first is 'automountd' - this is the program calling 'umount2'.

The second is 'sendmail' - this is the program calling 'forksys'. I had some issues turning off 'sendmail'. Yes, in an IRC chat I indicated that I did a "svcadm disable sendmail", but I kept getting about 600 messages [time period is fuzzy] in /var/log/syslog on oi_151a like:

Oct  3 23:53:40 lab10 sendmail[2179]: [ID 801593 mail.info] p92M8das001995: 
to=postmaster, delay=1+08:45:01, xdelay=00:00:00, mailer=relay, pri=11858122, 
relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]

Also, mailq indicated "/var/spool/mqueue was empty", and it seems like /usr/lib/update-manager/update-refresh.sh run via cron produces some mail; some messages also seem to be generated when I build OI/illumos. This was triggering the syslog messages. Thus I partially turned 'sendmail' back on.

This is a lab box; I really don't want any 'sendmail' at all. Perhaps my inability to turn sendmail completely off (and my stubbornness about not fully configuring sendmail) is the culprit for both the 'forksys' and the 'umount2' issues?

Still, 15,831,553 calls (yes, 15M) from 'automountd' to 'umount2' in 60 seconds for any reason seems insane - is my dtrace script (below) bad?

#!/usr/sbin/dtrace -s

syscall:::entry
{
        self->vstart = vtimestamp;
}

syscall:::return
/self->vstart/
{
        this->oncpu = vtimestamp - self->vstart;
        @[probefunc,execname] = avg(this->oncpu);
        self->vstart = 0;
}

tick-60s
{
        exit(0);
}

And here are two outputs: one a 60-second run catching 'umount2', the other a 60-second run catching 'forksys'.

# -----------------------------------------------------------------------------
# script output/run for ./cpu_en.d
# ----------------------------
lab10# ./cpu_en.d
dtrace: script './cpu_en.d' matched 465 probes
CPU     ID                    FUNCTION:NAME
  0  73792                        :tick-60s

  p_online                                            dtrace                                                           76
  getpid                                              nscd                                                            155
  sysconfig                                           dtrace                                                          168
  fcntl                                               nscd                                                            245
  getpid                                              svc.configd                                                     250
  gtime                                               sendmail                                                        266
  gtime                                               utmpd                                                           279
  gtime                                               nscd                                                            345
  sigaction                                           dtrace                                                          360
  gtime                                               devfsadm                                                        364
  lwp_sigmask                                         sendmail                                                        394
  lwp_sigmask                                         nscd                                                            435
  gtime                                               intrd                                                           436
  gtime                                               automountd                                                      533
  clock_gettime                                       hald-addon-acpi                                                 547
  uucopy                                              nscd                                                            644
  fstat64                                             init                                                            679
  setcontext                                          nscd                                                            685
  lwp_continue                                        nscd                                                            702
  llseek                                              init                                                            706
  fcntl                                               svc.configd                                                     718
  pset                                                sendmail                                                        755
  ioctl                                               utmpd                                                           825
  ioctl                                               dtrace                                                          937
  llseek                                              utmpd                                                           990
  ioctl                                               init                                                           1033
  close                                               init                                                           1137
  llseek                                              nscd                                                           1173
  fstat64                                             utmpd                                                          1189
  ioctl                                               nscd                                                           1325
  c2audit                                             svc.configd                                                    1339
  lwp_kill                                            nscd                                                           1430
  close                                               utmpd                                                          1468
  fstat64                                             nscd                                                           1492
  close                                               nscd                                                           1554
  schedctl                                            nscd                                                           1604
  brk                                                 dtrace                                                         1715
  doorfs                                              svc.configd                                                    2511
  stat64                                              nscd                                                           2827
  read                                                hald-addon-acpi                                                2885
  doorfs                                              nscd                                                           3207
  kill                                                utmpd                                                          3330
  read                                                hald                                                           3591
  lwp_park                                            fmd                                                            3855
  pollsys                                             hald                                                           5660
  doorfs                                              automountd                                                     5894
  read                                                utmpd                                                          5907
  read                                                init                                                           6293
  nanosleep                                           intrd                                                          6912
  open                                                init                                                           7118
  lwp_park                                            devfsadm                                                       7357
  lwp_park                                            svc.configd                                                    7734
  nanosleep                                           fmd                                                            7910
  stat                                                init                                                           7916
  lwp_park                                            dtrace                                                         8037
  writev                                              hald                                                           8040
  write                                               init                                                           8573
  nanosleep                                           nscd                                                           8600
  lwp_park                                            nscd                                                           8635
  lwp_park                                            svc.startd                                                     9285
  pollsys                                             hald-addon-acpi                                                9550
  pollsys                                             sendmail                                                       9687
  pollsys                                             in.routed                                                      9918
  pollsys                                             fmd                                                           10118
  rename                                              init                                                          10544
  ioctl                                               intrd                                                         10856
  schedctl                                            dtrace                                                        11397
  open                                                nscd                                                          11443
  read                                                nscd                                                          12101
  stat                                                nscd                                                          12616
  open                                                utmpd                                                         13174
  lwp_create                                          nscd                                                          16285
  pread                                               utmpd                                                         21000
  writev                                              hald-addon-acpi                                               22861
  mmap                                                dtrace                                                        47378
  umount2                                             automountd                                                 15831553

# -----------------------------------------------------------------------------
# script output/run for ./cpu_en.d
# ----------------------------
lab10# ./cpu_en.d
dtrace: script './cpu_en.d' matched 465 probes
CPU     ID                    FUNCTION:NAME
  0  73792                        :tick-60s

  p_online                                            dtrace                                                           76
  sysconfig                                           dtrace                                                          181
  gtime                                               nscd                                                            239
  lwp_sigmask                                         ttymon                                                          261
  getpid                                              sendmail                                                        289
  gtime                                               sendmail                                                        289
  lwp_self                                            sendmail                                                        315
  lwp_sigmask                                         sac                                                             323
  lwp_sigmask                                         nscd                                                            340
  lwp_sigmask                                         sendmail                                                        360
  sigaction                                           dtrace                                                          374
  gtime                                               devfsadm                                                        388
  gtime                                               intrd                                                           395
  getuid                                              sendmail                                                        459
  clock_gettime                                       hald-addon-acpi                                                 595
  setcontext                                          nscd                                                            616
  uucopy                                              nscd                                                            633
  lwp_continue                                        nscd                                                            739
  sigaction                                           sendmail                                                        745
  pset                                                sendmail                                                        784
  getmsg                                              in.routed                                                       938
  ioctl                                               dtrace                                                          983
  fcntl                                               sendmail                                                       1001
  setitimer                                           sendmail                                                       1205
  alarm                                               sac                                                            1208
  lwp_kill                                            nscd                                                           1219
  schedctl                                            nscd                                                           1399
  fstat64                                             sendmail                                                       1478
  read                                                ttymon                                                         1490
  close                                               sendmail                                                       1601
  brk                                                 dtrace                                                         1739
  stat64                                              nscd                                                           2068
  setcontext                                          ttymon                                                         2576
  setcontext                                          sendmail                                                       2606
  setuid                                              sendmail                                                       2776
  read                                                hald-addon-acpi                                                2958
  setcontext                                          sac                                                            3420
  waitsys                                             sendmail                                                       3446
  lwp_park                                            fmd                                                            3657
  read                                                hald                                                           3814
  pollsys                                             ttymon                                                         4375
  write                                               ttymon                                                         4546
  openat                                              sendmail                                                       4984
  pollsys                                             hald                                                           5804
  close                                               in.routed                                                      6449
  lwp_park                                            svc.configd                                                    7729
  lwp_park                                            devfsadm                                                       7776
  nanosleep                                           fmd                                                            8081
  lwp_park                                            dtrace                                                         8133
  writev                                              hald                                                           8155
  pause                                               sendmail                                                       8254
  lwp_park                                            nscd                                                           8628
  read                                                sac                                                            8705
  pollsys                                             sendmail                                                       8972
  write                                               sac                                                            9046
  lwp_park                                            svc.startd                                                     9088
  pollsys                                             in.routed                                                      9636
  getdents64                                          sendmail                                                       9837
  pollsys                                             hald-addon-acpi                                                9839
  stat                                                nscd                                                          10616
  schedctl                                            dtrace                                                        11100
  writev                                              hald-addon-acpi                                               11225
  so_socket                                           in.routed                                                     11600
  pollsys                                             fmd                                                           11660
  nanosleep                                           nscd                                                          12605
  lwp_create                                          nscd                                                          16906
  schedctl                                            sendmail                                                      19272
  ioctl                                               intrd                                                         22788
  pread                                               utmpd                                                         23557
  open                                                in.routed                                                     26906
  open64                                              sendmail                                                      32919
  ioctl                                               in.routed                                                     45250
  mmap                                                dtrace                                                        46774
  putmsg                                              in.routed                                                    189065
  forksys                                             sendmail                                                     233419

#4

Updated by Rich Lowe over 7 years ago

I'm pretty sure the automounter shouldn't be attempting unmounts at anything like that rate, unless you have a lot of automounted filesystems, and most of them are mounted.

You could adjust the D to record the paths it was trying to unmount, and at what rate (a lot of paths at a normal rate? a few paths far too often?)
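For example, something like the following (a sketch, not one of the attached scripts; it assumes umount2's first argument is the path pointer, as in the man page excerpt above):

#!/usr/sbin/dtrace -s
/* Sketch: count umount2 calls per process and per path over 60 seconds. */
syscall::umount2:entry
{
        /* arg0 is the user-space pointer to the path being unmounted */
        @unmounts[execname, copyinstr(arg0)] = count();
}

tick-60s
{
        exit(0);
}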

#5

Updated by Rich Lowe over 7 years ago

Have you done anything unusual to this system (changing the value of hz, or anything of that nature?)

Also, can you record both the user and kernel stack when umount2 is running (ustack() and stack())?
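Something along these lines would capture both (again a sketch, not one of the attached scripts):

#!/usr/sbin/dtrace -s
/* Sketch: aggregate the user and kernel stacks seen on each umount2 entry. */
syscall::umount2:entry
{
        @stacks[ustack(), stack()] = count();
}

tick-60s
{
        exit(0);
}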

#6

Updated by Rich Lowe over 7 years ago

  • Status changed from New to Feedback

Ok, no, now I see it. I hadn't read your scripts, just your text.

What your scripts are measuring isn't what you're describing them measuring (or you're confusing which script is which, and thus confusing me).

You're measuring their average time to complete, in nsec, but describing it as calls/sec. So what you're really seeing is an average time for the unmount() system call to complete of about 4 thousandths of a second.
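To get actual call counts per interval, the aggregation would need count() rather than avg(); a minimal variant of cpu.d (a sketch) would be:

#!/usr/sbin/dtrace -s
/* Sketch: number of calls to each syscall in 60 seconds (a rate, not a duration). */
syscall:::entry
{
        @calls[probefunc] = count();
}

tick-60s
{
        exit(0);
}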

#7

Updated by Jon Strabala over 7 years ago

Sorry about misinterpreting the dtrace output (the perils of someone like me, who doesn't understand 'dtrace', blindly grabbing an analysis script and using it). I agree 4ms in umount() isn't too bad; after all, it does do something physical.

Before I let this one go, I dug a bit more - maybe the title should be 'fsflush' activity instead of 'umount2' - but maybe this is just normal kernel/zfs type activity.

So the dtrace 'one-liner' below, using '= count();', seems to say that kernel functions are entered 4,500,931 times (4.5M) in 180 seconds by the 'fsflush' process (about 25K/sec). Would this be considered high on an idle system doing absolutely nothing?

lab10# dtrace -n 'fbt::*:entry{ @[execname] = count(); } tick-180s { exit(0); }'
dtrace: description 'fbt::*:entry' matched 31473 probes
CPU     ID                    FUNCTION:NAME
  1  65604                       :tick-180s

  ttymon                                                          616
  sac                                                             635
  automountd                                                      720
  in.routed                                                      1344
  kcfpoold                                                       1511
  svc.startd                                                     3659
  utmpd                                                          3839
  hald-addon-acpi                                                8625
  hald                                                          10952
  svc.configd                                                   18715
  devfsadm                                                      19726
  nscd                                                          29862
  intrd                                                         35188
  fmd                                                           68451
  zpool-rpool                                                  204841
  dtrace                                                      2434749
  fsflush                                                     4500931
  sched                                                      17339936

More details in "20111128_tcs_ga_three_dtrace_oneliners_at_180s_and_10ms.txt" the attached file.

In an attempt to dig deeper, I made another dtrace script called "chk_fsflush_automountd_zpool-rpool.d" to print some stats and specifically look at the fsflush, automountd, and zpool-rpool processes. The actual script is shown below (all mistakes are mine; for example, the output in ms for sched never looks right):

#!/usr/sbin/dtrace -s

#pragma D option quiet

sched:::on-cpu
{
        self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
        @oncpu[execname] = sum(vtimestamp - self->on);
        self->on = 0;
}
vminfo:::anonpgin
{
        self->anonpgin = 1;
}
:::pageio_setup:return
{
        self->wait = timestamp;
}
:::pageio_done:entry
/self->anonpgin == 1/
{
        self->anonpgin = 0;
        @pageintime[execname] = sum(timestamp - self->wait);
        self->wait = 0;
}
:::entry
/execname == "fsflush" || execname == "zpool-rpool" || execname == "automountd"/
{
        @exec_counts[probefunc,execname] = count();
}

tick-5s
{
        trunc(@oncpu, 10);
        trunc(@pageintime, 10);
        printf("\033[H\033[2J");
        normalize(@pageintime, 1000000);
        printf("Who's waiting for pagein (milliseconds):\n");
        printa("  %-50s %15@d\n", @pageintime);
        normalize(@oncpu, 1000000);
        printf("Who's on cpu (milliseconds):\n");
        printa("  %-50s %15@d\n", @oncpu);
        printf("Dump - fsflush zpool-rpool automountd count()'s\n");
        printf("  probefunc,                               execname,                                           count()\n");
        printf("  ----------------------------------------------------------------------------------------------------\n");
        printa("  %-40s %-40s %15@d\n", @exec_counts);
        trunc(@oncpu);
        trunc(@pageintime);
        trunc(@exec_counts);
}

tick-180s
{
        exit(0);
}

I ran the above on an 'idle' system and cut-n-pasted the output into the attached file "chk_fsflush_automountd_zpool-rpool.out".

The most interesting item is that sometimes within a five (5) second period (there are 36 periods) I see spikes like:

  probefunc,                               execname,                                           count()
  ----------------------------------------------------------------------------------------------------
    **** many lines removed - see chk_fsflush_automountd_zpool-rpool.out for full output ***
  page_trylock                             fsflush                                             5037
  page_unlock                              fsflush                                             5037
  page_pptonum                             fsflush                                            20633
  page_nextn                               fsflush                                           634781

This is almost 127K calls/sec to 'page_nextn' (from the fsflush process), but once again the system is doing absolutely nothing, and all output goes to the screen via an ssh login to putty running on a PC (the screen buffer is on the PC). Remember this is 127K calls/sec averaged over a 5 second sample; the true peak/burst may actually be much higher.

Is this 'page_nextn' activity (from fsflush) normal on a system doing essentially nothing? If it is, close the ticket.
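To see how those bursts line up with the fsflush wake-ups, a rough per-second variant (a sketch, not one of the attached scripts) could be:

#!/usr/sbin/dtrace -s
/* Sketch: per-second count of page_nextn calls made by fsflush. */
fbt::page_nextn:entry
/execname == "fsflush"/
{
        @persec = count();
}

tick-1s
{
        printa("page_nextn calls by fsflush in the last second: %@d\n", @persec);
        trunc(@persec);
}

tick-180s
{
        exit(0);
}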

#8

Updated by Jon Strabala over 7 years ago

Added file "chk_fsflush_automountd_zpool-rpool.out.txt" referenced in https://www.illumos.org/issues/1810#note-7

#9

Updated by Jon Strabala over 7 years ago

It seems like fsflush is indeed a process that should consume a bit of CPU, as it needs to scan memory. I don't know if the tuning 'advice' below is still accurate (for illumos), but it does seem to explain what is happening WRT fsflush calls.

From: http://oracle-dba-quickanswers.blogspot.com/2011/10/how-to-analyze-high-cpu-utilization-in.html

Fsflush Cpu Usage:

Primary task of file system flush daemon, fsflush, is to search and flush dirty 
file system pages to the backing store of the file system that the file(s) reside 
in. By default, fsflush is configured to wake up every 5 seconds (in recent releases 
it is changed to 1 second) and search 1/6th of physical memory for dirty pages 
- thus flushing all of physical memory within 30 seconds.

On large memory systems, the fsflush daemon will consume a large amount of cpu 
if there are many dirty pages in the file system page cache. In some circumstances,
the daemon will not be able to complete its scan of 1/6th of physical memory at
each scheduled wake up. 

The amount of time being spent by fsflush can be observed by taking multiple 
snapshots using ps(1) to view the cpu time (TIME column) consumed by the daemon:

# ps -aef |grep fsflush
UID PID PPID C STIME TTY TIME CMD
root 3 0 0 10:14:13 ? 132:13 fsflush

The amount of time spent in fsflush can also be observed on an idle machine. 
A 96GB machine with many file system pages will use 100% of a CPU for 
approximately 3 seconds, each time the file system daemon wakes up:

# mpstat 5
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 4 0 109 201 101 15 0 0 12 0 64 0 99* 0 1
1 39 0 4 1 1 6 0 0 3 0 92 0 0 0 100
2 15 0 1473 10 10 14 0 0 1 0 21 2 0 2 95
3 21 0 1726 1 1 56 0 0 47 0 286 1 0 3 96
4 22 2 123 0 0 11 0 0 1 0 68 0 0 1 98

A larger value for the autoup parameter reduces the fsflush cpu usage at a cost
of keeping the dirty buffer in memory for a longer period. If fsflush runs for a
longer period, it can affect application performance since it runs in kernel mode.
It is therefore recommended to increase the default (30 seconds) to (nGB x 4). 
For example, a system configured with 96 GB should have an autoup value of 384.
Edit the /etc/system file and add the following line and reboot. 
set autoup=384
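(For what it's worth, on this 16GB box 1/6th of physical memory is roughly 2.7GB, or on the order of 700,000 4KB pages per wake-up; if page_nextn is called roughly once per page examined, that is in the same ballpark as the 634,781 page_nextn calls seen in a single 5-second window above.)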

Now the system under test, hostname lab10 (16GB RAM, last reboot about 17 hours ago), almost always looks like the following (nothing near as high as the 'mpstat 5' output shown at the URL above):

lab10# ps -aef |grep fsflush | grep -v grep
    root      3      0   0 17:36:04 ?           1:37 fsflush

lab10# mpstat 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    7   0   20   344  102   91    0    3    1    0    16    0   0   0 100
  1    9   0   38   104   24   72    0    3    1    0    23    0   0   0 100
  2    7   0   34   104   19   71    0    3    1    0    16    0   0   0 100
  3    7   0   22   109   17   75    0    3    1    0    17    0   0   0 100
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   312  101   89    0    4    0    0     4    0   0   0 100
  1    0   0    0    66   20   59    0    3    1    0     0    0   0   0 100
  2    0   0    0    74   28   68    0    3    1    0    56    0   0   0 100
  3    0   0    0    70    9   66    0    4    1    0     1    0   0   0 100
         *
         *
         *

but occasionally (1 out of 20 samples) I do see a bit of non-idle CPU like:

lab10# mpstat 5
         *
         *
         *
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   312  101   68    0    5    1    0     5    0   0   0 100
  1    0   0    0    88   37   87    0    4    0    0     1    0   1   0  99
  2    0   0    0    66   11   57    0    4    1    0    57    0   0   0 100
  3    0   0    0    82   15   75    0    5    1    0     1    0   0   0 100

Further information on fsflush and tunables can be found starting on page 35 of http://www.scribd.com/doc/59153317/Solaris-10-System-Administrator-Collection-Solaris-Tunable-Parameters-Reference-Manual - but once again I don't know how relevant this is for illumos.
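For reference, one way to read the current autoup value on a running illumos box (a sketch using DTrace's kernel-variable syntax; it assumes the tunable is still a global int named autoup) is:

dtrace -qn 'BEGIN { printf("autoup = %d seconds\n", `autoup); exit(0); }'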

At this point the burst of CPU related to 'fsflush' is just a curiosity to me, as the above information (refer to the URLs) largely explains what is going on.

So the only question that remains: is this 'page_nextn' activity (from fsflush) normal on a system doing essentially nothing? Should pages just be flushed once? Or does all the kernel (or perhaps even dtrace) activity guarantee that I will get spikes of 'fsflush' activity because pages are getting dirty? It still seems odd that on an idle system with 16GB RAM (top only shows 2GB used for anything - I assume mostly ZFS read cache - with 100% of the swap free) I would have to write anything to disk.

lab10# top -q -S -o time

last pid: 102042;  load avg  0.00,  0.00,  0.00;  up 0+17:17:48                 10:53:21
49 processes: 48 sleeping, 1 on cpu
CPU states:  100% idle,  0.0% user,  0.1% kernel,  0.0% iowait,  0.0% swap
Kernel: 287 ctxsw, 530 intr, 44 syscall
Memory: 16G phys mem, 14G free mem, 8187M total swap, 8187M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
     3 root        1  60  -20    0K    0K sleep    1:37  0.00% fsflush
     6 root      142  99  -20    0K    0K sleep    0:13  0.00% zpool-rpool
101073 admin       1  59    0 7700K 5116K sleep    0:04  0.00% sshd
100005 root       21  59    0   15M   14M sleep    0:04  0.00% svc.configd
100895 root        1  59    0 6488K 5372K sleep    0:02  0.00% intrd
100686 root        4  59    0 3060K 2012K sleep    0:02  0.00% automountd
100343 root       29  59    0 6040K 4352K sleep    0:01  0.00% nscd
100780 root       25  59    0   31M   25M sleep    0:01  0.00% fmd
100003 root       16  59    0   14M   12M sleep    0:01  0.00% svc.startd
100386 root        4  59    0 6772K 5420K sleep    0:01  0.00% hald
102041 root        1  59  -20 3536K 1912K cpu/2    0:00  0.01% top
100272 root        6  59    0 4396K 3104K sleep    0:00  0.00% devfsadm
     2 root        2  98  -20    0K    0K sleep    0:00  0.00% pageout
100265 root        5  60  -20 2476K 1532K sleep    0:00  0.00% zonestatd
     4 root        5  60  -20    0K    0K sleep    0:00  0.00% kcfpoold
100291 root        1  59    0   11M 5688K sleep    0:00  0.00% cupsd
100885 root        4  59    0 9740K 6600K sleep    0:00  0.00% rad
100784 smmsp       1  59    0 5936K 1660K sleep    0:00  0.00% sendmail
101072 root        1  59    0 5704K 3376K sleep    0:00  0.00% sshd
100241 root       17  59    0 5552K 3044K sleep    0:00  0.00% syseventd
100727 root        3  59    0 4896K 3484K sleep    0:00  0.00% inetd
100836 root        1  59    0 4312K 1920K sleep    0:00  0.00% sshd
100435 root        2  59    0 4256K 3096K sleep    0:00  0.00% console-kit-dae
100760 root       11  59    0 4212K 2204K sleep    0:00  0.00% syslogd
100497 root        1  59    0 4128K 2284K sleep    0:00  0.00% rmvolmgr
100395 root        1  59    0 3888K 2316K sleep    0:00  0.00% hald-addon-cpuf
100393 root        1  59    0 3884K 2312K sleep    0:00  0.00% hald-addon-netw
101092 root        1  59    0 3780K 2648K sleep    0:00  0.00% bash
100387 root        1  59    0 3776K 2260K sleep    0:00  0.00% hald-runner
101080 admin       1  59    0 3772K 2568K sleep    0:00  0.00% bash

In other words, if nothing in memory is dirty (needing to be flushed to disk), do I still need to spend the time to 'scan' all of system memory searching for dirty file system pages? Also, if the memory is dedicated to ZFS read cache, does it need to be scanned at all?

If this type of behavior is completely normal, someone can either close or delete the ticket.
