Project

General

Profile

Bug #9885

zoneadm takes ages to do something on a system with many snapshots

Added by Jim Klimov about 2 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
lib - userland libraries
Start date:
2018-10-12
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Difficulty:
Medium
Tags:
Gerrit CR:

Description

I was investigating why "zones:default" service failed on my laptop, and found that attempts to boot zones timed out. Eventually I traced the simple "zoneadm" call and I see long streams repeating "ZFS_IOC_SNAPSHOT_LIST_NEXT" and "open("/etc/default/be"...)" and "B E _ H A S _ G R U B = t r u e\n\n" for minutes...

I have auto-snapshots there, or more recently znapzend, so a ZFS tree under the zone root with all the subtree datasets, some zbe versions from OS updates, times the amout of snapshots, has almost 2000 items, which even with an SSD in place takes a while to parse (clocked 6 seconds for a `zfs list`) :\

It seems that it looks at all those datasets and snaps, and goes back to research the "/etc/default/be" file and discard what it sees, I guess... and note it is a LOT of reads of that "be" file which is the same one in the GZ, and there is no such in the local zone. Per discussion on IRC, "it does look for container name there (ROOT)" - so the first optimization matter is if it can't look for the value once, and then iterate all the datasets it needs. Also during the startup phase there are a lot of open and open64 calls repetitively to other files, like locale data and zone index and some others, but that might be initialization of libraries etc. and it has a constrained impact (although it does repeat), just a few hundred actions that could be scrapped compared to tens of thousands of "/etc/default/be" reads.

And also, if it is stuck iterating my ZFS tree while looking for a zoneroot, shouldn't it be interested in just the ZBE filesystem datasets? Why care about all those snaps?

I'll post a trace a bit later.


Files

zoneboot-log.tar.bz2 (509 KB) zoneboot-log.tar.bz2 Jim Klimov, 2018-10-11 03:06 PM
zoneboot-trace-fleaD.tar.bz2 (296 KB) zoneboot-trace-fleaD.tar.bz2 Jim Klimov, 2018-10-12 09:46 AM

Subtasks

Bug #9889: libbe reads defaults too oftenClosedAndy Fiddaman2018-10-12

Actions
Bug #9891: libbe slow building snapshot listClosedAndy Fiddaman2018-10-12

Actions
Feature #9892: Most consumers of be_list() do not need snapshotsClosedAndy Fiddaman2018-10-12

Actions
#1

Updated by Jim Klimov about 2 years ago

Attached a truss of the zoneadm startup, per

# truss -fl -rall -wall zoneadm -z selftest-giz-01 boot > /var/tmp/trace-zoneboot.txt 2>&1

A simple histogram shows the greatest offenders:

$ grep RDONLY trace-zoneboot.txt | sed 's,^.*open\(64\|\)("\([^\"]*\)".*$,\2,' | sort | uniq -c | sort -n
...
     20 /etc/mnttab
     41 /usr/share/lib/xml/dtd/zonecfg.dtd.1
     48 /etc/zones/index
    252 /usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSCMD.mo
    280 /lib/libc.so.1
    280 /var/ld/ld.config
    285 /usr/lib/locale//en_US.UTF-8/LC_COLLATE/LCL_DATA
    285 /usr/lib/locale//en_US.UTF-8/LC_CTYPE/LCL_DATA
    285 /usr/lib/locale//en_US.UTF-8/LC_MESSAGES/LCL_DATA
    285 /usr/lib/locale//en_US.UTF-8/LC_MONETARY/LCL_DATA
    285 /usr/lib/locale//en_US.UTF-8/LC_NUMERIC/LCL_DATA
    285 /usr/lib/locale//en_US.UTF-8/LC_TIME/LCL_DATA
  20412 /etc/default/be

$ grep ZFS trace-zoneboot.txt | awk '{print $3}' | sort | uniq -c | sort -n
      4 ZFS_IOC_POOL_CONFIGS,
      6 ZFS_IOC_LOG_HISTORY,
     21 ZFS_IOC_POOL_GET_PROPS,
     29 ZFS_IOC_POOL_STATS,
     30 ZFS_IOC_OBJSET_STATS,
   2814 ZFS_IOC_DATASET_LIST_NEXT,
  20520 ZFS_IOC_SNAPSHOT_LIST_NEXT,

I am not sure where such numbers could come from... on one hand, the number is too big for the datasets under zoneroot; on another it is too small for overall snapshots count on the system:

$ zfs list -tall -r /zones/_zonemgr-selftest/GIZ/selftest-giz-01 | wc -l
1865

$ zfs list -t snap -r /zones/_zonemgr-selftest/GIZ/selftest-giz-01 | wc -l
1854

$ zfs list -t filesystem | wc -l
572

$ zfs list -t volume | wc -l
3

$ zfs list -t snapshot | wc -l
105535

$ zfs list -Honame -tall -r rpool | wc -l
98263
#2

Updated by Jim Klimov about 2 years ago

Looking at processes, this time it took about a minute and a half to begin starting the OS inside the zone:

$ ps -efZ | grep selft | sort -k 6,6
  global     root  8267  8247   0 14:07:58 pts/1       0:00 truss -fl -rall -wall zoneadm -z selftest-giz-01 boot
  global     root  8270  8267   0 14:07:58 pts/1       0:00 truss -fl -rall -wall zoneadm -z selftest-giz-01 boot
  global     root  8271     1   0 14:07:58 ?           0:00 zoneadmd -z selftest-giz-01
  global     root  8272  8270   0 14:07:58 pts/1       0:00 truss -fl -rall -wall zoneadm -z selftest-giz-01 boot
selftes*     root  8482     1   0 14:09:21 ?           0:00 zsched
selftes*     root  8734  8482   0 14:09:23 ?           0:00 /sbin/init
selftes*     root  8824     1   0 14:09:25 ?           0:04 /lib/svc/bin/svc.startd
...
#3

Updated by Jim Klimov about 2 years ago

Adding logs of another attempt with `truss -fleaD` arguments.
Timing was 1m27s from start of truss to start of zsched.

#4

Updated by Andy Fiddaman about 2 years ago

  • Category set to lib - userland libraries
  • Assignee set to Andy Fiddaman
  • Tags deleted (needs-triage)

There are various performance improvements in libbe needed to resolve this. I've created three new bugs/features to address this and linked them back to this bug as the parent.

#5

Updated by Andy Fiddaman about 2 years ago

  • Status changed from New to Closed

Closing, sub-tasks integrated.

Also available in: Atom PDF