Bug #8115

parallel zfs mount

Added by Prakash Surya over 2 years ago. Updated 11 months ago.

Status: Closed
Priority: High
Assignee:
Category: zfs - Zettabyte File System
Start date: 2018-10-10
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Difficulty: Medium
Tags: needs-triage

Description

In analyzing the time it takes for a Delphix Engine to come up following
a planned or unplanned reboot, we've determined that the SMF service
(filesystem/local) that mounts all local filesystems (except for /)
accounts for a significant percentage of the boot time. The longer it
takes for the Delphix Engine to come up, the longer the Delphix Engine is
unavailable during these outages. For example, on a Delphix Engine with
roughly 3000 filesystems, we have the following breakdown of
"filesystem/local" start time, in seconds, for a sample of 74 reboots:

    # NumSamples = 74; Min = 0.00; Max = 782.00
    # Mean = 186.972973; Variance = 17853.891161; SD = 133.618454; Median 156.000000
    # each * represents a count of 1
    0.0000 - 78.2000 [ 10]: **********
    78.2000 - 156.4000 [ 27]: ***************************
    156.4000 - 234.6000 [ 17]: *****************
    234.6000 - 312.8000 [ 8]: ********
    312.8000 - 391.0000 [ 8]: ********
    391.0000 - 469.2000 [ 1]: *
    469.2000 - 547.4000 [ 1]: *
    547.4000 - 625.6000 [ 1]: *
    625.6000 - 703.8000 [ 0]:
    703.8000 - 782.0000 [ 1]: *

On average, it takes over 3 minutes to mount local filesystems on that
system. A sampling of 56 reboots on another system which has 9000+
filesystems is below:

    # NumSamples = 56; Min = 0.00; Max = 1377.00
    # Mean = 175.250000; Variance = 54092.223214; SD = 232.577349; Median 118.000000
    # each * represents a count of 1
    0.0000 - 137.7000 [ 37]: *************************************
    137.7000 - 275.4000 [ 11]: ***********
    275.4000 - 413.1000 [ 4]: ****
    413.1000 - 550.8000 [ 1]: *
    550.8000 - 688.5000 [ 1]: *
    688.5000 - 826.2000 [ 0]:
    826.2000 - 963.9000 [ 0]:
    963.9000 - 1101.6000 [ 1]: *
    1101.6000 - 1239.3000 [ 0]:
    1239.3000 - 1377.0000 [ 1]: *

Mounting of filesystems in "filesystem/local" is done using `zfs mount -a`,
which mounts each filesystem serially. The bottleneck for each mount is
the I/O done to load metadata for that filesystem. As such, mounting
filesystems using a parallel algorithm should be a big win, and bring down
the runtime of "filesystem/local"'s start method.
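
To illustrate the idea only, here is a minimal Python sketch of one way to
overlap the per-filesystem I/O while still mounting a parent mountpoint
before anything below it. It is not taken from the fix for this bug and
makes no claim about how that fix works: the level-by-level scheme, the
names `depth` and `parallel_mount`, and the `mount_one` callback are all
hypothetical, chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def depth(mountpoint):
    """Number of path components, e.g. '/a/b' -> 2 and '/' -> 0."""
    return len([part for part in mountpoint.split("/") if part])


def parallel_mount(mountpoints, mount_one, max_workers=8):
    """Call mount_one(path) for every mountpoint, parents before children.

    mount_one is a hypothetical callback standing in for the real mount
    operation. Returns the depth levels in the order they were processed.
    """
    # Sort by depth first (then by name, for a deterministic order), so a
    # parent always lands in an earlier level than its descendants.
    ordered = sorted(set(mountpoints), key=lambda p: (depth(p), p))
    levels = [list(group) for _, group in groupby(ordered, key=depth)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in levels:
            # list() drains the result iterator, so every mount at this
            # depth finishes (or raises) before the next depth starts.
            list(pool.map(mount_one, level))
    return levels
```

Mountpoints at the same depth can never be ancestors of one another, so
they are safe to mount concurrently. The barrier between levels keeps the
sketch simple at the cost of some parallelism: an independent subtree has
to wait for the slowest mount at the previous depth.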


Subtasks

Bug #9880: Race in ZFS parallel mount (Closed; Andy Fiddaman)

History

#1

Updated by Electric Monk over 1 year ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit 591e0e133f9980083db5d64ac33a30bcc3382ff7

commit  591e0e133f9980083db5d64ac33a30bcc3382ff7
Author: Sebastien Roy <seb@delphix.com>
Date:   2018-05-15T22:59:28.000Z

    8115 parallel zfs mount
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
    Reviewed by: Brad Lewis <brad.lewis@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
    Approved by: Matt Ahrens <mahrens@delphix.com>
