Feature #9580

Add a hash-table on top of nvlist to speed-up operations

Added by Serapheim Dimitropoulos over 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Category:
-
Start date:
2018-06-05
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

While dealing with another performance issue (see https://github.com/openzfs/openzfs/commit/126118fbe4a957da76c8f44769ee63730080cafc)
we noticed that we spend a lot of time in various places in the kernel when constructing long nvlists. The problem is that when an nvlist is created
with the NV_UNIQUE_NAME flag set (which is the case most of the time), we do a linear search through the whole list to ensure uniqueness for every
entry we add.

An example of the above scenario can be seen in the following flamegraph, where more than half of the time in zfsdev_ioctl() is spent on
constructing nvlists. Flamegraph: https://sdimitro.github.io/img/flame/sdimitro_snap_unmount3.svg

Adding a table to speed up lookups will help situations where we just construct an nvlist (like the scenario above), in addition to regular
lookups and removals.


Related issues

Related to illumos gate - Bug #9914: NV_UNIQUE_NAME_TYPE broken after 9580 (Closed, 2018-10-22)

History

#1

Updated by Serapheim Dimitropoulos over 1 year ago

A patch for this feature is available at https://github.com/openzfs/openzfs/pull/647.
Copying the analysis here for completeness:

= Motivation

While dealing with another performance issue (see 126118f)
we noticed that we spend a lot of time in various places in the kernel when constructing
long nvlists. The problem is that when an nvlist is created with the NV_UNIQUE_NAME flag set
(which is the case most of the time), we do a linear search through the whole list to
ensure uniqueness for every entry we add.
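
To make the cost concrete, here is a minimal userland sketch of the pattern that triggers the scan (this is not from the patch; the entry names and count are made up for illustration, and it compiles against libnvpair with -lnvpair). Before this change, each nvlist_add_boolean() call walks every existing pair to check for a duplicate name, so building N entries costs O(N^2) overall:

    #include <stdio.h>
    #include <libnvpair.h>

    int
    main(void)
    {
        nvlist_t *nvl;
        char name[64];

        /* NV_UNIQUE_NAME makes libnvpair enforce unique pair names. */
        if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0)
            return (1);

        for (int i = 0; i < 10000; i++) {
            (void) snprintf(name, sizeof (name), "entry-%d", i);
            /* Adding entry i first scans the i existing pairs. */
            if (nvlist_add_boolean(nvl, name) != 0) {
                nvlist_free(nvl);
                return (1);
            }
        }

        nvlist_free(nvl);
        return (0);
    }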

An example of the above scenario can be seen in the following flamegraph, where more than
half of the time in zfsdev_ioctl() is spent on constructing nvlists.
Flamegraph: https://sdimitro.github.io/img/flame/sdimitro_snap_unmount3.svg

Adding a table to speed up lookups will help situations where we just construct an nvlist
(like the scenario above), in addition to regular lookups and removals.

= What this patch does

In this diff we've implemented a hash-table on top of the nvlist code that converts most
nvlist operations from O(number of entries) to O(1)* (the star is for amortized time,
as the hash-table grows and shrinks depending on the number of entries - plain lookup is
strictly O(1)).
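
The patch builds its table inside the nvlist code itself; the following is only a rough sketch of the general technique, with hypothetical names (table_init/table_lookup/table_insert), not the patch. A chained hash table that doubles its power-of-two bucket array once the load factor passes 1 gives amortized O(1) inserts and strictly O(1) expected lookups:

    #include <stdlib.h>
    #include <string.h>

    /* One chain link per stored name (nvlist would hang an nvpair here). */
    typedef struct entry {
        char         *e_name;
        struct entry *e_next;
    } entry_t;

    typedef struct table {
        entry_t **t_buckets;
        size_t    t_nbuckets;   /* always a power of two */
        size_t    t_nentries;
    } table_t;

    static size_t
    hash_name(const char *name, size_t nbuckets)
    {
        size_t h = 5381;        /* djb2 string hash */

        while (*name != '\0')
            h = h * 33 + (unsigned char)*name++;
        return (h & (nbuckets - 1));
    }

    static int
    table_init(table_t *t)
    {
        t->t_nbuckets = 8;
        t->t_nentries = 0;
        t->t_buckets = calloc(t->t_nbuckets, sizeof (entry_t *));
        return (t->t_buckets == NULL ? -1 : 0);
    }

    /* Strictly O(1) expected: hash once, walk one short chain. */
    static entry_t *
    table_lookup(table_t *t, const char *name)
    {
        entry_t *e;

        for (e = t->t_buckets[hash_name(name, t->t_nbuckets)];
            e != NULL; e = e->e_next) {
            if (strcmp(e->e_name, name) == 0)
                return (e);
        }
        return (NULL);
    }

    /* Double the bucket array and rehash; this is the amortized cost. */
    static void
    table_grow(table_t *t)
    {
        size_t newsize = t->t_nbuckets * 2;
        entry_t **newbuckets = calloc(newsize, sizeof (entry_t *));
        size_t i;

        if (newbuckets == NULL)
            return;             /* keep the old table; chains grow */

        for (i = 0; i < t->t_nbuckets; i++) {
            entry_t *e = t->t_buckets[i];

            while (e != NULL) {
                entry_t *next = e->e_next;
                size_t b = hash_name(e->e_name, newsize);

                e->e_next = newbuckets[b];
                newbuckets[b] = e;
                e = next;
            }
        }
        free(t->t_buckets);
        t->t_buckets = newbuckets;
        t->t_nbuckets = newsize;
    }

    /* Amortized O(1): rare rehashes are paid for by the cheap inserts. */
    static int
    table_insert(table_t *t, const char *name)
    {
        entry_t *e;
        size_t b;

        if (table_lookup(t, name) != NULL)
            return (-1);        /* NV_UNIQUE_NAME-style uniqueness */
        if (t->t_nentries >= t->t_nbuckets)     /* load factor > 1 */
            table_grow(t);

        if ((e = malloc(sizeof (entry_t))) == NULL)
            return (-1);
        if ((e->e_name = strdup(name)) == NULL) {
            free(e);
            return (-1);
        }
        b = hash_name(name, t->t_nbuckets);
        e->e_next = t->t_buckets[b];
        t->t_buckets[b] = e;
        t->t_nentries++;
        return (0);
    }

With a table like this in front of the list, constructing an N-entry unique-name nvlist costs O(N) total instead of the O(N^2) scan-per-add shown earlier. Per the description above, the patch's table also shrinks as entries are removed; the sketch omits shrinking and removal for brevity.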

= Performance Analysis

To analyze the performance improvement I used the setup from the snapshot-deletion issue
mentioned in the Motivation section: I created 10K filesystems with one snapshot each and
then used the libzfs_core API to pass down an nvlist of all the snapshots to have them
deleted. I used my own driver program so the measurements would cleanly reflect what
actually happens in the kernel. The flamegraphs and wall clock times mentioned below were
gathered from the start to the end of the driver program's run. Between trials the
testpool was completely destroyed, the system was rebooted, and the testpool was recreated
from scratch. The reason for this dance was to get consistent results.
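
The driver program itself is not included in the issue; below is a minimal sketch of what such a program could look like, using libzfs_core's lzc_destroy_snaps() (the testpool/fs-N@snap naming is hypothetical and error handling is pared down):

    #include <stdio.h>
    #include <libnvpair.h>
    #include <libzfs_core.h>

    int
    main(void)
    {
        nvlist_t *snaps, *errlist = NULL;
        char name[256];
        int err;

        if (libzfs_core_init() != 0)
            return (1);

        /*
         * Build the long nvlist in userland, one boolean entry per
         * snapshot name (this construction is where the pre-patch
         * user time in the trials below was spent).
         */
        snaps = fnvlist_alloc();
        for (int i = 0; i < 10000; i++) {
            (void) snprintf(name, sizeof (name),
                "testpool/fs-%d@snap", i);      /* hypothetical names */
            fnvlist_add_boolean(snaps, name);
        }

        /* One ioctl destroys all 10K snapshots. */
        err = lzc_destroy_snaps(snaps, B_FALSE, &errlist);
        if (err != 0)
            (void) fprintf(stderr, "lzc_destroy_snaps: %d\n", err);

        fnvlist_free(snaps);
        if (errlist != NULL)
            nvlist_free(errlist);
        libzfs_core_fini();
        return (err == 0 ? 0 : 1);
    }

Timing a program like this from start to end, as described above, captures both the userland construction of the long nvlist and the kernel-side work triggered by the single ioctl.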

Results (before patch):
== Sampling Flamegraphs
[Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A.svg
[Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2.svg
[Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3.svg

=== Wall clock times (in seconds)

[Trial 4]
real 5.3
user 0.4
sys 2.3

[Trial 5]
real 8.2
user 0.4
sys 2.4

[Trial 6]
real 6.0
user 0.5
sys 2.3

Results (after patch):
== Sampling Flamegraphs
[Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-Ae.svg
[Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2e.svg
[Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3e.svg

=== Wall clock times (in seconds)

[Trial 4]
real 4.9
user 0.0
sys 0.9

[Trial 5]
real 3.8
user 0.0
sys 0.9

[Trial 6]
real 3.6
user 0.0
sys 0.9

== Analysis
The results between the trials are consistent, so in this section I will only talk
about the flamegraph results from trial-1 and the wall-clock results from trial-4.

From trial-1 we can see that zfs_dev_ioctl() goes from 2,331 to 996 sample counts.
Specifically, the samples from fnvlist_add_nvlist() and spa_history_log_nvl() are
almost gone (~500 & ~800 to 5 & 5 samples), leaving zfs_ioc_destroy_snaps() to
dominate most samples from zfs_dev_ioctl().

From trial-4 we see that the user time dropped to 0 seconds. I believe the consistent 0.4
seconds before my patch was applied was due to my driver program constructing the long
nvlist of snapshots so it can pass it to the kernel. As for the system time, the effect
there is clearer (2.3 down to 0.9 seconds).

#2

Updated by Electric Monk about 1 year ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 2ec7644aab2a726a64681fa66c6db8731b160de1

commit  2ec7644aab2a726a64681fa66c6db8731b160de1
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Date:   2018-07-23T13:46:28.000Z

    9580 Add a hash-table on top of nvlist to speed-up operations
    Reviewed by: Matt Ahrens <matt@delphix.com>
    Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>

#3

Updated by Andrew Stormont 11 months ago

  • Related to Bug #9914: NV_UNIQUE_NAME_TYPE broken after 9580 added
