Bug #8604
Closed
Avoid unnecessary work search in VFS when unmounting snapshots
100%
Description
Every time we want to unmount a snapshot (happens during snapshot deletion or renaming) we unnecessarily iterate through all the mountpoints in the VFS layer (see zfs_get_vfs).
Ideally we would just put a hold on the snapshot and access its respective VFS resource directly.
Files
Updated by Serapheim Dimitropoulos about 6 years ago
The above version is a TL;DR one. A more detailed description of the issue is the following:
During a workload in which ~450 thousand datasets were being deleted (10 thousand at a time, through direct calls to the respective LibZFS_Core API call), we experienced general slowness and discovered that a lot of time was spent in the kernel.
While root-causing the issue we discovered that every snapshot in the nvlist passed to the kernel has to be unmounted before being destroyed, and that unmounting was taking a significant amount of time. It goes without saying that we must unmount a snapshot before deleting it, yet snapshots are rarely mounted (i.e. you have to specifically perform operations in the ~/.zfs/snapshot/ directory).
Looking further into the unmounting code, we realized that in order to find the VFS resource (the actual mountpoint) to unmount, we linearly search through each of the mountpoints in the VFS layer (see `zfs_get_vfs`). Since snapshots are rarely mounted and filesystems almost always are, for each snapshot to be deleted we do one whole iteration through all the mounted filesystems in the VFS layer. In terms of computational complexity this is O(mn), where m is the number of mounted filesystems and n is the number of snapshots that we are trying to delete.
Dataset structures within ZFS have a reference to their VFS resources, so looking up the actual dataset structure by the name provided and using the VFS resource referenced there makes finding the mountpoint O(1) per snapshot instead of O(m).
The second bottleneck we found while investigating this was that, in the nvlists passed around in those code paths, we spend a lot of time ensuring that each entry we add is unique (the insert operation becomes O(n), where n is the number of elements in the list). It turns out that we don't actually need to do this, as the final nvlist passed to the Lua script (which ultimately destroys the filesystems) is converted to a Lua table that ensures uniqueness for us. Thus a side fix to the above issue would be to remove the NV_UNIQUE_NAME flag when we create those intermediate nvlists.
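To make the difference between the two lookup strategies concrete, here is a minimal toy model in C. It is only a sketch: `toy_vfs_t`, `toy_dataset_t`, `toy_find_vfs_by_scan` and `toy_find_vfs_direct` are hypothetical names invented for illustration and are not the illumos `vfs_t`, the ZFS dataset structures, or `zfs_get_vfs` itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real illumos vfs_t or ZFS dataset types. */
typedef struct toy_vfs {
	const char *resource;	/* name of the mounted dataset */
	struct toy_vfs *next;	/* next entry in the global mount list */
} toy_vfs_t;

typedef struct toy_dataset {
	const char *name;
	toy_vfs_t *vfs;		/* NULL when the dataset is not mounted */
} toy_dataset_t;

/*
 * Linear scan over every mount, in the spirit of the behaviour described
 * above. Each string comparison is counted in *steps. For an unmounted
 * snapshot the whole list is walked before NULL is returned.
 */
toy_vfs_t *
toy_find_vfs_by_scan(toy_vfs_t *head, const char *name, size_t *steps)
{
	assert(name != NULL && steps != NULL);
	for (toy_vfs_t *v = head; v != NULL; v = v->next) {
		(*steps)++;
		if (strcmp(v->resource, name) == 0)
			return (v);
	}
	return (NULL);
}

/*
 * Direct lookup: the dataset structure already references its mount
 * (if any), so no walk over the mount list is needed.
 */
toy_vfs_t *
toy_find_vfs_direct(const toy_dataset_t *ds)
{
	assert(ds != NULL);
	return (ds->vfs);
}
```

In this model, looking up an unmounted snapshot by scan costs one comparison per mounted filesystem, so n snapshots cost m * n comparisons, while the direct variant does constant work per snapshot.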
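The cost of the uniqueness check can be sketched with another toy model in C. This is an assumption-laden illustration, not the libnvpair implementation: `toy_list_t`, `toy_list_init`, `toy_list_add` and the fixed capacity are hypothetical, and the `unique` field only mimics the effect the comment above attributes to NV_UNIQUE_NAME.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	TOY_MAX_ENTRIES	64	/* arbitrary capacity for the sketch */

/* Hypothetical stand-in for an nvlist; NOT the libnvpair data structure. */
typedef struct toy_list {
	const char *names[TOY_MAX_ENTRIES];
	size_t count;
	int unique;		/* models the effect of NV_UNIQUE_NAME */
	size_t comparisons;	/* string comparisons performed so far */
} toy_list_t;

void
toy_list_init(toy_list_t *l, int unique)
{
	assert(l != NULL);
	l->count = 0;
	l->unique = unique;
	l->comparisons = 0;
}

/*
 * Add a name. When uniqueness is requested, every existing entry is
 * compared first, so a single insert is O(n). Without it, the name is
 * simply appended. Returns 0 on success, -1 if the list is full.
 */
int
toy_list_add(toy_list_t *l, const char *name)
{
	assert(l != NULL && name != NULL);
	if (l->unique) {
		for (size_t i = 0; i < l->count; i++) {
			l->comparisons++;
			if (strcmp(l->names[i], name) == 0)
				return (0);	/* already present */
		}
	}
	if (l->count == TOY_MAX_ENTRIES)
		return (-1);
	l->names[l->count++] = name;
	return (0);
}
```

Adding k distinct names to a unique list performs 0 + 1 + ... + (k - 1) comparisons in this model, whereas the non-unique list performs none, which is why dropping the check is safe only when something downstream (here, the Lua table) enforces uniqueness.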
For the whole analysis (including the flamegraphs used to understand the issue) please refer to https://sdimitro.github.io/post/snap-unmount-perf/
Updated by Serapheim Dimitropoulos about 6 years ago
- File gwilson_snap_unmount.svg added
Updated by Electric Monk about 6 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit ed992b0aac4e5b70dc1273b1d055c0d471fbb4b1
commit ed992b0aac4e5b70dc1273b1d055c0d471fbb4b1
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Date: 2017-09-15T16:36:28.000Z

    8604 Avoid unnecessary work search in VFS when unmounting snapshots
    Reviewed by: Matt Ahrens <mahrens@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Approved by: Robert Mustacchi <rm@joyent.com>