Bug #9074

domount() interprets ZFS filesystem names as relative paths

Added by Prakash Surya 8 months ago. Updated 8 months ago.

Status:ClosedStart date:2018-02-07
Priority:NormalDue date:
Assignee:Prakash Surya% Done:

100%

Category:filesystems (not ZFS)
Target version:-
Difficulty:Medium Tags:needs-triage

Description

The domount() function that is called by the mount() system call erroneously
calls lookupname() (a VFS filename lookup function) on ZFS filesystem names
when mounting ZFS filesystems. This causes spurious short-lived vnode holds to
be placed on vnode_t's unrelated to the filesystem being mounted, as described
below.

domount() maintains a VFS "mount-in-progress table" which contains a mapping
of dev_t's being mounted to filesystems (see vfs_addmip()). The idea is that
filesystems like UFS can, when processing a mount request, look at the table
to determine if a device is already in the process of being mounted (see
vfs_devmounting()). In order to add a device to the "mount-in-progress table"
domount() calls lookupname() on the mount "spec" (e.g. for a UFS filesystem,
that would be /dev/dsk/blabla) to obtain the vnode associated with that
pathname, and adds the vnode's v_rdev to the table (under the assumption that
it's a device...).

The lookupname() function does its work by starting at the root of the path,
and calling VOP_LOOKUP() on the "next" component of the path. Once it has held
the vnode associated with the top-most directory, it iteratively calls
VOP_LOOKUP() on the next component, and so on, until it has resolved the
entire path, and has a vnode hold on the leaf filesystem entry (it also
follows symbolic links, etc.). The guts of this algorithm is implemented in
lookuppnvp(). The entire call chain for this lookupname() call from domount()
looks like:

domount()
lookupname()
lookupnameatcred()
lookuppnatcred()
lookuppnvp()
(loops over components of the path doing VOP_LOOKUP() calls on each one)

This all seems reasonable so far, but as you might have guessed from the
summary for this bug, none of this makes a lick of sense when mounting a ZFS
filesystem. The mount spec that is passed into the mount system call for ZFS
isn't a path at all, but a ZFS filesystem name, which doesn't exist in the VFS
namespace.

What ends up happening in that case is that domount() passes a filesystem name
into lookupname(). That filesystem name kind of looks like a path, and
specifically, it looks like a relative path name (a path without a leading
'/'). As it turns out, the lookuppnatcred() call in the call chain above
handles this by pre-pending the current working directory to the path in this
case:

if (pnp->pn_path[0] == '/') {
vp = rootvp;
} else {
vp = (startvp == NULL) ? PTOU(p)->u_cdir : startvp;
}

This code results in a fully-qualified path getting ultimately passed in to
lookuppnvp(), and is part of the generic lookup code. The end result is that
if I mount a ZFS filesystem named "mypool/seb", and my PWD is "/home/seb",
lookuppnvp() will attempt to get a hold on the path named
"/home/seb/mypool/seb", which is non-sensical, and fails. This failure is
ultimately gracefully handled by domount(), which simply silently doesn't
enter a device into the "mount-in-progress table" in this case.

So you might then think, "if domount() silently handles this gracefully, then
this could just be left alone, as it's harmless." Unfortunately, it's not
harmless, as the lookuppnvp() function iteratively places vnode holds on each
component of the path that it's looking up. Consider the following scenario
where we have the following filesystems and associated mountpoints:

NAME       MOUNTPOINT
seb /seb
seb/data /data

These two mountpoints are unrelated to one another. I should be able to
simultaneously mount these two filesystems. Let's say I do that, and my CWD is
"/". Unfortunately, due to the lookupname() call described above in domount(),
the mounting of the "seb" filesystem might fail with EBUSY if the mount for
"seb/data" happens to be in the middle of attempting to resolve the bogus path
"/seb/data", a very short-lived vnode hold on "/seb" will cause the mount for
the "seb" filesystem to fail with "EBUSY" when it attempts to check if its
mountpoint is "busy".

As such, the lookupname() code in domount() should only be called if the mount
spec passed in is a path, and not an opaque filesystem-dependent string as it
is for ZFS.

Also note that this problem exists not only for ZFS, but for any filesystem
whose mount spec isn't a device path (e.g. NFS, swapfs, tmpfs, ctfs, autofs,
procfs, etc.).

History

#1 Updated by Electric Monk 8 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit d9a54dd1ef75248420c035ec1d240674f8d1f4fb

commit  d9a54dd1ef75248420c035ec1d240674f8d1f4fb
Author: Sebastien Roy <seb@delphix.com>
Date:   2018-02-13T16:28:04.000Z

    9074 domount() interprets ZFS filesystem names as relative paths
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>

Also available in: Atom