Project

General

Profile

Bug #12363

add O_DIRECT support

Added by Jerry Jelinek 8 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

We have had directio(3c) support for a very long time, but on Linux this s controlled via the O_DIRECT option flag to open(2). This difference causes porting overhead for applications developed on Linux. To eliminate this friction, we can add O_DIRECT support which is equivalent to directio(DIRECTIO_ON) as part of our open(2).

History

#1

Updated by Jerry Jelinek 8 months ago

During code review the following issue was raised.

It appears other systems ignore the failure here (FreeBSD 
at least from a quick glance). I guess following the Linux
error makes the most sense? Did you consider how others
swallow the error at all?

In code review discussion, it became unclear how to resolve this. Joshua Clulow proposed answering the following questions, which I am documenting here along with the response.

Enumerate in the ticket what Linux and FreeBSD do here, where they differ and where they're the same

FreeBSD
        The system will attempt to avoid caching the data you
        read or write. If it cannot avoid caching the data, it will
        minimize the impact the data has on the cache.

Linux
        Try to minimize cache effects of the I/O to and from this file.
        error return:
        EINVAL The filesystem does not support the O_DIRECT flag.

        The Linux man page also documents a lot more information
        about O_DIRECT caveats and possible behavior. I'm not
        including all of that here, but a few highlights:
        - File I/O is done directly to/from user-space buffers.
        - The O_DIRECT flag may impose alignment restrictions.
        - FreeBSD 4.x introduced a flag of the same name, but 
          without alignment restrictions.
        - It is recommended that applications treat use of O_DIRECT as
          a performance option which is disabled by default.

Summary        
        The only important, application visible difference between
        FBSD and Linux is in the error handling when the underlying
        file system doesn't support O_DIRECT. 
        Linux returns EINVAL, FBSD does not return an error

Probably make choices that bias towards the Linux behaviour, where their
choice is safe, because on balance that's the widest software target
        The currently proposed solution is doing this.

When we're making arbitrary choices, bias towards ones we can tighten or
relax later; e.g., if Linux returns EINVAL in some cases that FreeBSD
swallows, we should probably start with EINVAL too -- if it's an issue we can
start swallowing the condition later, I imagine.
        The currently proposed solution is doing this.

Not survey all of pkgsrc, but let's at least look at what, say, PostgreSQL
is going to do on illumos once we suddenly grow O_DIRECT. Is it going to
handle the error, or not, or whatever other behaviour. It would be a shame
if people only handle the error when #ifdef Linux specifically, rather than
just all the time when #ifdef O_DIRECT, say.
Postgres behavior summary

#ifdef O_DIRECT
#define PG_O_DIRECT                             O_DIRECT
#else
#define PG_O_DIRECT                             0
#endif

The code then uses PG_O_DIRECT in many places as part of
the open(2) flags.

Thus, once the illumos headers have O_DIRECT, PG will start using it.

The PG code will fail out in most cases if open(2) returns an
error of almost every kind (there are a few places they specifically
handle ENOENT, but nothing else).

On illumos, this behavior would be fine for a DB on:
    zfs 
    ufs 
    nfs (although probably not used with PG)

On illumos, this behavior would fail for a DB on:
    pcfs
    tmpfs
Both of these are unlikely file systems to host a PG DB.

#2

Updated by Robert Mustacchi 8 months ago

Thanks for putting this together, Jerry. I think this causes the Linux behavior to make sense. As in erroring with EINVAL if it's not supported. I think the big question for me is what happens to folks from a performance perspective if O_DIRECT is now being used all of a sudden by applications like postgres where having data cached in the arc was load bearing. I don't know how much of a risk that really is or not. I assume that some of the changes being proposed around how O_DIRECT works in ZFS basically wouldn't really be at issue here and that the system wouldn't default to some of those proposals?

#3

Updated by Jerry Jelinek 8 months ago

Literally nothing happens in that case since ZFS on both illumos and OpenZFS currently does absolutely for O_DIRECT. I put the explanation for this in the comment when I added support to ZFS for the _FIODIRECTIO ioctl. I'll include that here for reference

                 * ZFS inherently provides the basic semantics for directio.
                 * This is the summary from the ZFS on Linux support for
                 * O_DIRECT, which is the common form of directio, and required
                 * no changes to ZFS.
                 *
                 * 1. Minimize cache effects of the I/O.
                 *
                 *    By design the ARC is already scan-resistant, which helps
                 *    mitigate the need for special O_DIRECT handling.
                 *
                 * 2. O_DIRECT _MAY_ impose restrictions on IO alignment and
                 *    length.
                 *
                 *    No additional alignment or length restrictions are
                 *    imposed by ZFS.
                 *
                 * 3. O_DIRECT _MAY_ perform unbuffered IO operations directly
                 *    between user memory and block device.
                 *
                 *    No unbuffered IO operations are currently supported. In
                 *    order to support features such as compression, encryption,
                 *    and checksumming a copy must be made to transform the
                 *    data.
                 *
                 * 4. O_DIRECT _MAY_ imply O_DSYNC (XFS).
                 *
                 *    O_DIRECT does not imply O_DSYNC for ZFS.
                 *
                 * 5. O_DIRECT _MAY_ disable file locking that serializes IO
                 *    operations.
                 *
                 *    All I/O in ZFS is locked for correctness and this locking
                 *    is not disabled by O_DIRECT.

There is work in progress upstream to enhance OpenZFS for more comprehensive O_DIRECT support, but that is still under development.

#4

Updated by Robert Mustacchi 7 months ago

OK, given all that and the fact that things on ZFS will probably always default to the current behavior, then it seems like this is pretty safe and there won't be too much in terms of adverse reactions. Thanks for taking a look.

#5

Updated by Jerry Jelinek about 1 month ago

For testing, I ran the 2 new test cases that have been added to the OS test suite for this change. I also manually verified that the man page changes look ok.

#6

Updated by Electric Monk about 1 month ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit edd580643f2cf1434e252cd7779e83182ea84945

commit  edd580643f2cf1434e252cd7779e83182ea84945
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Date:   2020-09-24T11:54:26.000Z

    12363 add O_DIRECT support
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: C Fraire <cfraire@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Also available in: Atom PDF