add O_DIRECT support
We have had directio(3c) support for a very long time, but on Linux this s controlled via the O_DIRECT option flag to open(2). This difference causes porting overhead for applications developed on Linux. To eliminate this friction, we can add O_DIRECT support which is equivalent to directio(DIRECTIO_ON) as part of our open(2).
Updated by Jerry Jelinek 11 months ago
During code review the following issue was raised.
It appears other systems ignore the failure here (FreeBSD at least from a quick glance). I guess following the Linux error makes the most sense? Did you consider how others swallow the error at all?
In code review discussion, it became unclear how to resolve this. Joshua Clulow proposed answering the following questions, which I am documenting here along with the response.
Enumerate in the ticket what Linux and FreeBSD do here, where they differ and where they're the same
FreeBSD The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Linux Try to minimize cache effects of the I/O to and from this file. error return: EINVAL The filesystem does not support the O_DIRECT flag. The Linux man page also documents a lot more information about O_DIRECT caveats and possible behavior. I'm not including all of that here, but a few highlights: - File I/O is done directly to/from user-space buffers. - The O_DIRECT flag may impose alignment restrictions. - FreeBSD 4.x introduced a flag of the same name, but without alignment restrictions. - It is recommended that applications treat use of O_DIRECT as a performance option which is disabled by default. Summary The only important, application visible difference between FBSD and Linux is in the error handling when the underlying file system doesn't support O_DIRECT. Linux returns EINVAL, FBSD does not return an error
Probably make choices that bias towards the Linux behaviour, where their
choice is safe, because on balance that's the widest software target
The currently proposed solution is doing this.
When we're making arbitrary choices, bias towards ones we can tighten or
relax later; e.g., if Linux returns EINVAL in some cases that FreeBSD
swallows, we should probably start with EINVAL too -- if it's an issue we can
start swallowing the condition later, I imagine.
The currently proposed solution is doing this.
Not survey all of pkgsrc, but let's at least look at what, say, PostgreSQL
is going to do on illumos once we suddenly grow O_DIRECT. Is it going to
handle the error, or not, or whatever other behaviour. It would be a shame
if people only handle the error when #ifdef Linux specifically, rather than
just all the time when #ifdef O_DIRECT, say.
Postgres behavior summary #ifdef O_DIRECT #define PG_O_DIRECT O_DIRECT #else #define PG_O_DIRECT 0 #endif The code then uses PG_O_DIRECT in many places as part of the open(2) flags. Thus, once the illumos headers have O_DIRECT, PG will start using it. The PG code will fail out in most cases if open(2) returns an error of almost every kind (there are a few places they specifically handle ENOENT, but nothing else). On illumos, this behavior would be fine for a DB on: zfs ufs nfs (although probably not used with PG) On illumos, this behavior would fail for a DB on: pcfs tmpfs Both of these are unlikely file systems to host a PG DB.
Updated by Robert Mustacchi 10 months ago
Thanks for putting this together, Jerry. I think this causes the Linux behavior to make sense. As in erroring with EINVAL if it's not supported. I think the big question for me is what happens to folks from a performance perspective if O_DIRECT is now being used all of a sudden by applications like postgres where having data cached in the arc was load bearing. I don't know how much of a risk that really is or not. I assume that some of the changes being proposed around how O_DIRECT works in ZFS basically wouldn't really be at issue here and that the system wouldn't default to some of those proposals?
Updated by Jerry Jelinek 10 months ago
Literally nothing happens in that case since ZFS on both illumos and OpenZFS currently does absolutely for O_DIRECT. I put the explanation for this in the comment when I added support to ZFS for the _FIODIRECTIO ioctl. I'll include that here for reference
* ZFS inherently provides the basic semantics for directio. * This is the summary from the ZFS on Linux support for * O_DIRECT, which is the common form of directio, and required * no changes to ZFS. * * 1. Minimize cache effects of the I/O. * * By design the ARC is already scan-resistant, which helps * mitigate the need for special O_DIRECT handling. * * 2. O_DIRECT _MAY_ impose restrictions on IO alignment and * length. * * No additional alignment or length restrictions are * imposed by ZFS. * * 3. O_DIRECT _MAY_ perform unbuffered IO operations directly * between user memory and block device. * * No unbuffered IO operations are currently supported. In * order to support features such as compression, encryption, * and checksumming a copy must be made to transform the * data. * * 4. O_DIRECT _MAY_ imply O_DSYNC (XFS). * * O_DIRECT does not imply O_DSYNC for ZFS. * * 5. O_DIRECT _MAY_ disable file locking that serializes IO * operations. * * All I/O in ZFS is locked for correctness and this locking * is not disabled by O_DIRECT.
There is work in progress upstream to enhance OpenZFS for more comprehensive O_DIRECT support, but that is still under development.
Updated by Electric Monk 4 months ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit edd580643f2cf1434e252cd7779e83182ea84945 Author: Jerry Jelinek <firstname.lastname@example.org> Date: 2020-09-24T11:54:26.000Z 12363 add O_DIRECT support Reviewed by: Robert Mustacchi <email@example.com> Reviewed by: C Fraire <firstname.lastname@example.org> Approved by: Dan McDonald <email@example.com>