Project

General

Profile

Bug #12912

epoll mishandles excessive timeout negativity

Added by Patrick Mooney 3 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Category:
lib - userland libraries
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

Upstreaming OS-5926 from SmartOS:

Reported by gmcmanus via illumos-joyent#136:

On Linux, if a negative timeout is given to epoll_wait, then it never times out [1]. On Illumos (and LX zones), only a timeout of -1 gives this behaviour. If timeout < -1, then it times out immediately [2].

The behaviour of timeouts less than -1 isn't documented in the Linux man pages. I noticed this because Python (unintentionally) depends on this behaviour [3].

I think it makes sense for epoll_wait (and epoll_pwait) in Illumos to behave like Linux in this case: never time out on a negative timeout.

For example, the below immediately prints "Num events: 0" on Illumos, but blocks on Linux. Changing the timeout to -1 causes it to block on both platforms.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/epoll.h>

    #define MAX_EVENTS 1

    int main(void) {
        int epfd = epoll_create(1);
        struct epoll_event events[MAX_EVENTS];
        int timeout = -2;
        int num_events = epoll_wait(epfd, events, MAX_EVENTS, timeout);
        printf("Num events: %d\n", num_events);
    }

[1] See comment and code for Linux's ep_poll

[2] See code for cv_timedwait_sig_hrtime which is called by the DP_POLL ioctl which is called by epoll_wait.

[3] In particular, in an LX zone running Debian 8.6, ipython3-5.1.0-3 (with python 3.5.1-4) uses 100% cpu while waiting for user input because it calls epoll in a loop with a timeout of -1000. It's the Python standard library which gives a timeout of -1000, so other Python programs could be affected too. This Python bug gives more details.

After investigating the related code, it appears both the libc and LX interfaces for epoll pass the timeout value directly through to devpoll. Inside devpoll, there's explicit handling for a value of -1, but other negative values will be treated like an already-expired timeout.

This could be fixed directly in devpoll, but there may be existing software which depends on the fail-fast behavior of negative timeout values (excluding -1).

History

#1

Updated by Electric Monk 3 months ago

  • Gerrit CR set to 765
#2

Updated by Patrick Mooney 3 months ago

Prior to integration of this change, a portion of the epoll test suite shows a problem relating to timeouts:

test_timeout    0       TPASS
test_timeout    1       TPASS
test_timeout    2       TPASS
test_timeout    3       TPASS
test_timeout    4       TPASS
test_timeout    5       TPASS
test_timeout    6       TFAIL: 0 != -1
test_timeout    7       TPASS
test_timeout    8       TFAIL: exceeded timeout interval
test_timeout    9       TPASS
test_timeout    10      TPASS
test_timeout    11      TPASS
test_timeout    12      TPASS

With the change applied, the test is happy, with the other parts of the suite unchanged:

test_timeout    0       TPASS
test_timeout    1       TPASS
test_timeout    2       TPASS
test_timeout    3       TPASS
test_timeout    4       TPASS
test_timeout    5       TPASS
test_timeout    6       TPASS
test_timeout    7       TPASS
test_timeout    8       TPASS
test_timeout    9       TPASS
test_timeout    10      TPASS
test_timeout    11      TPASS
test_timeout    12      TPASS

#3

Updated by Electric Monk 3 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit f4f9009fc79529ef8f45e7a31acd2ce4ca86a276

commit  f4f9009fc79529ef8f45e7a31acd2ce4ca86a276
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2020-07-07T18:20:26.000Z

    12912 epoll mishandles excessive timeout negativity
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Ryan Zezeski <ryan.zeseski@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Also available in: Atom PDF