Bug #15031

Closed

event ports impacted by FDINFO accesses

Added by Patrick Mooney about 2 months ago. Updated about 2 months ago.

Status: Closed
Priority: Normal
Category: kernel
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

This started with a report that tmux was acting strangely on machines running on recent OS bits. The behavior was as follows:

• When attempting to start two tmux sessions (for example, one each from two separate SSH sessions), the second one will hang rather than making enough progress to start a shell.
• Further investigation showed that even a single tmux session (the first one started for that user on a machine) will start successfully, but the output from the contained shell will not be displayed on the screen until some key is hit.

Known good bits (which did not exhibit the issue) were reportedly from ee6ee36a8ff1701c4e61e6f118446b145220478c, but a merge of bits through 17e9e0ae7132604963fd5a20e8ece5f2630e7fdf on 28 Sept introduced the problematic behavior.

Since the range was small, and this seemed related to fd events, I started some targeted bisection of changes which were merged in that range.
The first suspect was #13700, but tmux continued to behave normally before/after it. The next issue which seemed suspicious was #14788. Bits built from the commit prior were just fine, while those built on top of #14788 immediately exhibited the problem.

With the commit identified, it was still not clear how the procfs change could be impacting tmux in this way. Some truss-ing of tmux on bits pre/post that change did not show any glaring difference. Some common behavior was of interest though:

100744: open("/proc/100744/fd/", O_RDONLY|O_NDELAY|O_LARGEFILE) = 6
100744: fcntl(6, F_SETFD, 0x00000001)           = 0
100744: fstat(6, 0xFFFFFC7FFFDF6710)            = 0
100744:     d=0x0000022A00000000 i=2968  m=0040500 l=2  u=0     g=0     sz=272
100744:     at = Sep 30 19:44:07 UTC 2022  [ 1664567047.357916941 ]
100744:     mt = Sep 30 19:44:07 UTC 2022  [ 1664567047.357916941 ]
100744:     ct = Sep 30 19:44:07 UTC 2022  [ 1664567047.357916941 ]
100744:     bsz=512   blks=1     fs=proc
100744: getdents(6, 0xFFFFFC7FEEF74000, 8192)       = 264
100744:     ino=2968  off=16   rlen=24  "." 
100744:     ino=2948  off=32   rlen=24  ".." 
100744:     ino=2969  off=48   rlen=24  "0" 
100744:     ino=2100121 off=64   rlen=24  "1" 
100744:     ino=4197273 off=80   rlen=24  "2" 
100744:     ino=6294425 off=96   rlen=24  "3" 
100744:     ino=8391577 off=112  rlen=24  "4" 
100744:     ino=10488729 off=128  rlen=24  "5" 
100744:     ino=12585881 off=144  rlen=24  "6" 
100744:     ino=14683033 off=160  rlen=24  "7" 
100744:     ino=16780185 off=176  rlen=24  "8" 
100744: lstat("/proc/100744/fd/0", 0xFFFFFC7FFFDF6B50)  = 0
100744:     d=0x0000022700000000 i=71827464 m=0020000 l=1  u=0     g=3     rdev=0x0000008900000002
100744:     at = Sep 30 17:53:29 UTC 2022  [ 1664560409.000000000 ]
100744:     mt = Sep 30 19:27:17 UTC 2022  [ 1664566037.000000000 ]
100744:     ct = Sep 30 17:53:29 UTC 2022  [ 1664560409.000000000 ]
100744:     bsz=8192  blks=0     fs=proc

For some reason, tmux was going fishing in /proc/<pid>/fd/*, which certainly explained how #14788 could be implicated. A look at the tmux sources gives us some background on why that is:

tmux 3.2 - compat/getdtablecount.c:

int
getdtablecount(void)
{
    char    path[PATH_MAX];
    glob_t    g;
    int    n = 0;

    if (snprintf(path, sizeof path, "/proc/%ld/fd/*", (long)getpid()) < 0)
        fatal("snprintf overflow");
    if (glob(path, 0, NULL, &g) == 0)
        n = g.gl_pathc;
    globfree(&g);
    return (n);
}

In order to find the number of open file descriptors in the process, tmux simply globs that directory.
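
The truss output above shows what that glob() turns into at the syscall level: the fd directory is opened, its entries are read, and each entry is lstat()ed. A minimal sketch of the same access pattern (an illustration only, not tmux code; count_open_fds is a hypothetical helper) looks like:

#include <stdio.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper illustrating the access pattern seen in the truss
 * output: enumerate /proc/<pid>/fd/ and lstat() each entry.  Every one of
 * those lstat() calls lands on an FDINFO node in procfs.
 */
static int
count_open_fds(pid_t pid)
{
        char dirpath[64], entpath[128];
        DIR *dp;
        struct dirent *de;
        struct stat st;
        int n = 0;

        (void) snprintf(dirpath, sizeof (dirpath), "/proc/%ld/fd", (long)pid);
        if ((dp = opendir(dirpath)) == NULL)
                return (-1);
        while ((de = readdir(dp)) != NULL) {
                /* skip "." and ".." */
                if (de->d_name[0] == '.')
                        continue;
                (void) snprintf(entpath, sizeof (entpath), "%s/%s",
                    dirpath, de->d_name);
                if (lstat(entpath, &st) == 0)
                        n++;
        }
        (void) closedir(dp);
        return (n);
}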

OK, so we know tmux is rooting around in the fdinfo part of proc, but it's not yet clear why its behavior differs after that change.
There's no immediately apparent difference in the event ports calls between working and broken systems.

One thing which stood out from #14788 was the change to calling closef() instead of the old VN_RELE().
We can see that closef() unconditionally calls the VOP_CLOSE handler:

...
        count = fp->f_count--;
        flag = fp->f_flag;
        offset = fp->f_offset;

        vp = fp->f_vnode;

        error = VOP_CLOSE(vp, flag, count, offset, fp->f_cred, NULL);
...

Many vfs providers will largely ignore VOP_CLOSE operations when count is greater than 1, since such a call does not represent the final and "real" close.
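
As an illustration, the common pattern looks roughly like the following hypothetical handler (example_close is a made-up name, not taken from any real driver):

/*
 * Hypothetical VOP_CLOSE handler illustrating the common pattern:
 * per-open teardown only happens on the final close, i.e. when count
 * has dropped to 1.
 */
static int
example_close(struct vnode *vp, int flag, int count, offset_t offset,
    cred_t *cr, caller_context_t *ct)
{
        if (count > 1) {
                /* Not the last close of this file; nothing to tear down. */
                return (0);
        }

        /* ... final-close cleanup of per-open state would go here ... */
        return (0);
}

Is event ports the same way?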

No it is not!

static int
port_close(struct vnode *vp, int flag, int count, offset_t offset, cred_t *cr,
    caller_context_t *ct)
{
        port_t          *pp;
        port_queue_t    *portq;
        port_source_t   *ps;
        port_source_t   *ps_next;
        int             source;

        pp = VTOEP(vp);
        mutex_enter(&pp->port_mutex);
        if (pp->port_flags & PORT_CLOSED) {
                mutex_exit(&pp->port_mutex);
                return (0);
        }
        mutex_exit(&pp->port_mutex);

        portq = &pp->port_queue;
        if (count > 1) {
                /*
                 * It is not the last close.
                 * Remove/free all event resources owned by the current proc
                 * First notify all with the port associated sources about the
                 * close(2). The last argument of the close callback function
                 * advises the source about the type of of the close.
                 * If the port was set in alert mode by the curren process then
                 * remove the alert mode.
                 */

                /* check alert mode of the port */
                mutex_enter(&portq->portq_mutex);
                if ((portq->portq_flags & PORTQ_ALERT) &&
                    (portq->portq_alert.portal_pid == curproc->p_pid))
                        portq->portq_flags &= ~PORTQ_ALERT;
                mutex_exit(&portq->portq_mutex);

                /* notify all event sources about port_close() */
                mutex_enter(&portq->portq_source_mutex);
                for (source = 0; source < PORT_SCACHE_SIZE; source++) {
                        ps = portq->portq_scache[PORT_SHASH(source)];
                        for (; ps != NULL; ps = ps->portsrc_next) {
                                if (ps->portsrc_close != NULL)
                                        (*ps->portsrc_close)
                                            (ps->portsrc_closearg, pp->port_fd,
                                            curproc->p_pid, 0);
                        }
                }
                mutex_exit(&portq->portq_source_mutex);
                port_discard_events(&pp->port_queue);
                return (0);
        }
...

So it appears we're tearing down a bunch of portfs state on any event ports resource which is accessed via that arm of procfs.
Armed with that knowledge, a simpler reproducer was next:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <port.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <sys/poll.h>
#include <stdbool.h>

static bool
has_event(int pfd)
{
        port_event_t ev = { 0 };
        timespec_t ts = { 0 };

        /* Because of illumos 14912, more care needs to be taken here */
        int res = port_get(pfd, &ev, &ts);
        if (res != 0 || ev.portev_source == 0) {
                return (false);
        } else {
                return (true);
        }
}

int
main(int argc, char *argv[])
{
        int res;
        int pipes[2];

        res = pipe2(pipes, 0);
        assert(res == 0);

        int pfd = port_create();
        assert(pfd >= 0);

        res = port_associate(pfd, PORT_SOURCE_FD, (uintptr_t)pipes[1], POLLIN, NULL);
        assert(res == 0);

        if (has_event(pfd)) {
                printf("FAIL - unexpected early event\n");
                return (EXIT_FAILURE);
        }

        char port_path[MAXPATHLEN];
        (void) sprintf(port_path, "/proc/%d/fd/%d", getpid(), pfd);

        /* incur the procfs FDINFO access */
        struct stat sbuf;
        res = lstat(port_path, &sbuf);
        assert(res == 0);

        /* write a byte to wake up the pipe */
        (void) write(pipes[0], port_path, 1);

        if (has_event(pfd)) {
                printf("PASS\n");
                return (EXIT_SUCCESS);
        } else {
                printf("FAIL - no event found\n");
                return (EXIT_FAILURE);
        }
}

It fails (as expected) on bits with #14788, and passes on bits prior to that.
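
For anyone repeating the test: the event ports and pipe2(2) routines live in libc on illumos, so the reproducer above should build with a plain cc invocation and no additional libraries.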


Related issues

Related to illumos gate - Bug #14788: FDINFO misbehaves in multiple ways (Closed, Dan McDonald)
Related to illumos gate - Bug #15034: port_close acts strangely on count > 1 (New)

#1

Updated by Patrick Mooney about 2 months ago

  • Related to Bug #14788: FDINFO misbehaves in multiple ways added
#2

Updated by Electric Monk about 2 months ago

  • Gerrit CR set to 2410
#3

Updated by Patrick Mooney about 2 months ago

I ran the stress test detailed in #14788 for 30 minutes on a machine running the proposed fix (on non-DEBUG bits) and did not observe any issues.

The 14788-relevant stats:

              procfs`pr_getf+0xf3
              procfs`prgetattr+0xa90
              genunix`fop_getattr+0x91
              procfs`prgetfdinfo+0x87
              procfs`pr_read_fdinfo+0x113
              procfs`prread+0x57
              genunix`fop_read+0x60
              genunix`read+0x2b5
              unix`sys_syscall+0x17d
  mutex_tryenter fail                                             102

              procfs`prgetattr+0xa90
              genunix`fop_getattr+0x91
              procfs`prgetfdinfo+0x87
              procfs`pr_read_fdinfo+0x113
              procfs`prread+0x57
              genunix`fop_read+0x60
              genunix`read+0x2b5
              unix`sys_syscall+0x17d
  pr_getf fail                                                    297

              procfs`pr_getf+0xf3
              procfs`pr_read_fdinfo+0xa4
              procfs`prread+0x57
              genunix`fop_read+0x60
              genunix`read+0x2b5
              unix`sys_syscall+0x17d
  mutex_tryenter fail                                           38741

              procfs`pr_getf+0xf3
              procfs`prgetattr+0xa90
              genunix`fop_getattr+0x91
              genunix`cstat+0x3e
              genunix`fstat+0x36
              unix`sys_syscall+0x17d
  mutex_tryenter fail                                           53190

              procfs`prgetattr+0xa90
              genunix`fop_getattr+0x91
              procfs`prgetfdinfo+0x87
              procfs`pr_read_fdinfo+0x113
              procfs`prread+0x57
              genunix`fop_read+0x60
              genunix`read+0x2b5
              unix`sys_syscall+0x17d
  pr_getf success                                              111292

              procfs`pr_read_fdinfo+0xa4
              procfs`prread+0x57
              genunix`fop_read+0x60
              genunix`read+0x2b5
              unix`sys_syscall+0x17d
  pr_getf fail                                                 171390

              procfs`prgetattr+0xa90
              genunix`fop_getattr+0x91
              genunix`cstat+0x3e
              genunix`fstat+0x36
              unix`sys_syscall+0x17d
  pr_getf fail                                                 926138

              procfs`pr_read_fdinfo+0xa4
              procfs`prread+0x57
              genunix`fop_read+0x60
              genunix`read+0x2b5
              unix`sys_syscall+0x17d
  pr_getf success                                           274015002

              procfs`prgetattr+0xa90
              genunix`fop_getattr+0x91
              genunix`cstat+0x3e
              genunix`fstat+0x36
              unix`sys_syscall+0x17d
  pr_getf success                                           274192532

#4

Updated by Patrick Mooney about 2 months ago

  • Description updated (diff)
#5

Updated by Dan McDonald about 2 months ago

Further testing: 90 minutes of the #14788 test regimen on each of the non-DEBUG and DEBUG OmniOS kernels, with a procfs module containing the fix for this issue.

Final DTrace output from the DEBUG run:

  6  85536                        :tick-10s 
            success, tries ==  3, 336
            success, tries ==  2, 62278
            success, tries ==  1, 624643095
            Failure upon try # 3, 772
            Failure upon try # 2, 1108
            Failure upon try # 1, 63386
                       success 624705710
                       failure 449592
             empty after tries 0 448820

#6

Updated by Patrick Mooney about 2 months ago

The newly added regression test passed on bits running with the fix:

Test: /opt/os-tests/tests/regression/illumos-15031 (run as root)  [00:00] [PASS]

When that test program is run on a platform lacking the fix, it fails as expected.

#7

Updated by Patrick Mooney about 2 months ago

  • Related to Bug #15034: port_close acts strangely on count > 1 added
#8

Updated by Electric Monk about 2 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 37e2cd25d56b334a2403f2540a0b0a1e6a40bcd1

commit  37e2cd25d56b334a2403f2540a0b0a1e6a40bcd1
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2022-10-04T14:16:52.000Z

    15031 event ports impacted by FDINFO accesses
    Reviewed by: Ryan Goodfellow <ryan.goodfellow@oxide.computer>
    Reviewed by: Joshua M. Clulow <josh@sysmgr.org>
    Approved by: Dan McDonald <danmcd@mnx.io>
