Project

General

Profile

Actions

Bug #15036

closed

portfs wears inadequate pollcache disguise

Added by Patrick Mooney 2 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

During the investigation of #15031, it was discovered that tmux has come to disable building with event-ports support as of 10 May 2021. The stated issue (tmux#2702) was that users would see occasional port_associate failures while attempting to load or store tmux copy buffers to/from files. A subsequent SmartOS ticket (#988) was filed as well.

I was able to reproduce the issue per the instructions, but only after repeated attempts:

509:    open("/root/new.txt", O_RDONLY|O_NONBLOCK)      = 7
509:    port_associate(3, 4, 0x00000007, 0x00000001, 0x00B8BC90) Err#1 EPERM

That was indeed an EPERM for a "normal" file to which the process/user otherwise had permissions to access.
In walking through the portfs code, it became apparent that the only possible source from this error was through the VOP_POLL call chain itself.

Since this file resides on a normal ZFS root, the generic fs_poll() would be used:

/*
 * Unlike poll(2), epoll should reject attempts to add normal files or
 * directories to a given handle.  Most non-pseudo filesystems rely on
 * fs_poll() as their implementation of polling behavior.  Exceptions to that
 * rule (ufs) can use fs_reject_epoll(), so they don't require access to the
 * inner details of poll.  Potential race conditions related to the poll module
 * being loaded are avoided by implementing the check here in genunix.
 */
boolean_t
fs_reject_epoll()
{
        /* Check if the currently-active pollcache is epoll-enabled. */
        return (curthread->t_pollcache != NULL &&
            (curthread->t_pollcache->pc_flag & PC_EPOLL) != 0);
}

/* ARGSUSED */
int
fs_poll(vnode_t *vp, short events, int anyyet, short *reventsp,
    struct pollhead **phpp, caller_context_t *ct)
{
        /*
         * Regular filesystems should reject epollers.  On the off chance that
         * a non-epoll consumer expresses the desire for edge-triggered
         * polling, we reject them too.  Yes, the expected error for this
         * really is EPERM.
         */
        if (fs_reject_epoll() || (events & POLLET) != 0) {
                return (EPERM);
        }

        *reventsp = 0;
        if (events & POLLIN)
                *reventsp |= POLLIN;
        if (events & POLLRDNORM)
                *reventsp |= POLLRDNORM;
        if (events & POLLRDBAND)
                *reventsp |= POLLRDBAND;
        if (events & POLLOUT)
                *reventsp |= POLLOUT;
        if (events & POLLWRBAND)
                *reventsp |= POLLWRBAND;

        return (0);
}

There is indeed a chance for EPERM: epollers attempting to poll on regular files expect EPERM (as of #12908). It makes no sense why that would trigger for a portfs pollcache though...

Let's look at how that's being set:

int
port_associate_fd(port_t *pp, int source, uintptr_t object, int events,
    void *user)
{
        port_fdcache_t  *pcp;
        int             fd;
        struct pollhead *php = NULL;
        portfd_t        *pfd;
        polldat_t       *pdp;
        file_t          *fp;
        port_kevent_t   *pkevp;
        short           revents;
        int             error = 0;
        int             active;

        pcp = pp->port_queue.portq_pcp;
...
        /*
         * do VOP_POLL and cache this poll fd.
         *
         * XXX - pollrelock() logic needs to know
         * which pollcache lock to grab. It'd be a
         * cleaner solution if we could pass pcp as
         * an arguement in VOP_POLL interface instead
         * of implicitly passing it using thread_t
         * struct. On the other hand, changing VOP_POLL
         * interface will require all driver/file system
         * poll routine to change.
         */
        curthread->t_pollcache = (pollcache_t *)pcp;
        error = VOP_POLL(fp->f_vnode, events, 0, &revents, &php, NULL);
        curthread->t_pollcache = NULL;
...

That looks mostly normal, but why the explicit cast for the pollcache_t *? Surely portq_pcp is a pollcache_t....

typedef struct port_queue {
   ...
        struct port_fdcache *portq_pcp; /* fd cache */
...
} port_queue_t;

Oh my...

The definition for port_fdcache has more color on the situation:

/*
 * PORT_SOURCE_FD cache per port.
 * One cache for each port that uses PORT_SOURCE_FD.
 * pc_lock must be the first element of port_fdcache_t to keep it
 * synchronized with the offset of pc_lock in pollcache_t (see pollrelock()).
 */
typedef struct port_fdcache {
        kmutex_t        pc_lock;        /* lock to protect portcache */
        kcondvar_t      pc_lclosecv;
        struct portfd   **pc_hash;      /* points to a hash table of ptrs */
        int             pc_hashsize;    /* the size of current hash table */
        int             pc_fdcount;     /* track how many fd's are hashed */
} port_fdcache_t;

So when fs_reject_epoll goes to check a portfs pollcache, it's really fishing in whatever memory resides after the port_fdcache_t, since it is shorter than where pc_flag resides in pollcache_t.

It's not immediately clear why poll_fdcache_t was used, rather than pollcache_t. It will likely be most expedient to pad port_fdcache_t out far enough to have an empty pc_flag, and establish more robust warnings (including some CTASSERTs) as guard rails to deter this sort of thing from happening again soon.


Related issues

Related to illumos gate - Bug #12908: epoll should exclude normal files/directoriesClosedPatrick Mooney

Actions
Related to illumos gate - Bug #15104: want shared type between pollcache_t and port_fdcache_tNew

Actions
Actions #1

Updated by Patrick Mooney 2 months ago

  • Description updated (diff)
Actions #2

Updated by Electric Monk about 2 months ago

  • Gerrit CR set to 2440
Actions #3

Updated by Patrick Mooney about 2 months ago

  • Related to Bug #12908: epoll should exclude normal files/directories added
Actions #4

Updated by Patrick Mooney about 2 months ago

This would have been possible to hit since the integration of #12908.

Attempts to reproduce it with a unit test proved very difficult on DEBUG bits, but possible (although inconsistent) on non-DEBUG.

Actions #5

Updated by Patrick Mooney about 2 months ago

I added a regression test which attempts, via brute force, to induce the conditions which would cause this issue to appear. It does not trigger 100% reliably on a "vulnerable" system, particularly when running DEBUG bits. It does, however, run cleanly when the proposed fix is in place.

I confirmed that the normal-file and normal-dir tests from the epoll test suite are still passing (considering that behavior from #12908 was involved here):

test_dir        0       TPASS
test_dir        1       TPASS
test_file       0       TPASS
test_file       1       TPASS

The rest of the suite ran fine as well (accounting for portions which are known to fail on illumos)

Actions #6

Updated by Patrick Mooney about 2 months ago

The os-tests suite was run, and modulo some not-uncommon failures from tests which seem to be intermittent, the results looked fine.

Test: /opt/os-tests/tests/clock_gettime.32 (run as root)          [00:00] [PASS]
Test: /opt/os-tests/tests/clock_gettime.64 (run as root)          [00:00] [PASS]
Test: /opt/os-tests/tests/definit/definit (run as root)           [00:00] [PASS]
Test: /opt/os-tests/tests/eventfd.32 (run as root)                [00:00] [PASS]
Test: /opt/os-tests/tests/eventfd.64 (run as root)                [00:00] [PASS]
Test: /opt/os-tests/tests/imc_test (run as root)                  [00:00] [PASS]
Test: /opt/os-tests/tests/odirectory.32 (run as root)             [00:00] [PASS]
Test: /opt/os-tests/tests/odirectory.64 (run as root)             [00:00] [PASS]
Test: /opt/os-tests/tests/writev.32 (run as root)                 [00:00] [PASS]
Test: /opt/os-tests/tests/writev.64 (run as root)                 [00:00] [PASS]
Test: /opt/os-tests/tests/zen_umc_test (run as root)              [00:00] [PASS]
Test: /opt/os-tests/tests/cores/core_prereqs (run as root)        [00:00] [PASS]
Test: /opt/os-tests/tests/cores/coretests (run as root)           [00:03] [FAIL]
Test: /opt/os-tests/tests/ddi_ufm/ufm-test-setup (run as root)    [00:00] [PASS]
Test: /opt/os-tests/tests/ddi_ufm/ufm-test (run as root)          [00:00] [PASS]
Test: /opt/os-tests/tests/ddi_ufm/ufm-test-cleanup (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/file-locking/runtests.32 (run as root)  [00:21] [PASS]
Test: /opt/os-tests/tests/file-locking/runtests.64 (run as root)  [00:21] [PASS]
Test: /opt/os-tests/tests/i386/ldt (run as root)                  [00:06] [PASS]
Test: /opt/os-tests/tests/i386/badseg (run as root)               [00:05] [PASS]
Test: /opt/os-tests/tests/ksensor/ksensor_init (run as root)      [00:00] [PASS]
Test: /opt/os-tests/tests/ksensor/ksensor_basic.32 (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/ksensor/ksensor_basic.64 (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/ksensor/ksensor_err.32 (run as root)    [00:00] [PASS]
Test: /opt/os-tests/tests/ksensor/ksensor_err.64 (run as root)    [00:00] [PASS]
Test: /opt/os-tests/tests/ksensor/ksensor_fini (run as root)      [00:02] [PASS]
Test: /opt/os-tests/tests/libtopo/digraph-test (run as root)      [00:00] [PASS]
Test: /opt/os-tests/tests/pf_key/acquire-compare (run as root)    [00:43] [PASS]
Test: /opt/os-tests/tests/pf_key/kmc-update (run as root)         [00:00] [PASS]
Test: /opt/os-tests/tests/portfs/file_assoc.32 (run as root)      [00:00] [PASS]
Test: /opt/os-tests/tests/portfs/file_assoc.64 (run as root)      [00:00] [PASS]
Test: /opt/os-tests/tests/regression/illumos-15031 (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/regression/illumos-15036 (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/sdevfs/sdevfs_eisdir (run as root)      [00:00] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_aslr (run as root)    [00:01] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_core (run as root)    [00:01] [FAIL]
Test: /opt/os-tests/tests/secflags/secflags_dts (run as root)     [00:11] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_elfdump (run as root) [00:01] [FAIL]
Test: /opt/os-tests/tests/secflags/secflags_forbidnullmap (run as root) [00:00] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_limits (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_noexecstack (run as root) [00:00] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_proc (run as root)    [00:01] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_psecflags (run as root) [00:11] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_syscall (run as root) [00:00] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_truss (run as root)   [00:00] [PASS]
Test: /opt/os-tests/tests/secflags/secflags_zonecfg (run as root) [00:00] [PASS]
Test: /opt/os-tests/tests/sigqueue/sigqueue_queue_size (run as root) [00:00] [PASS]
Test: /opt/os-tests/tests/sockfs/conn (run as root)               [00:05] [PASS]
Test: /opt/os-tests/tests/sockfs/dgram (run as root)              [00:05] [PASS]
Test: /opt/os-tests/tests/sockfs/drop_priv (run as root)          [00:07] [PASS]
Test: /opt/os-tests/tests/sockfs/nosignal (run as root)           [00:02] [PASS]
Test: /opt/os-tests/tests/sockfs/rights.32 (run as root)          [00:00] [PASS]
Test: /opt/os-tests/tests/sockfs/rights.64 (run as root)          [00:00] [PASS]
Test: /opt/os-tests/tests/sockfs/sockpair (run as root)           [00:10] [PASS]
Test: /opt/os-tests/tests/sockfs/recvmsg.32 (run as root)         [00:00] [PASS]
Test: /opt/os-tests/tests/sockfs/recvmsg.64 (run as root)         [00:00] [PASS]
Test: /opt/os-tests/tests/stackalign/stackalign.32 (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/stackalign/stackalign.64 (run as root)  [00:00] [PASS]
Test: /opt/os-tests/tests/stress/dladm-kstat (run as root)        [00:20] [PASS]
Test: /opt/os-tests/tests/syscall/fchmodat.32 (run as root)       [00:00] [PASS]
Test: /opt/os-tests/tests/syscall/fchmodat.64 (run as root)       [00:00] [PASS]
Test: /opt/os-tests/tests/syscall/open.32 (run as root)           [00:00] [PASS]
Test: /opt/os-tests/tests/syscall/open.64 (run as root)           [00:00] [PASS]
Test: /opt/os-tests/tests/timer/timer_limit (run as root)         [00:00] [PASS]
Test: /opt/os-tests/tests/uccid/atrparse (run as root)            [00:00] [PASS]

Results Summary
PASS      62
FAIL       3

Running Time:   00:03:02
Percent passed: 95.4%
Log directory:  /var/tmp/test_results/20221019T011354
root@carbide:~# /opt/os-tests/tests/secflags/secflags_core
No differences encountered
No differences encountered
# echo $?
0
# /opt/os-tests/tests/secflags/secflags_elfdump
No differences encountered
No differences encountered
# echo $?
0
# /opt/os-tests/tests/cores/coretests
TEST PASSED: kernel dumper.32 none
TEST PASSED: gcore dumper.32 none
TEST PASSED: kernel dumper.64 none
TEST PASSED: gcore dumper.64 none
TEST PASSED: kernel dumper.32 ctf
TEST PASSED: gcore dumper.32 ctf
TEST PASSED: kernel dumper.64 ctf
TEST PASSED: gcore dumper.64 ctf
TEST PASSED: kernel dumper.32 debug
TEST PASSED: gcore dumper.32 debug
TEST PASSED: kernel dumper.64 debug
TEST PASSED: gcore dumper.64 debug
TEST PASSED: kernel dumper.32 symtab
TEST PASSED: gcore dumper.32 symtab
TEST PASSED: kernel dumper.64 symtab
TEST PASSED: gcore dumper.64 symtab
TEST PASSED: kernel dumper.32 ctf+debug+symtab
TEST PASSED: gcore dumper.32 ctf+debug+symtab
TEST PASSED: kernel dumper.64 ctf+debug+symtab
TEST PASSED: gcore dumper.64 ctf+debug+symtab
TEST PASSED: kernel dumper.32 anon+data+ctf+debug+symtab
TEST PASSED: gcore dumper.32 anon+data+ctf+debug+symtab
TEST PASSED: kernel dumper.64 anon+data+ctf+debug+symtab
TEST PASSED: gcore dumper.64 anon+data+ctf+debug+symtab
TEST PASSED: kernel dumper.32 default
TEST PASSED: gcore dumper.32 default
TEST PASSED: kernel dumper.64 default
TEST PASSED: gcore dumper.64 default
TEST PASSED: kernel dumper.32 default-ctf-debug-symtab
TEST PASSED: gcore dumper.32 default-ctf-debug-symtab
TEST PASSED: kernel dumper.64 default-ctf-debug-symtab
TEST PASSED: gcore dumper.64 default-ctf-debug-symtab
TEST PASSED: kernel dumper.32 default+debug
TEST PASSED: gcore dumper.32 default+debug
TEST PASSED: kernel dumper.64 default+debug
TEST PASSED: gcore dumper.64 default+debug
TEST PASSED: kernel dumper.32 default-symtab
TEST PASSED: gcore dumper.32 default-symtab
TEST PASSED: kernel dumper.64 default-symtab
TEST PASSED: gcore dumper.64 default-symtab
All tests passed successfully
# echo $?
0
Actions #7

Updated by Patrick Mooney about 2 months ago

  • Related to Bug #15104: want shared type between pollcache_t and port_fdcache_t added
Actions #8

Updated by Patrick Mooney about 2 months ago

I've written up #15104 detailing the propose future work which should make this sort of problem less likely in the future.

Actions #9

Updated by Electric Monk about 2 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit f23ed011dd1990f5b6b2d755feeaa7baf5a22caa

commit  f23ed011dd1990f5b6b2d755feeaa7baf5a22caa
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2022-10-19T15:31:00.000Z

    15036 portfs wears inadequate pollcache disguise
    Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Approved by: Dan McDonald <danmcd@mnx.io>

Actions

Also available in: Atom PDF