Project

General

Profile

Actions

Bug #14892

open

pollhead lifetime too short in signalfd

Added by Patrick Mooney about 2 months ago.

Status:
In Progress
Priority:
Normal
Category:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

While doing further testing of #13700, andyf was able to reproduce the original issue (using postgres) on bits built with the 13700 change: It turns out it fixed a problem (or rather a few), but not the problem. Let's take a look at the dump he was able to collect:

BAD TRAP: type=d (#gp General protection) rp=fffffe00412f5980 addr=fffffe2e2f89b9e8

postgres:
#gp General protection
addr=0xfffffe2e2f89b9e8
pid=113569, pc=0xfffffffffbdc2ce8, sp=0xfffffe00412f5a70, eflags=0x10283
cr0: 80050033<pg,wp,ne,et,mp,pe>  cr4: 1406f8<smep,osxsav,xmme,fxsr,pge,mce,pae,pse,de>
cr2: fffffc7fdcb29000
cr3: 6b552a000
cr8: 0

        rdi: fffffffffbfb23f0 rsi:                0 rdx: fffffe2d6febf3e0
        rcx:                2  r8:       7d49a90d10  r9:                1
        rax:                0 rbx: deadbeefdeadbeef rbp: fffffe00412f5aa0
        r10: fffffffffb88524c r11: fffffe2d6febf3e0 r12: fffffe2e2f89b9e8
        r13: fffffe2d9916a128 r14: fffffffffbfb23f0 r15: fffffe2e2f89b9e8
        fsb: fffffc7fef022a40 gsb: fffffffffbc40000  ds:               4b
         es:               4b  fs:                0  gs:                0
        trp:                d err:                0 rip: fffffffffbdc2ce8
         cs:               30 rfl:            10283 rsp: fffffe00412f5a70
         ss:               38

fffffe00412f5aa0 polldat_associate+0x98(fffffe2e2f89b9e8, fffffe2d9916a128)
fffffe00412f5b90 dp_pcache_poll+0x344(fffffe2dbd9736c8, fffffe2d83ff4a28, fffffe2e3edeb080, 1, fffffe00412f5bfc)
fffffe00412f5cb0 dpioctl+0x4c0(a200000010, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08)
fffffe00412f5cf0 cdev_ioctl+0x3f(a200000010, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08)
fffffe00412f5d40 spec_ioctl+0x55(fffffe2ec247bb00, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08, 0)
fffffe00412f5dd0 fop_ioctl+0x40(fffffe2ec247bb00, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08, 0)
fffffe00412f5ef0 ioctl+0x144(9, d001, fffffc7fffdf25f0)
fffffe00412f5f00 sys_syscall+0x2ae()

So we died while attempt to associate the polldat for a given pollcache entry with the pollhead we received from the underlying resource. What seems to be wrong with that pollhead pointer?

> fffffe2d9916a128::whatis
fffffe2d9916a128 is fffffe2d9916a110+18, freed from kmem_alloc_112:
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
fffffe2d9801e030 fffffe2d9916a110       3683c6fffa fffffe003e703c20
                 fffffe2cfb442008 fffffe2d1a62c800 fffffe2d4203b468
                 kmem_cache_free_debug+0x114
                 kmem_cache_free+0x89
                 kmem_free+0x58
                 signalfd_wake_task+0x53
                 taskq_thread+0x2ee
                 thread_start+0xb

It belonged to signalfd, but has already been freed. A quick check of the associated process shows that both /dev/poll and signalfd are on the scene:

> fffffe2d6febf3e0::print kthread_t t_procp | ::pfiles ! grep -v REG
FD   TYPE            VNODE INFO
   0  CHR fffffe2dce75d300 /dev/pts/2
   4 FIFO fffffe2dc534d800
   5  CHR fffffe2ec247b700 /devices/pseudo/poll@0:poll
   7 SOCK fffffe2ec2914400 socket: AF_UNIX /tmp/pg_regress-RiqmOg/.s.PGSQL.5678 remote: AF_UNIX
   8  CHR fffffe2ec247ba00 /devices/pseudo/signalfd@0:signalfd
   9  CHR fffffe2ec247bb00 /devices/pseudo/poll@0:poll

How is it that the pollhead for this signalfd instance could have been freed in the middle of a dp_pcache_poll call? Here's the relevant portion of the function:

                        error = VOP_POLL(fp->f_vnode, pdp->pd_events, 0,
                            &revent, &php, NULL);

  ...

                        curthread->t_pollcache = NULL;
                        releasef(fd);
                        if (error != 0) {
                                break;
                        }

...
                        if (php != NULL && pdp->pd_php != NULL &&
                            php != pdp->pd_php) {
                                polldat_disassociate(pdp);
                                polldat_associate(pdp, php);
                                /*
                                 * The bit should still be set.
                                 */
                                ASSERT(BT_TEST(pcp->pc_bitmap, fd));
                                goto retry;
                        }

We VOP_POLL the signalfd resource (which ends up as a call to signalfd_poll, when translated through the VFS/cdev logic), which emits a pollhead_t reference via the php pointer. There is currently no explicit contract about the lifetime of a pollhead emitted from the VOP_POLL handler. For many resources, the lifetime of the pollhead is tied to the vnode itself, whose existence is maintained by the file_t reference. The situation for signalfd is a little more complicated:

static int
signalfd_poll(dev_t dev, short events, int anyyet, short *reventsp,
    struct pollhead **phpp)
{
...
        *reventsp = revents & events;
        if ((*reventsp == 0 && !anyyet) || (events & POLLET)) {
                sigfd_proc_state_t *pstate;
                sigfd_poll_waiter_t *pw;

                /*
                 * Enable pollwakeup handling.
                 */
...

                pw = signalfd_wake_list_add(pstate, state);
                *phpp = &pw->spw_pollhd;
...
}

static sigfd_poll_waiter_t *
signalfd_wake_list_add(sigfd_proc_state_t *pstate, signalfd_state_t *state)
{
        list_t *lst = &pstate->sigfd_list;
        sigfd_poll_waiter_t *pw;

        for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
                if (pw->spw_state == state)
                        break;
        }

        if (pw == NULL) {
                pw = kmem_zalloc(sizeof (*pw), KM_SLEEP);

                mutex_enter(&state->sfd_lock);
                signalfd_state_enter_locked(state);
                pw->spw_state = state;
                mutex_exit(&state->sfd_lock);
                list_insert_head(lst, pw);
        }
        return (pw);
}

Because signalfd is an odd case - polling on an instance inherited via fork() should yield signal results only from the current process - it does have a pollhead per vnode, but rather creates one per signalfd-state/process combo. Furthermore, those pollheads (which reside in a sigfd_poll_waiter_t are consumed if/when the process receives a signal:

static void
signalfd_pollwake_cb(void *arg0, int sig)
{
        proc_t *p = (proc_t *)arg0;
        sigfd_proc_state_t *pstate = (sigfd_proc_state_t *)p->p_sigfd;
        list_t *lst;
        sigfd_poll_waiter_t *pw;

        ASSERT(MUTEX_HELD(&p->p_lock));
        ASSERT(pstate != NULL);

        lst = &pstate->sigfd_list;
        pw = list_head(lst);
        while (pw != NULL) {
... 

                /*
                 * Pull the sigfd_poll_waiter_t out of the list and dispatch it
                 * to perform a pollwake.  This cannot be done synchronously
                 * since signalfd_poll and signalfd_pollwake_cb have
                 * conflicting lock orders which can deadlock.
                 */
                next = list_next(lst, pw);
                list_remove(lst, pw);
                taskq_dispatch_ent(signalfd_wakeq, signalfd_wake_task, pw, 0,
                    &pw->spw_taskent);
                pw = next;
        }
}

static void
signalfd_wake_task(void *arg)
{
        sigfd_poll_waiter_t *pw = arg;
        signalfd_state_t *state = pw->spw_state;

        pw->spw_state = NULL;
        signalfd_state_release(state, B_FALSE);
        pollwakeup(&pw->spw_pollhd, pw->spw_pollev);
        pollhead_clean(&pw->spw_pollhd);
        kmem_free(pw, sizeof (*pw));
}

A stale (freed) pollhead could absolutely be the result of a process receiving a signal while polling on a signalfd instance:

1. The polling thread makes enough forward progress to call VOP_POLL on the signalfd resource when no signals are pending, resulting it emitting a pollhead which resides in a sigfd_poll_waiter_t allocated for that signalfd/process instance.
2. After returning from the VOP_POLL call, the polling thread goes off-cpu for some (any) reason
3. A signal is delivered to the polling process, resulting in the signalfd_pollwake_cb call, which queues the sigfd_poll_waiter_t on the taskq for signalfd_wake_task processing
4. The taskq runs signalfd_wake_task, which performs the wake-up and clean on the pollhead, and then frees the sigfd_poll_waiter_t which contains it
5. The polling thread resumes on-cpu with its local php referencing a pollhead in the now-freed sigfd_poll_waiter_t

This behavior of signalfd freeing a pollhead when it is accessible via any thread doing polling in a process is unacceptable, given the current expectations of the poll infrastructure. It should be restructured so that it can ensure those pollheads will remain "live" until:
1. the process exits; or
2. the associated signalfd instance is closed

Furthermore, dp_pcache_poll needs to be updated so it does not make the releasef() call until it is done manipulating the returned php. It is theoretically possible that the underlying vnode would be freed when we release that fd, resulting in the pollhead disappearing as well.


Related issues

Related to illumos gate - Bug #13700: pollhead_delete trips over bad pointerClosed

Actions
Actions #1

Updated by Patrick Mooney about 2 months ago

  • Related to Bug #13700: pollhead_delete trips over bad pointer added
Actions

Also available in: Atom PDF