Bug #14892


pollhead lifetime too short in signalfd

Added by Patrick Mooney over 1 year ago. Updated about 1 year ago.

Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:
External Bug:


While doing further testing of #13700, andyf was able to reproduce the original issue (using postgres) on bits built with the 13700 change: It turns out it fixed a problem (or rather a few), but not the problem. Let's take a look at the dump he was able to collect:

BAD TRAP: type=d (#gp General protection) rp=fffffe00412f5980 addr=fffffe2e2f89b9e8

#gp General protection
pid=113569, pc=0xfffffffffbdc2ce8, sp=0xfffffe00412f5a70, eflags=0x10283
cr0: 80050033<pg,wp,ne,et,mp,pe>  cr4: 1406f8<smep,osxsav,xmme,fxsr,pge,mce,pae,pse,de>
cr2: fffffc7fdcb29000
cr3: 6b552a000
cr8: 0

        rdi: fffffffffbfb23f0 rsi:                0 rdx: fffffe2d6febf3e0
        rcx:                2  r8:       7d49a90d10  r9:                1
        rax:                0 rbx: deadbeefdeadbeef rbp: fffffe00412f5aa0
        r10: fffffffffb88524c r11: fffffe2d6febf3e0 r12: fffffe2e2f89b9e8
        r13: fffffe2d9916a128 r14: fffffffffbfb23f0 r15: fffffe2e2f89b9e8
        fsb: fffffc7fef022a40 gsb: fffffffffbc40000  ds:               4b
         es:               4b  fs:                0  gs:                0
        trp:                d err:                0 rip: fffffffffbdc2ce8
         cs:               30 rfl:            10283 rsp: fffffe00412f5a70
         ss:               38

fffffe00412f5aa0 polldat_associate+0x98(fffffe2e2f89b9e8, fffffe2d9916a128)
fffffe00412f5b90 dp_pcache_poll+0x344(fffffe2dbd9736c8, fffffe2d83ff4a28, fffffe2e3edeb080, 1, fffffe00412f5bfc)
fffffe00412f5cb0 dpioctl+0x4c0(a200000010, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08)
fffffe00412f5cf0 cdev_ioctl+0x3f(a200000010, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08)
fffffe00412f5d40 spec_ioctl+0x55(fffffe2ec247bb00, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08, 0)
fffffe00412f5dd0 fop_ioctl+0x40(fffffe2ec247bb00, d001, fffffc7fffdf25f0, 202003, fffffe2d6af29df8, fffffe00412f5e08, 0)
fffffe00412f5ef0 ioctl+0x144(9, d001, fffffc7fffdf25f0)
fffffe00412f5f00 sys_syscall+0x2ae()

So we died while attempt to associate the polldat for a given pollcache entry with the pollhead we received from the underlying resource. What seems to be wrong with that pollhead pointer?

> fffffe2d9916a128::whatis
fffffe2d9916a128 is fffffe2d9916a110+18, freed from kmem_alloc_112:
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
fffffe2d9801e030 fffffe2d9916a110       3683c6fffa fffffe003e703c20
                 fffffe2cfb442008 fffffe2d1a62c800 fffffe2d4203b468

It belonged to signalfd, but has already been freed. A quick check of the associated process shows that both /dev/poll and signalfd are on the scene:

> fffffe2d6febf3e0::print kthread_t t_procp | ::pfiles ! grep -v REG
FD   TYPE            VNODE INFO
   0  CHR fffffe2dce75d300 /dev/pts/2
   4 FIFO fffffe2dc534d800
   5  CHR fffffe2ec247b700 /devices/pseudo/poll@0:poll
   7 SOCK fffffe2ec2914400 socket: AF_UNIX /tmp/pg_regress-RiqmOg/.s.PGSQL.5678 remote: AF_UNIX
   8  CHR fffffe2ec247ba00 /devices/pseudo/signalfd@0:signalfd
   9  CHR fffffe2ec247bb00 /devices/pseudo/poll@0:poll

How is it that the pollhead for this signalfd instance could have been freed in the middle of a dp_pcache_poll call? Here's the relevant portion of the function:

                        error = VOP_POLL(fp->f_vnode, pdp->pd_events, 0,
                            &revent, &php, NULL);


                        curthread->t_pollcache = NULL;
                        if (error != 0) {

                        if (php != NULL && pdp->pd_php != NULL &&
                            php != pdp->pd_php) {
                                polldat_associate(pdp, php);
                                 * The bit should still be set.
                                ASSERT(BT_TEST(pcp->pc_bitmap, fd));
                                goto retry;

We VOP_POLL the signalfd resource (which ends up as a call to signalfd_poll, when translated through the VFS/cdev logic), which emits a pollhead_t reference via the php pointer. There is currently no explicit contract about the lifetime of a pollhead emitted from the VOP_POLL handler. For many resources, the lifetime of the pollhead is tied to the vnode itself, whose existence is maintained by the file_t reference. The situation for signalfd is a little more complicated:

static int
signalfd_poll(dev_t dev, short events, int anyyet, short *reventsp,
    struct pollhead **phpp)
        *reventsp = revents & events;
        if ((*reventsp == 0 && !anyyet) || (events & POLLET)) {
                sigfd_proc_state_t *pstate;
                sigfd_poll_waiter_t *pw;

                 * Enable pollwakeup handling.

                pw = signalfd_wake_list_add(pstate, state);
                *phpp = &pw->spw_pollhd;

static sigfd_poll_waiter_t *
signalfd_wake_list_add(sigfd_proc_state_t *pstate, signalfd_state_t *state)
        list_t *lst = &pstate->sigfd_list;
        sigfd_poll_waiter_t *pw;

        for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
                if (pw->spw_state == state)

        if (pw == NULL) {
                pw = kmem_zalloc(sizeof (*pw), KM_SLEEP);

                pw->spw_state = state;
                list_insert_head(lst, pw);
        return (pw);

Because signalfd is an odd case - polling on an instance inherited via fork() should yield signal results only from the current process - it does have a pollhead per vnode, but rather creates one per signalfd-state/process combo. Furthermore, those pollheads (which reside in a sigfd_poll_waiter_t are consumed if/when the process receives a signal:

static void
signalfd_pollwake_cb(void *arg0, int sig)
        proc_t *p = (proc_t *)arg0;
        sigfd_proc_state_t *pstate = (sigfd_proc_state_t *)p->p_sigfd;
        list_t *lst;
        sigfd_poll_waiter_t *pw;

        ASSERT(pstate != NULL);

        lst = &pstate->sigfd_list;
        pw = list_head(lst);
        while (pw != NULL) {

                 * Pull the sigfd_poll_waiter_t out of the list and dispatch it
                 * to perform a pollwake.  This cannot be done synchronously
                 * since signalfd_poll and signalfd_pollwake_cb have
                 * conflicting lock orders which can deadlock.
                next = list_next(lst, pw);
                list_remove(lst, pw);
                taskq_dispatch_ent(signalfd_wakeq, signalfd_wake_task, pw, 0,
                pw = next;

static void
signalfd_wake_task(void *arg)
        sigfd_poll_waiter_t *pw = arg;
        signalfd_state_t *state = pw->spw_state;

        pw->spw_state = NULL;
        signalfd_state_release(state, B_FALSE);
        pollwakeup(&pw->spw_pollhd, pw->spw_pollev);
        kmem_free(pw, sizeof (*pw));

A stale (freed) pollhead could absolutely be the result of a process receiving a signal while polling on a signalfd instance:

1. The polling thread makes enough forward progress to call VOP_POLL on the signalfd resource when no signals are pending, resulting it emitting a pollhead which resides in a sigfd_poll_waiter_t allocated for that signalfd/process instance.
2. After returning from the VOP_POLL call, the polling thread goes off-cpu for some (any) reason
3. A signal is delivered to the polling process, resulting in the signalfd_pollwake_cb call, which queues the sigfd_poll_waiter_t on the taskq for signalfd_wake_task processing
4. The taskq runs signalfd_wake_task, which performs the wake-up and clean on the pollhead, and then frees the sigfd_poll_waiter_t which contains it
5. The polling thread resumes on-cpu with its local php referencing a pollhead in the now-freed sigfd_poll_waiter_t

This behavior of signalfd freeing a pollhead when it is accessible via any thread doing polling in a process is unacceptable, given the current expectations of the poll infrastructure. It should be restructured so that it can ensure those pollheads will remain "live" until:
1. the process exits; or
2. the associated signalfd instance is closed

Furthermore, dp_pcache_poll needs to be updated so it does not make the releasef() call until it is done manipulating the returned php. It is theoretically possible that the underlying vnode would be freed when we release that fd, resulting in the pollhead disappearing as well.

Related issues

Related to illumos gate - Bug #13700: pollhead_delete trips over bad pointerClosed

Related to illumos gate - Bug #15386: signalfd overzealous with clean-upClosedPatrick Mooney

Actions #1

Updated by Patrick Mooney over 1 year ago

  • Related to Bug #13700: pollhead_delete trips over bad pointer added
Actions #2

Updated by Electric Monk about 1 year ago

  • Gerrit CR set to 2618
Actions #3

Updated by Dan McDonald about 1 year ago

  • % Done changed from 0 to 80
Actions #4

Updated by Dan McDonald about 1 year ago

A DEBUG kernel is very helpful in reproducing this bug at will, which I have done.

Actions #5

Updated by Dan McDonald about 1 year ago

  • % Done changed from 80 to 60
Actions #6

Updated by Dan McDonald about 1 year ago

Testing done:

- Andy's Postgres build now works without kernel panics, even after 26 straight runs.

- I have tested a devpoll + signalfd looper that, in the presence of a machine-gun `pkill` shell script, holds up with this fix, where it did not before.

- Both of the above concurrently on the same OmniOS VMware VM.

Actions #7

Updated by Dan McDonald about 1 year ago

Additional testing details provided by a Postgres-running customer who had seen problems akin to Andy's after a PG upgrade. Edited-for-clarity-and-confidentiality exchange provided here:

13:30 CUSTOMER: 18:29:57  up 1 day  2:22,  1 user,  load average: 0.83, 0.80, 0.87
14:23 danmcd: And please confirm that before-this-fix the updated, signalfd-using PG would panic this machine in a matter of {minutes, hours}?
14:25 CUSTOMER: Im not 100% sure about the before the fix how often it would pop off
14:26 CUSTOMER: Think it may have been related to workload / vacuuming
14:26 CUSTOMER: but the last few times it happened rather quickly
Actions #8

Updated by Patrick Mooney about 1 year ago

I ran the `os-tests` suite on a machine with the updated bits and they passed (save for some coredump related issues which are known on my particular install).

Actions #9

Updated by Electric Monk about 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 60 to 100

git commit 374858d291554c199353841e2900bc130463934a

commit  374858d291554c199353841e2900bc130463934a
Author: Patrick Mooney <>
Date:   2023-02-10T22:42:17.000Z

    14892 pollhead lifetime too short in signalfd
    Portions contributed by: Dan McDonald <>
    Reviewed by: Dan McDonald <>
    Reviewed by: Gordon Ross <>
    Approved by: Robert Mustacchi <>

Actions #10

Updated by Patrick Mooney about 1 year ago

  • Related to Bug #15386: signalfd overzealous with clean-up added

Also available in: Atom PDF