Project

General

Profile

Bug #3531

Race between log_sysevent_filename() and log_event_upcall() can cause panic

Added by Marcel Telka over 7 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
kernel
Start date:
2013-02-06
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

If the log_sysevent_filename() and log_event_upcall() are running concurrently it can cause several issues:
- two threads can call the door_ki_rele(event_door) in parallel - Root Cause 1 (panic)
- the event_door is cleared out before it is referenced - Root Cause 2 (panic)
- the logevent_door_upcall_filename is used after it is freed - Root Cause 3 (use-after-free)

Solution:
Added event_door_mutex into log_sysevent_filename() and log_event_upcall() to prevent the race.

History

#1

Updated by Marcel Telka over 7 years ago

Root Cause 1:

In case the log_event_deliver() is running and it is trying to deliver events to syseventd and in the same time the syseventd is restarted a race can happen.

When the syseventd_init() is called (from syseventd's main()) after the syseventd restart the syseventd notifies the kernel about the "new" door name here in the usr/src/cmd/syseventd/daemons/syseventd/syseventd.c file:

1493    /*
1494     * Tell kernel the door name and start delivery
1495     */
1496    syseventd_print(2,
1497        "local_door_file = %s\\n", local_door_file);
1498    if (modctl(MODEVENTS,
1499        (uintptr_t)MODEVENTS_SET_DOOR_UPCALL_FILENAME,
1500        (uintptr_t)local_door_file, NULL, NULL, 0) < 0) {
1501        syseventd_err_print(INIT_DOOR_NAME_ERR, strerror(errno));
1502        syseventd_exit(6);
1503    }

In kernel this leads to the log_sysevent_filename() call.

If the log_event_deliver() is running in the same time it might call log_event_upcall() here:

361            mutex_exit(&eventq_head_mutex);
362            if ((upcall_err = log_event_upcall(&q->arg)) != 0) {
363                mutex_enter(&eventq_head_mutex);
364                break;
365            }

The log_event_upcall() will try to deliver an event to syseventd here:

250        if ((error = door_ki_upcall_limited(event_door, &darg, NULL,
251            SIZE_MAX, 0)) == 0) {
252            break;
253        }

but since the syseventd just restarted, the attempt will fail with EBADF, so the rebind is attempted here:

269        case EBADF:
270            LOG_DEBUG((CE_CONT, "log_event_upcall: rebinding\\n"));
271            /* Server may have died. Try rebinding */
272            if ((error = log_event_upcall_lookup()) != 0) {
273                LOG_DEBUG((CE_CONT,
274                    "log_event_upcall: lookup error %d\\n",
275                    error));
276                return (EBADF);
277            }

and the log_event_upcall_lookup() is called.

As a result we have both log_event_upcall_lookup() and log_sysevent_filename() called concurrently. Unfortunately, both functions are not prepared for the concurrent call and it is obvious that we can easily have two concurrent door_ki_rele(event_door); in progress. This will lead to this panic:

> ::stack 
vpanic()
assfail+0x7e(fffffffffbbc65a8, fffffffffbbc66b0, 33e)
vn_rele+0x7b(ffffff436b744840)
closef+0x75(ffffff436b0a90b0)
door_ki_rele+0x15(ffffff436b0a90b0)
stubs_common_code+0x51()
log_sysevent_filename+0x3a(ffffff43506c0d80)
modctl_modevents+0xb6(2, febafb60, 0, 0, 0)
modctl+0x229(f, 2, febafb60, 0, 0, 0)
_sys_sysenter_post_swapgs+0x149()
>

... and there is one more thread doing the closef of the same file:

> ffffff01e97e0c60::findstack -v
stack pointer for thread ffffff01e97e0c60: ffffff01e97e04c0
  ffffff01e97e09d0 0xffffff4310f4d778()
  ffffff01e97e09f0 vfs_rele+0x55(ffffff436893a010)
  ffffff01e97e0a30 nm_inactive+0x77(ffffff436b744840, ffffff42e286fe88, 0)
  ffffff01e97e0a80 fop_inactive+0xaf(ffffff436b744840, ffffff42e286fe88, 0)
  ffffff01e97e0aa0 vn_rele+0x5f(ffffff436b744840)
  ffffff01e97e0ae0 closef+0x75(ffffff436b0a90b0)
  ffffff01e97e0b00 door_ki_rele+0x15(ffffff436b0a90b0)
  ffffff01e97e0b40 stubs_common_code+0x51()
  ffffff01e97e0b50 log_event_upcall_lookup+0x15()
  ffffff01e97e0bf0 log_event_upcall+0x13c(ffffff436bad4148)
  ffffff01e97e0c40 log_event_deliver+0x8f()
  ffffff01e97e0c50 thread_start+8()
>

The file in question have f_count = -1. This should never happen:

> ffffff436b0a90b0::print file_t f_count
f_count = 0xffffffff
>

Also, the vnode in question is already freed:

> ffffff436b744840::whatis
ffffff436b744840 is freed from vn_cache
>

Interestingly, the log_sysevent_filename() contains this comment:

1604    /*
1605     * Called serially by syseventd init code, no need to protect door
1606     * data.
1607     */

This assumption is apparently wrong. Some protection mechanism is needed to make sure we do not call the door_ki_rele(event_door); twice.

#2

Updated by Marcel Telka over 7 years ago

The code in log_sysevent_filename() and log_event_upcall() is fragile. The another panic is possible (I reproduced it):

> ::status
debugging crash dump vmcore.0 (64-bit) from telcontar
operating system: 5.11 il-sysevent (i86pc)
image uuid: 5cfdcdbd-bc35-ec77-8285-f6af3ad9d27f
panic message: 
BAD TRAP: type=e (#pf Page fault) rp=ffffff001f61d990 addr=10 occurred in module
 "doorfs" due to a NULL pointer dereference
dump content: kernel pages only
> ::stack
door_ki_upcall_limited+0x38(0, ffffff001f61db40, 0, ffffffffffffffff, 0)
stubs_common_code+0x51()
log_event_upcall+0x119(ffffff05012b9508)
log_event_deliver+0xae()
thread_start+8()
> event_door::print
0
>

Root Cause 2:

Assume we have one thread in log_event_upcall() around line 245 here:

237    if ((event_door == NULL) &&
238        ((error = log_event_upcall_lookup()) != 0)) {
239        LOG_DEBUG((CE_CONT,
240            "log_event_upcall: event_door error (%d)\\n", error));
241
242        return (error);
243    }
244
245    LOG_DEBUG1((CE_CONT, "log_event_upcall: 0x%llx\\n",
246        (longlong_t)SE_SEQ((sysevent_t *)&arg->buf)));
247
248    save_arg = darg;
249    for (retry = 0; ; retry++) {
250        if ((error = door_ki_upcall_limited(event_door, &darg, NULL,

If we are restarting the syseventd now, this will be executed:

1601int
1602log_sysevent_filename(char *file)
1603{
1604    /*
1605     * Called serially by syseventd init code, no need to protect door
1606     * data.
1607     */
1608    /* Unbind old event door */
1609    if (logevent_door_upcall_filename) {
1610        kmem_free(logevent_door_upcall_filename,
1611            logevent_door_upcall_filename_size);
1612        if (event_door) {
1613            door_ki_rele(event_door);
1614            event_door = NULL;
1615        }
1616    }

IOW, event_door will be cleared (lines 1612 - 1615). Back to the first thread, the door_ki_upcall_limited() is called with first parameter NULL. This will cause the panic (as we saw above).

#3

Updated by Marcel Telka over 7 years ago

Root Cause 3:

Another race we need to consider during the bugfix:

1602log_sysevent_filename(char *file)
1603{
1604    /*
1605     * Called serially by syseventd init code, no need to protect door
1606     * data.
1607     */
1608    /* Unbind old event door */
1609    if (logevent_door_upcall_filename) {
1610        kmem_free(logevent_door_upcall_filename,
1611            logevent_door_upcall_filename_size);
1612        if (event_door) {
1613            door_ki_rele(event_door);
1614            event_door = NULL;
1615        }
1616    }
1617    logevent_door_upcall_filename_size = strlen(file) + 1;
1618    logevent_door_upcall_filename = kmem_alloc(
1619        logevent_door_upcall_filename_size, KM_SLEEP);
1620    (void) strcpy(logevent_door_upcall_filename, file);

If the thread calling log_sysevent_filename() is around line 1616, the logevent_door_upcall_filename points to a freed memory.

If another concurrent thread is here in log_event_upcall():

237    if ((event_door == NULL) &&
238        ((error = log_event_upcall_lookup()) != 0)) {
239        LOG_DEBUG((CE_CONT,
240            "log_event_upcall: event_door error (%d)\\n", error));
241
242        return (error);
243    }

Then the log_event_upcall_lookup() is called and later the door_ki_open() is called with a pointer to the freed memory as the first parameter:

167    if ((error =
168        door_ki_open(logevent_door_upcall_filename, &event_door)) != 0) {
169        return (error);
170    }
#4

Updated by Marcel Telka over 7 years ago

HOW TO REPRODUCE THE ISSUE

Prerequisites:

To reproduce these issues (on an OpenIndiana box) you need to modify the sysevent kernel subsystem to include some delays. This will increase the race windows. Here is the modification I did:

--- a/usr/src/uts/common/os/log_sysevent.c    Thu Jan 17 20:18:59 2013 -0800
+++ b/usr/src/uts/common/os/log_sysevent.c    Wed Jan 23 19:09:48 2013 +0100
@@ -146,6 +146,8 @@
 static kcondvar_t event_pause_cv;
 static int event_pause_state = 0;

+static volatile int log_WAIT = 0;
+
 /*
  * log_event_upcall_lookup - Establish door connection with user event
  *                daemon (syseventd)
@@ -156,6 +158,7 @@
     int    error;

     if (event_door) {    /* Release our previous hold (if any) */
+        while (log_WAIT) ;
         door_ki_rele(event_door);
     }

@@ -242,6 +245,8 @@
         return (error);
     }

+    delay(SEC_TO_TICK(5));
+
     LOG_DEBUG1((CE_CONT, "log_event_upcall: 0x%llx\\n",
         (longlong_t)SE_SEQ((sysevent_t *)&arg->buf)));

@@ -1610,9 +1615,11 @@
         kmem_free(logevent_door_upcall_filename,
             logevent_door_upcall_filename_size);
         if (event_door) {
+            while (log_WAIT == 1) ;
             door_ki_rele(event_door);
             event_door = NULL;
         }
+        while (log_WAIT) ;
     }
     logevent_door_upcall_filename_size = strlen(file) + 1;
     logevent_door_upcall_filename = kmem_alloc(

You also need this testing program:

root@telcontar:/var/tmp/sysevent# cat sysevent.c 
#include <libsysevent.h>
#include <libnvpair.h>
#include <stdio.h>

int
main(void)
{
    sysevent_id_t eid;
    int r;

    r = sysevent_post_event("class", "subclass", "vendor", "publisher",
        NULL, &eid);
    if (r == -1)
        perror("sysevent_post_event");

    return 0;
}
root@telcontar:/var/tmp/sysevent# gcc -l sysevent -o sysevent sysevent.c
root@telcontar:/var/tmp/sysevent#

Steps to Reproduce Issue 1:

To reproduce this issue you'd probably need a multicore system.

  1. Make sure the sysevent subsystem will synchronize as we need:
    # echo "log_WAIT/W 1" | mdb -kw
    log_WAIT:       0               =       0x1
    #
    
  2. Run this dtrace oneliner in another terminal window:
    # dtrace -n 'log_event_upcall: {self->trace = probename == "entry"}
    delay: /self->trace/{}'
    
  3. Post an event:
    # /var/tmp/sysevent/sysevent
    
  4. Make sure the delay:entry probe was fired (in the other terminal window).
  5. Once the delay:entry probe is fired, you have 5 seconds to restart syseventd:
    # svcadm restart sysevent
    
  6. Make sure you have a thread stuck in log_sysevent_filename():
    # echo "::stacks -c log_sysevent_filename" | mdb -k
    THREAD           STATE    SOBJ                COUNT
    ffffff052605dc20 ONPROC   <NONE>                  1
                     log_sysevent_filename+0x3d
                     modctl_modevents+0xb6
                     modctl+0x229
                     _sys_sysenter_post_swapgs+0x237
    
    #
    
  7. Make sure you have another thread stuck in log_event_upcall_lookup() - you might need to wait up to 5 seconds to get it:
    # echo "::stacks -c log_event_upcall_lookup" | mdb -k
    THREAD           STATE    SOBJ                COUNT
    ffffff001f528c40 ONPROC   <NONE>                  1
                     log_event_upcall_lookup+0x18
                     log_event_upcall+0x19a
                     log_event_deliver+0xae
                     thread_start+8
    
    #
    
  8. Make sure both threads continue synchronously:
    # echo "log_WAIT/W 0" | mdb -kw
    
  9. PANIC!

Steps to Reproduce Issue 2:

  1. Run this dtrace oneliner:
    # dtrace -n 'log_event_upcall: {self->trace = probename == "entry"}
    delay: /self->trace/{}'
    
  2. In the second terminal window post an event:
    # /var/tmp/sysevent/sysevent
    
  3. Make sure the delay:entry probe was fired (in the first terminal window).
  4. Once the delay:entry probe is fired, you have 5 seconds to restart syseventd:
    # svcadm restart sysevent
    
  5. PANIC!

Steps to Reproduce Issue 3:

  1. Boot the machine with kmem_flags set to 0xf (in /etc/system):
    # echo "set kmem_flags=0xf" >> /etc/system
    # reboot
    
  2. Run the following dtrace script in separate terminal window:
    # dtrace -n 'log_event_upcall_lookup:entry {self->t=1}
    log_event_upcall_lookup:return {self->t=0; trace(arg1)}
    door_ki_open:entry /self->t/ {trace(stringof(arg0))}'
    
  3. Make sure there is zero outstanding sysevents in the queue:
    # echo "log_eventq_cnt::print" | mdb -k
    0
    #
    
  4. Make sure the sysevent subsystem will synchronize as we need:
    # echo "log_WAIT/W 2" | mdb -kw
    log_WAIT:       0               =       0x2
    #
    
  5. Restart syseventd:
    # svcadm restart sysevent
    
  6. Make sure you have a thread stuck in log_sysevent_filename():
    # echo "::stacks -c log_sysevent_filename" | mdb -k
    THREAD           STATE    SOBJ                COUNT
    ffffff0526a907c0 ONPROC   <NONE>                  1
                     log_sysevent_filename+0x69
                     modctl_modevents+0xb6
                     modctl+0x229
                     _sys_sysenter_post_swapgs+0x237
    
    #
    
  7. Post an event:
    # /var/tmp/sysevent/sysevent
    
  8. The dtrace output confirms that door_ki_open() in log_event_upcall_lookup() accessed freed memory (logevent_door_upcall_filename) and failed:
    dtrace: description 'log_event_upcall_lookup:entry ' matched 3 probes
    CPU     ID                    FUNCTION:NAME
      3  45632               door_ki_open:entry 
                 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
             0: ef be ad de ef be ad de ef be ad de ef be ad de  ................
            10: ef be ad de ef be ad de ce fa ed fe ce fa ed fe  ................
            20: 98 28 09 1b 05 ff ff ff 76 dc e7 ef 05 ff ff ff  .(......v.......
            30: 60 18 be 09 05 ff ff ff 00 00 00 00 00 00 00 00  `...............
            40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    
      3  18329   log_event_upcall_lookup:return                 2
    
  9. Let the sysevent subsystem to continue:
    # echo "log_WAIT/W 0" | mdb -kw
    log_WAIT:       0x2             =       0x0
    #
    
#5

Updated by Marcel Telka over 7 years ago

Fix Testing:

For the fix testing I used this event generator (source for the /var/tmp/sysevent/sysevent binary is above):

# while true ; do /var/tmp/sysevent/sysevent ; done

Basically this was used to simulate the kernel->syseventd transport issue (I tested without the sleep too):

# while true ; do svcadm disable sysevent ; echo disabled ; sleep 5 ; svcadm enable sysevent ; echo enabled ; done

To monitor the sysevent queue in kernel during the test I used this:

# while true ; do echo log_eventq_cnt::print | mdb -k ; done

I also used this to see how the door is opened and closed:

# while true ; do echo event_door::print | mdb -k ; done

When there were some events in the queue and the transport was down (syseventd not running) the event_door was cleared. In case the queue was empty the door left opened even the syseventd was not running (nobody tries to use the door when there are no events in the queue). During the syseventd start the event_door was correctly reopened and the event delivery resumed correctly.

I confirmed that the event delivery from kernel to userspace syseventd works as expected. No regression was found.

#6

Updated by Marcel Telka over 7 years ago

  • % Done changed from 0 to 20
#7

Updated by Marcel Telka over 7 years ago

  • Status changed from In Progress to Pending RTI
#8

Updated by Rich Lowe over 7 years ago

  • Category set to kernel
  • Status changed from Pending RTI to Resolved
  • % Done changed from 20 to 100
  • Tags deleted (needs-triage)

Resolved in e5dc7ea

#9

Updated by Marcel Telka almost 5 years ago

  • Description updated (diff)

Also available in: Atom PDF