Bug #3531
closedRace between log_sysevent_filename() and log_event_upcall() can cause panic
Added by Marcel Telka over 9 years ago. Updated almost 7 years ago.
100%
Description
If the log_sysevent_filename() and log_event_upcall() are running concurrently it can cause several issues:
- two threads can call the door_ki_rele(event_door) in parallel - Root Cause 1 (panic)
- the event_door is cleared out before it is referenced - Root Cause 2 (panic)
- the logevent_door_upcall_filename is used after it is freed - Root Cause 3 (use-after-free)
Solution:
Added event_door_mutex into log_sysevent_filename() and log_event_upcall() to prevent the race.
Updated by Marcel Telka over 9 years ago
Root Cause 1:
In case the log_event_deliver() is running and it is trying to deliver events to syseventd and in the same time the syseventd is restarted a race can happen.
When the syseventd_init() is called (from syseventd's main()) after the syseventd restart the syseventd notifies the kernel about the "new" door name here in the usr/src/cmd/syseventd/daemons/syseventd/syseventd.c file:
1493 /* 1494 * Tell kernel the door name and start delivery 1495 */ 1496 syseventd_print(2, 1497 "local_door_file = %s\\n", local_door_file); 1498 if (modctl(MODEVENTS, 1499 (uintptr_t)MODEVENTS_SET_DOOR_UPCALL_FILENAME, 1500 (uintptr_t)local_door_file, NULL, NULL, 0) < 0) { 1501 syseventd_err_print(INIT_DOOR_NAME_ERR, strerror(errno)); 1502 syseventd_exit(6); 1503 }
In kernel this leads to the log_sysevent_filename() call.
If the log_event_deliver() is running in the same time it might call log_event_upcall() here:
361 mutex_exit(&eventq_head_mutex); 362 if ((upcall_err = log_event_upcall(&q->arg)) != 0) { 363 mutex_enter(&eventq_head_mutex); 364 break; 365 }
The log_event_upcall() will try to deliver an event to syseventd here:
250 if ((error = door_ki_upcall_limited(event_door, &darg, NULL, 251 SIZE_MAX, 0)) == 0) { 252 break; 253 }
but since the syseventd just restarted, the attempt will fail with EBADF, so the rebind is attempted here:
269 case EBADF: 270 LOG_DEBUG((CE_CONT, "log_event_upcall: rebinding\\n")); 271 /* Server may have died. Try rebinding */ 272 if ((error = log_event_upcall_lookup()) != 0) { 273 LOG_DEBUG((CE_CONT, 274 "log_event_upcall: lookup error %d\\n", 275 error)); 276 return (EBADF); 277 }
and the log_event_upcall_lookup() is called.
As a result we have both log_event_upcall_lookup() and log_sysevent_filename() called concurrently. Unfortunately, both functions are not prepared for the concurrent call and it is obvious that we can easily have two concurrent door_ki_rele(event_door); in progress. This will lead to this panic:
> ::stack vpanic() assfail+0x7e(fffffffffbbc65a8, fffffffffbbc66b0, 33e) vn_rele+0x7b(ffffff436b744840) closef+0x75(ffffff436b0a90b0) door_ki_rele+0x15(ffffff436b0a90b0) stubs_common_code+0x51() log_sysevent_filename+0x3a(ffffff43506c0d80) modctl_modevents+0xb6(2, febafb60, 0, 0, 0) modctl+0x229(f, 2, febafb60, 0, 0, 0) _sys_sysenter_post_swapgs+0x149() >
... and there is one more thread doing the closef of the same file:
> ffffff01e97e0c60::findstack -v stack pointer for thread ffffff01e97e0c60: ffffff01e97e04c0 ffffff01e97e09d0 0xffffff4310f4d778() ffffff01e97e09f0 vfs_rele+0x55(ffffff436893a010) ffffff01e97e0a30 nm_inactive+0x77(ffffff436b744840, ffffff42e286fe88, 0) ffffff01e97e0a80 fop_inactive+0xaf(ffffff436b744840, ffffff42e286fe88, 0) ffffff01e97e0aa0 vn_rele+0x5f(ffffff436b744840) ffffff01e97e0ae0 closef+0x75(ffffff436b0a90b0) ffffff01e97e0b00 door_ki_rele+0x15(ffffff436b0a90b0) ffffff01e97e0b40 stubs_common_code+0x51() ffffff01e97e0b50 log_event_upcall_lookup+0x15() ffffff01e97e0bf0 log_event_upcall+0x13c(ffffff436bad4148) ffffff01e97e0c40 log_event_deliver+0x8f() ffffff01e97e0c50 thread_start+8() >
The file in question have f_count = -1. This should never happen:
> ffffff436b0a90b0::print file_t f_count f_count = 0xffffffff >
Also, the vnode in question is already freed:
> ffffff436b744840::whatis ffffff436b744840 is freed from vn_cache >
Interestingly, the log_sysevent_filename() contains this comment:
1604 /* 1605 * Called serially by syseventd init code, no need to protect door 1606 * data. 1607 */
This assumption is apparently wrong. Some protection mechanism is needed to make sure we do not call the door_ki_rele(event_door); twice.
Updated by Marcel Telka over 9 years ago
The code in log_sysevent_filename() and log_event_upcall() is fragile. The another panic is possible (I reproduced it):
> ::status debugging crash dump vmcore.0 (64-bit) from telcontar operating system: 5.11 il-sysevent (i86pc) image uuid: 5cfdcdbd-bc35-ec77-8285-f6af3ad9d27f panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff001f61d990 addr=10 occurred in module "doorfs" due to a NULL pointer dereference dump content: kernel pages only > ::stack door_ki_upcall_limited+0x38(0, ffffff001f61db40, 0, ffffffffffffffff, 0) stubs_common_code+0x51() log_event_upcall+0x119(ffffff05012b9508) log_event_deliver+0xae() thread_start+8() > event_door::print 0 >
Root Cause 2:
Assume we have one thread in log_event_upcall() around line 245 here:
237 if ((event_door == NULL) && 238 ((error = log_event_upcall_lookup()) != 0)) { 239 LOG_DEBUG((CE_CONT, 240 "log_event_upcall: event_door error (%d)\\n", error)); 241 242 return (error); 243 } 244 245 LOG_DEBUG1((CE_CONT, "log_event_upcall: 0x%llx\\n", 246 (longlong_t)SE_SEQ((sysevent_t *)&arg->buf))); 247 248 save_arg = darg; 249 for (retry = 0; ; retry++) { 250 if ((error = door_ki_upcall_limited(event_door, &darg, NULL,
If we are restarting the syseventd now, this will be executed:
1601int 1602log_sysevent_filename(char *file) 1603{ 1604 /* 1605 * Called serially by syseventd init code, no need to protect door 1606 * data. 1607 */ 1608 /* Unbind old event door */ 1609 if (logevent_door_upcall_filename) { 1610 kmem_free(logevent_door_upcall_filename, 1611 logevent_door_upcall_filename_size); 1612 if (event_door) { 1613 door_ki_rele(event_door); 1614 event_door = NULL; 1615 } 1616 }
IOW, event_door will be cleared (lines 1612 - 1615). Back to the first thread, the door_ki_upcall_limited() is called with first parameter NULL. This will cause the panic (as we saw above).
Updated by Marcel Telka over 9 years ago
Root Cause 3:
Another race we need to consider during the bugfix:
1602log_sysevent_filename(char *file) 1603{ 1604 /* 1605 * Called serially by syseventd init code, no need to protect door 1606 * data. 1607 */ 1608 /* Unbind old event door */ 1609 if (logevent_door_upcall_filename) { 1610 kmem_free(logevent_door_upcall_filename, 1611 logevent_door_upcall_filename_size); 1612 if (event_door) { 1613 door_ki_rele(event_door); 1614 event_door = NULL; 1615 } 1616 } 1617 logevent_door_upcall_filename_size = strlen(file) + 1; 1618 logevent_door_upcall_filename = kmem_alloc( 1619 logevent_door_upcall_filename_size, KM_SLEEP); 1620 (void) strcpy(logevent_door_upcall_filename, file);
If the thread calling log_sysevent_filename() is around line 1616, the logevent_door_upcall_filename points to a freed memory.
If another concurrent thread is here in log_event_upcall():
237 if ((event_door == NULL) && 238 ((error = log_event_upcall_lookup()) != 0)) { 239 LOG_DEBUG((CE_CONT, 240 "log_event_upcall: event_door error (%d)\\n", error)); 241 242 return (error); 243 }
Then the log_event_upcall_lookup() is called and later the door_ki_open() is called with a pointer to the freed memory as the first parameter:
167 if ((error = 168 door_ki_open(logevent_door_upcall_filename, &event_door)) != 0) { 169 return (error); 170 }
Updated by Marcel Telka over 9 years ago
HOW TO REPRODUCE THE ISSUE
Prerequisites:
To reproduce these issues (on an OpenIndiana box) you need to modify the sysevent kernel subsystem to include some delays. This will increase the race windows. Here is the modification I did:
--- a/usr/src/uts/common/os/log_sysevent.c Thu Jan 17 20:18:59 2013 -0800 +++ b/usr/src/uts/common/os/log_sysevent.c Wed Jan 23 19:09:48 2013 +0100 @@ -146,6 +146,8 @@ static kcondvar_t event_pause_cv; static int event_pause_state = 0; +static volatile int log_WAIT = 0; + /* * log_event_upcall_lookup - Establish door connection with user event * daemon (syseventd) @@ -156,6 +158,7 @@ int error; if (event_door) { /* Release our previous hold (if any) */ + while (log_WAIT) ; door_ki_rele(event_door); } @@ -242,6 +245,8 @@ return (error); } + delay(SEC_TO_TICK(5)); + LOG_DEBUG1((CE_CONT, "log_event_upcall: 0x%llx\\n", (longlong_t)SE_SEQ((sysevent_t *)&arg->buf))); @@ -1610,9 +1615,11 @@ kmem_free(logevent_door_upcall_filename, logevent_door_upcall_filename_size); if (event_door) { + while (log_WAIT == 1) ; door_ki_rele(event_door); event_door = NULL; } + while (log_WAIT) ; } logevent_door_upcall_filename_size = strlen(file) + 1; logevent_door_upcall_filename = kmem_alloc(
You also need this testing program:
root@telcontar:/var/tmp/sysevent# cat sysevent.c #include <libsysevent.h> #include <libnvpair.h> #include <stdio.h> int main(void) { sysevent_id_t eid; int r; r = sysevent_post_event("class", "subclass", "vendor", "publisher", NULL, &eid); if (r == -1) perror("sysevent_post_event"); return 0; } root@telcontar:/var/tmp/sysevent# gcc -l sysevent -o sysevent sysevent.c root@telcontar:/var/tmp/sysevent#
Steps to Reproduce Issue 1:
To reproduce this issue you'd probably need a multicore system.
- Make sure the sysevent subsystem will synchronize as we need:
# echo "log_WAIT/W 1" | mdb -kw log_WAIT: 0 = 0x1 #
- Run this dtrace oneliner in another terminal window:
# dtrace -n 'log_event_upcall: {self->trace = probename == "entry"} delay: /self->trace/{}'
- Post an event:
# /var/tmp/sysevent/sysevent
- Make sure the delay:entry probe was fired (in the other terminal window).
- Once the delay:entry probe is fired, you have 5 seconds to restart syseventd:
# svcadm restart sysevent
- Make sure you have a thread stuck in log_sysevent_filename():
# echo "::stacks -c log_sysevent_filename" | mdb -k THREAD STATE SOBJ COUNT ffffff052605dc20 ONPROC <NONE> 1 log_sysevent_filename+0x3d modctl_modevents+0xb6 modctl+0x229 _sys_sysenter_post_swapgs+0x237 #
- Make sure you have another thread stuck in log_event_upcall_lookup() - you might need to wait up to 5 seconds to get it:
# echo "::stacks -c log_event_upcall_lookup" | mdb -k THREAD STATE SOBJ COUNT ffffff001f528c40 ONPROC <NONE> 1 log_event_upcall_lookup+0x18 log_event_upcall+0x19a log_event_deliver+0xae thread_start+8 #
- Make sure both threads continue synchronously:
# echo "log_WAIT/W 0" | mdb -kw
- PANIC!
Steps to Reproduce Issue 2:
- Run this dtrace oneliner:
# dtrace -n 'log_event_upcall: {self->trace = probename == "entry"} delay: /self->trace/{}'
- In the second terminal window post an event:
# /var/tmp/sysevent/sysevent
- Make sure the delay:entry probe was fired (in the first terminal window).
- Once the delay:entry probe is fired, you have 5 seconds to restart syseventd:
# svcadm restart sysevent
- PANIC!
Steps to Reproduce Issue 3:
- Boot the machine with kmem_flags set to 0xf (in /etc/system):
# echo "set kmem_flags=0xf" >> /etc/system # reboot
- Run the following dtrace script in separate terminal window:
# dtrace -n 'log_event_upcall_lookup:entry {self->t=1} log_event_upcall_lookup:return {self->t=0; trace(arg1)} door_ki_open:entry /self->t/ {trace(stringof(arg0))}'
- Make sure there is zero outstanding sysevents in the queue:
# echo "log_eventq_cnt::print" | mdb -k 0 #
- Make sure the sysevent subsystem will synchronize as we need:
# echo "log_WAIT/W 2" | mdb -kw log_WAIT: 0 = 0x2 #
- Restart syseventd:
# svcadm restart sysevent
- Make sure you have a thread stuck in log_sysevent_filename():
# echo "::stacks -c log_sysevent_filename" | mdb -k THREAD STATE SOBJ COUNT ffffff0526a907c0 ONPROC <NONE> 1 log_sysevent_filename+0x69 modctl_modevents+0xb6 modctl+0x229 _sys_sysenter_post_swapgs+0x237 #
- Post an event:
# /var/tmp/sysevent/sysevent
- The dtrace output confirms that door_ki_open() in log_event_upcall_lookup() accessed freed memory (logevent_door_upcall_filename) and failed:
dtrace: description 'log_event_upcall_lookup:entry ' matched 3 probes CPU ID FUNCTION:NAME 3 45632 door_ki_open:entry 0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef 0: ef be ad de ef be ad de ef be ad de ef be ad de ................ 10: ef be ad de ef be ad de ce fa ed fe ce fa ed fe ................ 20: 98 28 09 1b 05 ff ff ff 76 dc e7 ef 05 ff ff ff .(......v....... 30: 60 18 be 09 05 ff ff ff 00 00 00 00 00 00 00 00 `............... 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 3 18329 log_event_upcall_lookup:return 2
- Let the sysevent subsystem to continue:
# echo "log_WAIT/W 0" | mdb -kw log_WAIT: 0x2 = 0x0 #
Updated by Marcel Telka over 9 years ago
Fix Testing:
For the fix testing I used this event generator (source for the /var/tmp/sysevent/sysevent binary is above):
# while true ; do /var/tmp/sysevent/sysevent ; done
Basically this was used to simulate the kernel->syseventd transport issue (I tested without the sleep too):
# while true ; do svcadm disable sysevent ; echo disabled ; sleep 5 ; svcadm enable sysevent ; echo enabled ; done
To monitor the sysevent queue in kernel during the test I used this:
# while true ; do echo log_eventq_cnt::print | mdb -k ; done
I also used this to see how the door is opened and closed:
# while true ; do echo event_door::print | mdb -k ; done
When there were some events in the queue and the transport was down (syseventd not running) the event_door was cleared. In case the queue was empty the door left opened even the syseventd was not running (nobody tries to use the door when there are no events in the queue). During the syseventd start the event_door was correctly reopened and the event delivery resumed correctly.
I confirmed that the event delivery from kernel to userspace syseventd works as expected. No regression was found.
Updated by Marcel Telka over 9 years ago
- Status changed from In Progress to Pending RTI
Updated by Rich Lowe over 9 years ago
- Category set to kernel
- Status changed from Pending RTI to Resolved
- % Done changed from 20 to 100
- Tags deleted (
needs-triage)
Resolved in e5dc7ea