Bug #11494

Updated by Joshua M. Clulow about 3 years ago

Some fault conditions experienced by user programs are reported via signals, directed to the LWP which induced the fault.    The list of such signals includes at least @SIGSEGV@, @SIGBUS@, @SIGILL@, @SIGTRAP@, and @SIGFPE@.    When delivering a fault signal like this, we populate the @siginfo_t@ with information about the fault address, and potentially the program counter, at the moment of the fault.    In addition, some software (not unreasonably) will inspect the @ucontext_t@ that we provide to the signal handler to collect more detailed information about the state of the program when it was interrupted by the signal.    This is used by language runtimes like the JVM to convert a @SIGSEGV@ into an appropriate @NullPointerException@, amongst other things. 

 When deciding which signal to deliver in @issig_forreal()@, we use @fsig()@ twice: first, to check the thread-directed pending signal set; then, if there are no thread-directed signals, to check the process-directed signal set. In general, with a couple of exceptions, @fsig()@ starts with the lowest numbered signal (i.e., it will try @SIGBUS@, 10, before @SIGSEGV@, 11).

 The synchronous fault signals are all thread-directed, so they are given priority over any signals from outside.    We do not, to my knowledge, allow processes to signal an arbitrary thread in another process -- they use @kill(2)@, which only generates _process_-directed signals.    Faulting signals, therefore, are effectively delivered first.    The context we preserve from the interrupted program will reflect the machine state when the fault occurred. 

 Unfortunately, there are conditions under which an exception made in @fsig()@ causes incorrect behaviour. If @SIGKILL@ is present in the inspected signal set, it will be prioritised above any other signal. That makes sense, because if @KILL@ is asserted, the program is over before it gets to any other signal handling anyway. In addition to @KILL@, though, we give second place to @SIGPROF@ (number 29!) before attending to the rest of the signals in ascending order. When @SIGPROF@ is process-directed, this doesn't matter because it falls in line behind any of the thread-directed fault signals. When using @setitimer(2)@ with the @ITIMER_PROF@ timer, however, the kernel _sends a thread-directed @SIGPROF@_ in @clock_tick()@ if the LWP is on-CPU when the tick occurs!
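
 The selection order described above can be modelled in a few lines of C. This is a standalone sketch, not the kernel's @fsig()@ or @issig_forreal()@: the names @model_fsig()@ and @model_next_signal()@, the 64-bit mask representation, and the @prof_first@ switch are all assumptions made for illustration. The signal numbers are the illumos values.

```c
#include <stdint.h>

/* illumos signal numbers, spelled out so the model is standalone. */
enum {
	M_SIGILL = 4,
	M_SIGKILL = 9,
	M_SIGBUS = 10,
	M_SIGSEGV = 11,
	M_SIGPROF = 29,
	M_MAXSIG = 64	/* assumed width of the model's signal set */
};

/* Bit for signal n in a 64-bit pending set (hypothetical representation). */
#define	M_SIGBIT(n)	(UINT64_C(1) << ((n) - 1))

/*
 * Pick the next signal from one pending set: SIGKILL first, then
 * (if prof_first is non-zero) SIGPROF, then the lowest numbered signal.
 * Returns 0 if the set is empty.
 */
int
model_fsig(uint64_t pending, int prof_first)
{
	if (pending & M_SIGBIT(M_SIGKILL))
		return (M_SIGKILL);
	if (prof_first && (pending & M_SIGBIT(M_SIGPROF)))
		return (M_SIGPROF);
	for (int sig = 1; sig <= M_MAXSIG; sig++) {
		if (pending & M_SIGBIT(sig))
			return (sig);
	}
	return (0);
}

/* Thread-directed signals are consulted before process-directed ones. */
int
model_next_signal(uint64_t thread_set, uint64_t proc_set, int prof_first)
{
	int sig = model_fsig(thread_set, prof_first);

	return (sig != 0 ? sig : model_fsig(proc_set, prof_first));
}
```

 With @M_SIGSEGV@ and @M_SIGPROF@ both in the thread-directed set, @model_next_signal()@ returns 29 when @prof_first@ is non-zero and 11 when it is zero. With @M_SIGPROF@ only in the process-directed set, it returns 11 either way.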

 This thread-directed @SIGPROF@ will jump the queue and begin delivery before any fault signal that might have been tripped at around the time that the clock tick occurred. If the signal mask (@sa_mask@) provided to @sigaction(2)@ for @SIGPROF@ did not block the fault signal, then as soon as @libc@ unmasks signals before calling the user handler, we'll be dragged back into the kernel to deliver the fault signal. We're now in nested signal handling territory, so the context we provide with that faulting signal reflects an interrupted @libc@ signal handling function, not the user program.
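
 One possible mitigation in user code, following from the paragraph above, is to install the @SIGPROF@ handler with the fault signals in @sa_mask@, so that a pending fault signal stays blocked until the @SIGPROF@ handler has returned and the interrupted context has been restored. This is a sketch under that assumption and has not been verified against the kernel; @install_prof_handler()@ and @prof_handler()@ are hypothetical names.

```c
#define	_POSIX_C_SOURCE	200809L

#include <signal.h>
#include <string.h>

/* Hypothetical profiling handler; a real one would record a sample. */
void
prof_handler(int sig, siginfo_t *sip, void *ucp)
{
	(void) sig;
	(void) sip;
	(void) ucp;
}

/*
 * Install a SIGPROF handler whose sa_mask blocks the synchronous fault
 * signals for the duration of the handler.
 * Returns 0 on success, -1 on error.
 */
int
install_prof_handler(void (*handler)(int, siginfo_t *, void *))
{
	static const int faults[] = {
		SIGSEGV, SIGBUS, SIGILL, SIGTRAP, SIGFPE
	};
	struct sigaction sa;

	memset(&sa, 0, sizeof (sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO | SA_RESTART;
	if (sigemptyset(&sa.sa_mask) != 0)
		return (-1);
	for (size_t i = 0; i < sizeof (faults) / sizeof (faults[0]); i++) {
		if (sigaddset(&sa.sa_mask, faults[i]) != 0)
			return (-1);
	}
	return (sigaction(SIGPROF, &sa, NULL));
}
```

 Blocking fault signals has a cost: POSIX leaves the behaviour undefined if @SIGSEGV@, @SIGBUS@, @SIGILL@, or @SIGFPE@ is generated by a real fault while blocked, so a fault inside the @SIGPROF@ handler itself would not be handled cleanly.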

 A rough sequence of events that leads to the problem: 

 * user program accesses some unmapped memory
 * page fault handler is triggered, determines the access is invalid, sets up a @SIGSEGV@ for this LWP 
 * @clock_tick()@ sends a thread-directed @SIGPROF@ because we are presently on-CPU 
 * on the way back to user mode we see there are signals so we determine which one to send first; @fsig()@ chooses the @SIGPROF@ 
 * we return to user mode, vectored to the @libc@ signal handling machinery, with all signals blocked and a context that refers to the main program that tripped the fault 
 * @libc@ applies the @sa_mask@ for our @SIGPROF@ handler, unblocking signals by making a @_lwp_sigmask()@ system call 
 * on return from @_lwp_sigmask()@, we see there are signals and @fsig()@ tells us @SIGSEGV@ is up next 
 * we are revectored to the @SIGSEGV@ handler, with a context that refers to @libc@ just after the @_lwp_sigmask()@ call 
 * the user program's @SIGSEGV@ handler flies off the rails because @REG_PC@ doesn't point to the program text that faulted in the first place 
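
 The sequence above can be traced with a toy state machine that records which context is saved at each delivery. Everything here is a hypothetical model (the @lwp_model@ structure, @deliver_one()@, @segv_context()@, and the two-valued notion of where the PC is); it only mirrors the steps in the list and does not reflect kernel data structures.

```c
enum where { IN_USER_TEXT, IN_LIBC_SIGHANDLER };

enum { T_NONE = 0, T_SIGSEGV = 11, T_SIGPROF = 29 };

struct lwp_model {
	enum where pc;		/* where user-mode execution would resume */
	int pending_segv;	/* thread-directed SIGSEGV pending */
	int pending_prof;	/* thread-directed SIGPROF pending */
	int all_blocked;	/* signals blocked on handler entry */
};

/*
 * Deliver one pending signal on the way back to user mode.  Stores the
 * context saved for its handler in *saved and returns the signal number,
 * or T_NONE if nothing is deliverable.
 */
int
deliver_one(struct lwp_model *l, int prof_first, enum where *saved)
{
	int sig = T_NONE;

	if (l->all_blocked)
		return (T_NONE);
	if (prof_first && l->pending_prof)
		sig = T_SIGPROF;
	else if (l->pending_segv)
		sig = T_SIGSEGV;
	else if (l->pending_prof)
		sig = T_SIGPROF;
	if (sig == T_NONE)
		return (T_NONE);

	if (sig == T_SIGSEGV)
		l->pending_segv = 0;
	else
		l->pending_prof = 0;
	*saved = l->pc;			/* context handed to the handler */
	l->pc = IN_LIBC_SIGHANDLER;	/* vector to libc's machinery */
	l->all_blocked = 1;		/* handler entered with signals blocked */
	return (sig);
}

/* Return the context that the SIGSEGV handler is given. */
enum where
segv_context(int prof_first)
{
	/* A fault and a clock tick have both posted signals to this LWP. */
	struct lwp_model l = { IN_USER_TEXT, 1, 1, 0 };
	enum where saved = IN_USER_TEXT;

	if (deliver_one(&l, prof_first, &saved) == T_SIGSEGV)
		return (saved);
	/* SIGPROF went first: libc applies sa_mask, unblocking SIGSEGV. */
	l.all_blocked = 0;
	(void) deliver_one(&l, prof_first, &saved);
	return (saved);
}
```

 In this model @segv_context(1)@ yields @IN_LIBC_SIGHANDLER@, matching the failure described in the list, while @segv_context(0)@ yields @IN_USER_TEXT@.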

 Note that there is a wholly separate delivery mechanism for use of @setitimer(2)@ with @ITIMER_REALPROF@, which will also generate @SIGPROF@ but not in this way, and which does not appear to exhibit these same symptoms.

 It seems reasonable _not_ to prioritise @SIGPROF@ before the synchronous fault signals (which are otherwise all lower in number than 29).    Even if @fsig()@ did nothing special with it, it will still be prioritised above signals from other processes by virtue of being thread-directed -- it just won't interfere with the correct delivery of context about @SIGSEGV@, @SIGILL@, etc.