Bug #3917: panic in turnstile_block() on unowned mutex

Added by Robert Mustacchi almost 9 years ago. Updated almost 9 years ago.


We died in turnstile_block(), when we discovered that the mutex we were blocking on was, in fact, unheld:

> ::status
debugging crash dump vmcore.0 (64-bit) from C103-N11
operating system: 5.11 joyent_20130418T230855Z (i86pc)
image uuid: (not set)
panic message: turnstile_block(ffffff21e39d40a0): unowned mutex
dump content: kernel pages only

This should be impossible (hence the panic) because we're coming from mutex_vector_enter():

> ::stack
turnstile_block+0x782(0, 0, ffffff21f7817a58, fffffffffbc07f28, 0, 0)
cv_wait+0x69(ffffff21f7817a68, ffffff21f7817a58)
taskq_thread_wait+0x84(ffffff21f7817a38, ffffff21f7817a58, ffffff21f7817a68, ffffff00f551cae0,

In this code path, we hold the turnstile lock, we have set the waiters bit (which has reached global visibility), and we have determined that the mutex owner is not running. The only way the mutex can be dropped at that point is via mutex_vector_exit() – which will need the same turnstile that we have locked. As noted in the Big Theory Statement, the mutex_exit() code is lockless – which can only work because it is special-cased with respect to preemption: if a thread is preempted after the load but before the test, this will be detected when the thread is put back on CPU, and it will be rewound to restart mutex_exit(). As the Big Theory Statement tells us:

 * Big Theory Statement for mutual exclusion locking primitives.
 * ...
 * The logic for synchronizing mutex_vector_enter() with mutex_exit()
 * in the face of preemption and relaxed memory ordering is as follows:
 * ...
 * The only requirements of code outside the mutex implementation are
 * (1) mutex_exit() preemption fixup in interrupt handlers or trap return,
 * (2) a membar #StoreLoad after setting CPU_THREAD in resume(),
 * (3) mutex_owner_running() preemption fixup in interrupt handlers
 * or trap returns.
 * Note: idle threads cannot grab adaptive locks (since they cannot block),
 * so the membar may be safely omitted when resuming an idle thread.
 * ...

The mutex_exit() and mutex_owner_running() checks are sound – but x86 has no (and seemingly, has never had any) store barrier in resume().

The race, then, involves the owner coming on CPU shortly before a mutex_exit(): because of the lack of a store barrier in resume(), the owner's store to CPU_THREAD reached global visibility after the owner did the load of the lock (which includes the waiters bit) in mutex_exit(). In the time-honored tradition of ASCII art to graphically describe race conditions, here is our race (with arrows denoting temporal happens-before relationships):

CPU A: Running owner thread             CPU B: Running waiting thread
+------------------------------------+  +------------------------------------+
|                                    |  |                                    |
| Owner thread is not running        |  |                                    |
|                 +--------------------------------------->                  |
| In resume() of owner thread, sets  |  | In mutex_vector_enter(), sees      |
| CPU_THREAD                         |  | owner as not running               |
|                                    |  |                                    |
| Owner thread calls mutex_exit()    |  |                                    |
|                                    |  |                                    |
| Owner thread loads mutex value     |  | Sets waiters bit                   |
|                 +--------------------------------------->                  |
| Owner thread checks waiters bit    |  | Waiters bit hits global visibility |
|                                    |  |                                    |
| Seeing no waiters, clears mutex    |  | Re-checks CPU_THREAD; owner still  |
|                                    |  | appears to not be running          |
|                                    |  |                                    |
|                                    |  | Re-checks mutex; lock still        |
|                                    |  | appears to be held                 |
|                 <---------------------------------------+                  |
| Clear of mutex hits global         |  | Waiting thread calls into          |
| visibility                         |  | turnstile_block()                  |
|                 +--------------------------------------->                  |
|                                    |  |                                    |
| Store of CPU_THREAD hits global    |  | Seeing no owner, panics            |
| visibility                         |  |                                    |
|                                    |  |                                    |

As should be clear from the above, the race is extraordinarily tight – which explains why we have not seen this before (or since).

The fix is straightforward: we need to obey the Big Theory Statement. There are two ways to do this: one is to make the store of CPU_THREAD a serializing instruction; the other is to follow the store with an mfence. For our purposes here, the only reasonable serializing instruction for the store of CPU_THREAD would be xchg.

In times and microprocessors long past (and specifically, on Opteron), mfence carried substantial cycle penalties, and xchg could be preferred in some circumstances. However, mfence is quite a bit clearer; given that its execution penalty is now essentially nil, it should be the preferred solution here. (It also bears mentioning that this code path is necessarily heavyweight – we're scheduling another thread to run – and any small cycle penalty of mfence relative to xchg is likely unobservable.)

