Bug #7912
nfs_rwlock readers are running wild waiting for a writer that cannot come
Status: Closed (100% done)
Description
We have five threads trying to acquire an nfs_rwlock as readers:
> ffffff00872e8c40::findstack -v
stack pointer for thread ffffff00872e8c40: ffffff00872e8aa0
[ ffffff00872e8aa0 _resume_from_idle+0xf4() ]
  ffffff00872e8ad0 swtch+0x141()
  ffffff00872e8b10 cv_wait+0x70(ffffff130c2c0c90, ffffff130c2c0c88)
  ffffff00872e8b70 nfs_rw_enter_sig+0x1b6(ffffff130c2c0c78, 1, 0)
  ffffff00872e8c20 nfs4delegreturn_thread+0x159(ffffff130dc9b088)
  ffffff00872e8c30 thread_start+8()
> ffffff0085e36c40::findstack -v
stack pointer for thread ffffff0085e36c40: ffffff0085e36aa0
[ ffffff0085e36aa0 _resume_from_idle+0xf4() ]
  ffffff0085e36ad0 swtch+0x141()
  ffffff0085e36b10 cv_wait+0x70(ffffff130c2c0c90, ffffff130c2c0c88)
  ffffff0085e36b70 nfs_rw_enter_sig+0x1b6(ffffff130c2c0c78, 1, 0)
  ffffff0085e36c20 nfs4delegreturn_thread+0x159(ffffff131f2ff4a0)
  ffffff0085e36c30 thread_start+8()
> ffffff0095754c40::findstack -v
stack pointer for thread ffffff0095754c40: ffffff0095754aa0
[ ffffff0095754aa0 _resume_from_idle+0xf4() ]
  ffffff0095754ad0 swtch+0x141()
  ffffff0095754b10 cv_wait+0x70(ffffff130c2c0c90, ffffff130c2c0c88)
  ffffff0095754b70 nfs_rw_enter_sig+0x1b6(ffffff130c2c0c78, 1, 0)
  ffffff0095754c20 nfs4delegreturn_thread+0x159(ffffff1311feab38)
  ffffff0095754c30 thread_start+8()
> ffffff00957fcc40::findstack -v
stack pointer for thread ffffff00957fcc40: ffffff00957fcaa0
[ ffffff00957fcaa0 _resume_from_idle+0xf4() ]
  ffffff00957fcad0 swtch+0x141()
  ffffff00957fcb10 cv_wait+0x70(ffffff130c2c0c90, ffffff130c2c0c88)
  ffffff00957fcb70 nfs_rw_enter_sig+0x1b6(ffffff130c2c0c78, 1, 0)
  ffffff00957fcc20 nfs4delegreturn_thread+0x159(ffffff1301f311e0)
  ffffff00957fcc30 thread_start+8()
> ffffff0083532c40::findstack -v
stack pointer for thread ffffff0083532c40: ffffff0083532aa0
[ ffffff0083532aa0 _resume_from_idle+0xf4() ]
  ffffff0083532ad0 swtch+0x141()
  ffffff0083532b10 cv_wait+0x70(ffffff130c2c0c90, ffffff130c2c0c88)
  ffffff0083532b70 nfs_rw_enter_sig+0x1b6(ffffff130c2c0c78, 1, 0)
  ffffff0083532c20 nfs4delegreturn_thread+0x159(ffffff195381be88)
  ffffff0083532c30 thread_start+8()
and one more thread trying to acquire the same lock as a writer:
> ffffff12ebc2f080::findstack -v
stack pointer for thread ffffff12ebc2f080: ffffff009e6acc40
[ ffffff009e6acc40 _resume_from_idle+0xf4() ]
  ffffff009e6acc70 swtch+0x141()
  ffffff009e6accb0 cv_wait+0x70(ffffff130c2c0c90, ffffff130c2c0c88)
  ffffff009e6acd10 nfs_rw_enter_sig+0xd0(ffffff130c2c0c78, 0, 0)
  ffffff009e6acd60 nfs4_rwlock+0x52(ffffff1320d95880, 1, 0)
  ffffff009e6acdc0 fop_rwlock+0x2d(ffffff1320d95880, 1, 0)
  ffffff009e6ace90 write+0x1ae(f, 865ac30, 800)
  ffffff009e6acec0 write32+0x1e(f, 865ac30, 800)
  ffffff009e6acf10 _sys_sysenter_post_swapgs+0x149()
The reader threads keep spinning (waking up and going back to sleep over and over), while the writer thread stays asleep (output from the THREADPTR::thread -mdi commands):
ADDR             STATE  FLG  PFLG SFLG PRI EPRI PIL INTR DISPTIME BOUND PR SWITCH
ffffff00872e8c40 sleep  8    2000 3    60  0    0   n/a  b0a181   -1    0  t-0
ffffff0085e36c40 sleep  8    2000 3    60  0    0   n/a  b0a52a   -1    0  t-0
ffffff0095754c40 sleep  8    2000 3    60  0    0   n/a  b0aa32   -1    0  t-0
ffffff00957fcc40 sleep  8    2000 3    60  0    0   n/a  b0adf5   -1    0  t-0
ffffff0083532c40 onproc 8    2000 13   60  0    0   n/a  b0b1b4   -1    0  t-0
ffffff12ebc2f080 sleep  1000 104  3    59  0    0   n/a  130dcd   -1    0  t-10332220
Related issues
Updated by Marcel Telka over 6 years ago
- Related to Bug #7601: nfs_rwlock_t does not scale with cv_broadcast() added
Updated by Marcel Telka over 6 years ago
Root cause
The fix for #7601 assumed that cv_wait(9f)/cv_wait_sig(9f) places the calling thread at the end of the wait queue, and that cv_signal(9f) always wakes up the first waiting thread in the queue. The implementation is not that straightforward: thread priorities are taken into account too. Threads with higher priority are queued before lower-priority threads, even if they arrived after them.
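To make the difference concrete, here is a simplified model in C of where a newly sleeping thread lands in a condition variable's wait queue, under the FIFO policy that #7601 assumed versus a priority-ordered policy. It is a hypothetical illustration (fifo_position() and priority_position() are made-up names), not the illumos sleep queue code. It assumes, as described above, that higher priority goes first, and additionally assumes that threads of equal priority keep their arrival order.

```c
#include <assert.h>

/*
 * Hypothetical model: queued[] holds the priorities of threads already
 * sleeping on a condition variable; index 0 is the head, i.e. the thread
 * cv_signal() would wake next. Each function returns the index a newly
 * arriving thread of priority `pri` would occupy.
 */

/* The assumption behind the #7601 fix: new waiters always go to the tail. */
static int
fifo_position(const int *queued, int n, int pri)
{
	(void) queued;
	(void) pri;
	return (n);
}

/*
 * Priority-ordered queue (assumed FIFO among equal priorities): the new
 * waiter goes behind every waiter of equal or higher priority, i.e. just
 * ahead of the first lower-priority waiter.
 */
static int
priority_position(const int *queued, int n, int pri)
{
	int i = 0;

	while (i < n && queued[i] >= pri)
		i++;
	return (i);
}
```

With a priority-59 writer already asleep ({59}), a priority-60 reader that calls cv_wait() afterwards lands at index 0 under the priority policy, ahead of the writer, whereas the FIFO assumption would put it at index 1.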
In the example above (in the Description), the waiting writer has priority 59, while all readers have priority 60. Once a reader is woken up, it sees that a writer is waiting, so it calls cv_signal(9f) to wake up the next thread and goes back to sleep by calling cv_wait(9f). Because of its higher priority, the reader is queued before the lower-priority writer. The woken-up reader goes back to sleep because the nfs_rwlock implementation prefers writers over readers. The next thread woken up is again a reader with priority 60, and it does the same: it finds a writer sleeping, calls cv_signal(9f), and goes back to sleep by calling cv_wait(9f), again queueing itself before the waiting writer. And so on, and so on.
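The wakeup chain described above can be simulated in a few lines of C. This is a hypothetical user-space model (the names signals_until_writer_runs() and sim_waiter_t are made up), assuming a single wait queue shared by readers and writers, ordered by decreasing priority with FIFO order among equal priorities. Each loop iteration stands for one cv_signal(9f): the head of the queue is woken, and a woken reader that sees a waiting writer passes the wakeup on and re-queues itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXW 32

typedef struct {
	int pri;
	bool writer;
} sim_waiter_t;

static sim_waiter_t q[MAXW];	/* q[0] is the next thread cv_signal() wakes */
static int qn;

/* Assumed queueing policy: ahead of the first lower-priority waiter. */
static void
enqueue(sim_waiter_t w)
{
	int i = 0;

	assert(qn < MAXW);
	while (i < qn && q[i].pri >= w.pri)
		i++;
	memmove(&q[i + 1], &q[i], (size_t)(qn - i) * sizeof (q[0]));
	q[i] = w;
	qn++;
}

/*
 * Returns how many cv_signal() calls it takes until the writer is woken,
 * or -1 if it is still asleep after max_signals wakeups.
 */
static int
signals_until_writer_runs(int nreaders, int reader_pri, int writer_pri,
    int max_signals)
{
	qn = 0;
	/* The writer went to sleep first; the readers arrived after it. */
	enqueue((sim_waiter_t){ writer_pri, true });
	for (int i = 0; i < nreaders; i++)
		enqueue((sim_waiter_t){ reader_pri, false });

	for (int s = 1; s <= max_signals; s++) {
		/* cv_signal(): wake the thread at the head of the queue. */
		sim_waiter_t woken = q[0];
		memmove(&q[0], &q[1], (size_t)(qn - 1) * sizeof (q[0]));
		qn--;
		if (woken.writer)
			return (s);
		/*
		 * Writers are preferred, so the reader signals the next
		 * waiter and calls cv_wait() again, re-entering the queue.
		 */
		enqueue(woken);
	}
	return (-1);
}
```

With five priority-60 readers and a priority-59 writer, as in the Description, signals_until_writer_runs(5, 60, 59, 1000) returns -1: every re-queued reader lands ahead of the writer, so the writer is never woken. If the writer's priority is 60 or higher, the very first signal wakes it.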
Updated by Gordon Ross over 6 years ago
Ah, so we have a classic "priority inversion" problem?
The usual solution (taught in many CS courses on the subject) is to temporarily elevate the priority of the resource holder to the same priority as the highest priority blocked thread. Can we do that here?
Updated by Marcel Telka over 6 years ago
Gordon Ross wrote:
> Ah, so we have a classic "priority inversion" problem?
Partially. The other part of the problem is that nfs_rwlock prefers writers over readers.
> The usual solution (taught in many CS courses on the subject) is to temporarily elevate the priority of the resource holder to the same priority as the highest priority blocked thread. Can we do that here?
Maybe, but I'm not sure an easy implementation of that is possible. I'm working on something simpler, tailored specifically to nfs_rwlock's needs.
Updated by Marcel Telka over 6 years ago
Fix description
The fix introduces a new, separate condition variable for waiting readers. When waiters need to be woken up, either one waiting writer is signalled (via the cv condition variable) or all waiting readers are woken with a broadcast (via the newly introduced cv_rd condition variable).
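A minimal user-space sketch of this scheme follows, using POSIX threads in place of the kernel's mutex and condition variable primitives. Only the split between cv (writers) and cv_rd (readers) and the writer preference come from this issue; the type sketch_rwlock_t, the sketch_rw_* functions and the demo are hypothetical, and recursive writer entry and interruptible (signal-aware) waits are left out. Because readers never sleep on cv, a pthread_cond_signal() on cv can only ever wake a writer, so readers can no longer keep passing the wakeup among themselves.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t cv;	/* waiting writers */
	pthread_cond_t cv_rd;	/* waiting readers */
	int count;		/* >0: active readers, -1: one writer */
	int waiters;		/* number of waiting writers */
} sketch_rwlock_t;

static void
sketch_rw_init(sketch_rwlock_t *l)
{
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->cv, NULL);
	pthread_cond_init(&l->cv_rd, NULL);
	l->count = 0;
	l->waiters = 0;
}

static void
sketch_rw_enter_read(sketch_rwlock_t *l)
{
	pthread_mutex_lock(&l->lock);
	/* Writers are preferred: also wait while any writer is waiting. */
	while (l->count < 0 || l->waiters > 0)
		pthread_cond_wait(&l->cv_rd, &l->lock);
	l->count++;
	pthread_mutex_unlock(&l->lock);
}

static void
sketch_rw_enter_write(sketch_rwlock_t *l)
{
	pthread_mutex_lock(&l->lock);
	while (l->count != 0) {
		l->waiters++;
		pthread_cond_wait(&l->cv, &l->lock);
		l->waiters--;
	}
	l->count = -1;
	pthread_mutex_unlock(&l->lock);
}

static void
sketch_rw_exit(sketch_rwlock_t *l)
{
	pthread_mutex_lock(&l->lock);
	l->count = (l->count > 0) ? l->count - 1 : 0;
	if (l->count == 0) {
		if (l->waiters > 0)
			pthread_cond_signal(&l->cv);	   /* one writer */
		else
			pthread_cond_broadcast(&l->cv_rd); /* all readers */
	}
	pthread_mutex_unlock(&l->lock);
}

/* Demo: every 4th iteration writes the shared counter, the rest read it. */
static sketch_rwlock_t demo_lock;
static long demo_counter;

static void *
demo_worker(void *arg)
{
	int iters = *(const int *)arg;

	for (int i = 0; i < iters; i++) {
		if (i % 4 == 0) {
			sketch_rw_enter_write(&demo_lock);
			demo_counter++;
		} else {
			sketch_rw_enter_read(&demo_lock);
			assert(demo_counter >= 0);
		}
		sketch_rw_exit(&demo_lock);
	}
	return (NULL);
}

/* Runs nthreads (at most 16) workers once; returns the final counter. */
static long
sketch_demo(int nthreads, int iters)
{
	pthread_t t[16];

	assert(nthreads <= 16);
	sketch_rw_init(&demo_lock);
	demo_counter = 0;
	for (int i = 0; i < nthreads; i++)
		pthread_create(&t[i], NULL, demo_worker, &iters);
	for (int i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	return (demo_counter);
}
```

For example, sketch_demo(8, 1000) runs eight threads that each perform 250 writes, so it should return 2000 and leave the lock with count and waiters both at zero. The sketch initializes demo_lock on every call, so it is meant to be run once per process.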
Updated by Marcel Telka about 6 years ago
- Status changed from In Progress to Pending RTI
Updated by Electric Monk about 6 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
git commit 7909625fdb7ecb20e9b7a777cfc0ec7ee63b4642
commit 7909625fdb7ecb20e9b7a777cfc0ec7ee63b4642
Author: Marcel Telka <marcel@telka.sk>
Date: 2017-03-22T21:08:11.000Z

    7912 nfs_rwlock readers are running wild waiting for a writer that cannot come
    Reviewed by: Arne Jansen <arne@die-jansens.de>
    Reviewed by: Gordon Ross <Gordon.W.Ross@gmail.com>
    Approved by: Robert Mustacchi <rm@joyent.com>