svc_queueclose() could wake up too much threads
In some cases we could have a lot of threads (hundreds) sleeping in svc_poll() and waiting for new work at the pool->p_req_cv condition variable. Here:
2033 /* 2034 * Count us as asleep, mark this thread as safe 2035 * for suspend and wait for a request. 2036 */ 2037 pool->p_asleep++; 2038 timeleft = cv_reltimedwait_sig(&pool->p_req_cv, 2039 &pool->p_req_lock, pool->p_timeout, TR_CLOCK_TICK);
This could easily happen, for example when you have hundreds clients calling NFS operations serialized, with short delays between the NFS calls. This is even easier to observe with very fast NFS operations, like NFS NULL.
In this scenario, when one client disconnects and svc_queueclose() is called, we might end up waking up all those sleeping threads:
2339 /* 2340 * Wakeup threads sleeping in svc_poll() so that they 2341 * unlink from the transport 2342 */ 2343 mutex_enter(&xprt->xp_pool->p_req_lock); 2344 cv_broadcast(&xprt->xp_pool->p_req_cv); 2345 mutex_exit(&xprt->xp_pool->p_req_lock);
We are waking up all sleeping threads, because we have no other way how to wake the only thread (or few threads) attached to the transport we are just closing.
We are waking up threads for one reason only: so they can disconnect from the transport and the transport could be properly closed and destroyed.
The problem with such waking up is that several hundreds of threads are woken up just so they find they have nothing to do (because they are not attached to the closing transport). After that they immediately go back to sleep.
It is well known that waking up many threads for no reason using cv_broadcast() is causing huge performance penalty and it should be avoided when possible.
We saw this issue at a customer (please see attached customer.svg file with the flame graph) where we see that more than 22% of the time is spent spinning in cv_reltimedwait_sig().
The problem is also visible in the crash dump file we have from the customer. There is a lot of runnable threads (waiting for CPU) in cv_reltimedwait_sig() and few others just running on CPU (plus 50 threads waiting in cv_reltimedwait_sig(); these are the threads just recently returned back to sleep):
> ::stacks -m rpcmod -c cv_reltimedwait_sig THREAD STATE SOBJ COUNT ffffff439b1c3080 RUN <NONE> 283 swtch+0x141 cv_timedwait_sig_hires+0x39d cv_reltimedwait_sig+0x4f svc_poll+0x1f2 svc_run+0x16e svc_do_run+0x8e nfssys+0xf1 dtrace_systrace_syscall32+0xe4 _sys_sysenter_post_swapgs+0x149 ffffff439dc680c0 SLEEP CV 50 swtch+0x141 cv_timedwait_sig_hires+0x39d cv_reltimedwait_sig+0x4f svc_poll+0x1f2 svc_run+0x16e svc_do_run+0x8e nfssys+0xf1 dtrace_systrace_syscall32+0xe4 _sys_sysenter_post_swapgs+0x149 ffffff439f9c5060 ONPROC <NONE> 1 0 default_lock_delay+0x3f mutex_vector_enter+0xcc cv_timedwait_sig_hires+0x3c9 cv_reltimedwait_sig+0x4f svc_poll+0x1f2 svc_run+0x16e svc_do_run+0x8e nfssys+0xf1 dtrace_systrace_syscall32+0xe4 _sys_sysenter_post_swapgs+0x149 ffffff439e0fd500 ONPROC <NONE> 1 0xb _resume_from_idle+0xec cv_timedwait_sig_hires+0x39d cv_reltimedwait_sig+0x4f svc_poll+0x1f2 svc_run+0x16e svc_do_run+0x8e nfssys+0xf1 dtrace_systrace_syscall32+0xe4 _sys_sysenter_post_swapgs+0x149 >
We were also able to reproduce the issue in our lab (see attached lab.svg flame graph).
Updated by Marcel Telka over 5 years ago
There are basically two ways how to fix this. The first way is to wake up in svc_queueclose() only the threads that are connected to the transport we are just closing. The problem with this solution is that we do not have simple way how to wake up threads selectively (remember: all of the threads are waiting on the same condition variable). This solution would need non-trivial changes in the threads synchronization in rpcmod with too high risks and questionable gain. So this is no go.
The second way is to not wake up any threads in svc_queueclose(). The current implementation of svc_queueclose() just wokes up all threads in a hope that one (or few) of them will disconnect from the transport in the near future and destroy the transport themselves. The svc_queueclose() returns immediately without waiting for the actual transport destroy. If we won't wake up any thread in svc_queueclose(), we will just postpone the actual transport destroy for a bit longer time than previously.
All threads have timeout (5 seconds by default) waiting on the condition variable, so the transport should be destroyed no later than 5 seconds (in worst case scenario) after the svc_queueclose() is called. Usually, it will happen sooner for two reasons: a) the thread is very likely already waiting on the condition variable for some time, so we do not need to wait the full timeout; b) if there will arrive some new work for the pool (new NFS request; for example) the thread might be woken up sooner than after the timeout expires.
Updated by Electric Monk over 5 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
commit eaf32bf722e82dc74e90279e12e48f66891ef46a Author: Marcel Telka <firstname.lastname@example.org> Date: 2014-08-08T02:06:12.000Z 5070 svc_queueclose() could wake up too much threads Reviewed by: Rick Mesta <email@example.com> Reviewed by: Gordon Ross <firstname.lastname@example.org> Reviewed by: Dan Fields <email@example.com> Reviewed by: Josef Sipek <firstname.lastname@example.org> Reviewed by: Tony Nguyen <email@example.com> Reviewed by: Garrett D'Amore <firstname.lastname@example.org> Approved by: Dan McDonald <email@example.com>