Bug #5070

svc_queueclose() could wake up too many threads

Added by Marcel Telka over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: nfs - NFS server and client
Start date: 2014-08-06
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

In some cases we could have a lot of threads (hundreds of them) sleeping in svc_poll(), waiting for new work on the pool->p_req_cv condition variable. Here:

        /*
         * Count us as asleep, mark this thread as safe
         * for suspend and wait for a request.
         */
        pool->p_asleep++;
        timeleft = cv_reltimedwait_sig(&pool->p_req_cv,
            &pool->p_req_lock, pool->p_timeout, TR_CLOCK_TICK);

This could easily happen, for example, when you have hundreds of clients making serialized NFS calls with short delays between them. It is even easier to observe with very fast NFS operations, like NFS NULL.
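
For reference, here is a minimal client-side sketch of one way to generate such traffic, using the standard ONC RPC clnt_create()/clnt_call() interfaces. The pacing and structure below are illustrative only; running many instances of this against one server approximates the scenario above:

/*
 * Sketch: issue serialized NFS NULL calls against a server.
 * Illustrative only; 100003/3 is the NFS program, version 3,
 * and procedure 0 is the NULL procedure.
 */
#include <rpc/rpc.h>
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	CLIENT *clnt;
	struct timeval tv = { 5, 0 };		/* per-call timeout */

	if (argc != 2) {
		(void) fprintf(stderr, "usage: %s <server>\n", argv[0]);
		return (1);
	}

	clnt = clnt_create(argv[1], 100003, 3, "tcp");
	if (clnt == NULL) {
		clnt_pcreateerror(argv[1]);
		return (1);
	}

	for (;;) {
		if (clnt_call(clnt, 0, (xdrproc_t)xdr_void, NULL,
		    (xdrproc_t)xdr_void, NULL, tv) != RPC_SUCCESS)
			clnt_perror(clnt, "NFS NULL");
		(void) usleep(10000);	/* short delay between calls */
	}
	/* NOTREACHED */
}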

In this scenario, when one client disconnects and svc_queueclose() is called, we might end up waking up all those sleeping threads:

        /*
         * Wakeup threads sleeping in svc_poll() so that they
         * unlink from the transport
         */
        mutex_enter(&xprt->xp_pool->p_req_lock);
        cv_broadcast(&xprt->xp_pool->p_req_cv);
        mutex_exit(&xprt->xp_pool->p_req_lock);

We are waking up all sleeping threads because we have no other way to wake only the thread (or the few threads) attached to the transport we are just closing.

We are waking up threads for one reason only: so they can unlink from the transport, allowing the transport to be properly closed and destroyed.

The problem with this wakeup is that several hundred threads are woken up just to find out that they have nothing to do (because they are not attached to the closing transport), after which they immediately go back to sleep.
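
To make the effect concrete, here is a minimal userland sketch of the same pattern (pthreads; all names are illustrative, this is not kernel code): many waiters share one condition variable, a broadcast is relevant to only one of them, and everybody else wakes up just to go back to sleep:

/*
 * Sketch of the thundering herd: NTHREADS waiters share one
 * condition variable; a broadcast hands work to a single thread
 * and wakes everybody else for nothing.  Illustrative only.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define	NTHREADS	100

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int work_for = -1;	/* id of the thread that has work */
static int useless_wakeups;	/* wakeups that found nothing to do */

static void *
waiter(void *arg)
{
	int id = (int)(uintptr_t)arg;

	(void) pthread_mutex_lock(&lock);
	while (work_for != id) {
		(void) pthread_cond_wait(&cv, &lock);
		if (work_for != id)
			useless_wakeups++;	/* woken up for nothing */
	}
	(void) pthread_mutex_unlock(&lock);
	return (NULL);
}

int
main(void)
{
	pthread_t tid[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		(void) pthread_create(&tid[i], NULL, waiter,
		    (void *)(uintptr_t)i);

	(void) sleep(1);	/* let the waiters block on the cv */

	/* Hand work to one thread, but wake all of them. */
	(void) pthread_mutex_lock(&lock);
	work_for = 0;
	(void) pthread_cond_broadcast(&cv);	/* cf. cv_broadcast() */
	(void) pthread_mutex_unlock(&lock);

	(void) pthread_join(tid[0], NULL);
	(void) sleep(1);	/* let the herd settle back to sleep */

	(void) pthread_mutex_lock(&lock);
	(void) printf("%d useless wakeups\n", useless_wakeups);
	(void) pthread_mutex_unlock(&lock);
	return (0);
}

On a typical run this prints 99 useless wakeups for 100 waiters. In the kernel the penalty is larger still, because all the woken threads also contend on p_req_lock (visible as mutex_vector_enter() in the stacks below).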

It is well known that waking up many threads for no reason using cv_broadcast() causes a huge performance penalty (the thundering herd problem) and should be avoided when possible.
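
Since the wakeup exists only so that attached threads can unlink, one possible mitigation comes to mind. The following is a sketch only, not necessarily the fix that was integrated; it assumes the transport's count of attached threads (xp_threads, protected by xp_thread_lock) as it exists in this code base:

	int nthreads;

	/*
	 * Sketch: wake the pool only when somebody is actually
	 * attached to the closing transport and thus has to unlink.
	 */
	mutex_enter(&xprt->xp_thread_lock);
	nthreads = xprt->xp_threads;
	mutex_exit(&xprt->xp_thread_lock);

	if (nthreads > 0) {
		/*
		 * Wakeup threads sleeping in svc_poll() so that they
		 * unlink from the transport.
		 */
		mutex_enter(&xprt->xp_pool->p_req_lock);
		cv_broadcast(&xprt->xp_pool->p_req_cv);
		mutex_exit(&xprt->xp_pool->p_req_lock);
	}

This removes only the broadcasts for transports nobody is attached to; when even a single thread is attached, the whole pool is still woken up. Eliminating that as well would require a targeted wakeup (for example, a per-thread condition variable) instead of the shared p_req_cv.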

We saw this issue at a customer site (please see the attached customer.svg flame graph), where more than 22% of the time is spent spinning in cv_reltimedwait_sig().

The problem is also visible in the crash dump file we have from the customer. There are many runnable threads (waiting for a CPU) in cv_reltimedwait_sig() and a few others running on a CPU (plus 50 threads sleeping in cv_reltimedwait_sig(); these are the threads that just recently went back to sleep):

> ::stacks -m rpcmod -c cv_reltimedwait_sig
THREAD           STATE    SOBJ                COUNT
ffffff439b1c3080 RUN      <NONE>                283
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

ffffff439dc680c0 SLEEP    CV                     50
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

ffffff439f9c5060 ONPROC   <NONE>                  1
                 0
                 default_lock_delay+0x3f
                 mutex_vector_enter+0xcc
                 cv_timedwait_sig_hires+0x3c9
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

ffffff439e0fd500 ONPROC   <NONE>                  1
                 0xb
                 _resume_from_idle+0xec
                 cv_timedwait_sig_hires+0x39d
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

>

We were also able to reproduce the issue in our lab (see attached lab.svg flame graph).


Files

customer.svg (177 KB), Marcel Telka, 2014-08-06 01:20 PM
lab.svg (466 KB), Marcel Telka, 2014-08-06 01:20 PM
