Bug #5070

svc_queueclose() could wake up too many threads

Added by Marcel Telka over 5 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: nfs - NFS server and client
Start date: 2014-08-06
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

In some cases we can have a lot of threads (hundreds) sleeping in svc_poll(), waiting for new work on the pool->p_req_cv condition variable. Here:

        /*
         * Count us as asleep, mark this thread as safe
         * for suspend and wait for a request.
         */
        pool->p_asleep++;
        timeleft = cv_reltimedwait_sig(&pool->p_req_cv,
            &pool->p_req_lock, pool->p_timeout, TR_CLOCK_TICK);

This can easily happen, for example, when hundreds of clients issue NFS operations serially, with short delays between the calls. It is even easier to observe with very fast NFS operations, such as NFS NULL.
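
For illustration, a minimal client loop of this kind might look as follows (a hypothetical sketch, not part of the original report; the NFS_PROGRAM constant, the transport choice, and the delay value are assumptions). Running a few hundred instances of it in parallel against one server produces exactly the population of sleeping service threads described above:

/*
 * Hypothetical reproduction sketch: loop NFS NULL calls against a
 * server with a short delay between calls, so the serving thread goes
 * back to sleep after each one.  Run a few hundred instances in
 * parallel.
 */
#include <rpc/rpc.h>
#include <stdio.h>
#include <unistd.h>

#define	NFS_PROGRAM	100003	/* standard NFS RPC program number */
#define	NFS_V3		3

int
main(int argc, char **argv)
{
	CLIENT *clnt;
	struct timeval tv = { 5, 0 };

	if (argc != 2) {
		(void) fprintf(stderr, "usage: %s server\n", argv[0]);
		return (1);
	}

	if ((clnt = clnt_create(argv[1], NFS_PROGRAM, NFS_V3, "tcp"))
	    == NULL) {
		clnt_pcreateerror(argv[1]);
		return (1);
	}

	for (;;) {
		/* NULLPROC (procedure 0) takes and returns nothing. */
		if (clnt_call(clnt, NULLPROC, (xdrproc_t)xdr_void, NULL,
		    (xdrproc_t)xdr_void, NULL, tv) != RPC_SUCCESS)
			clnt_perror(clnt, "NULL call");
		(void) usleep(10000);	/* short delay between calls */
	}
}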

In this scenario, when one client disconnects and svc_queueclose() is called, we might end up waking up all those sleeping threads:

        /*
         * Wakeup threads sleeping in svc_poll() so that they
         * unlink from the transport
         */
        mutex_enter(&xprt->xp_pool->p_req_lock);
        cv_broadcast(&xprt->xp_pool->p_req_cv);
        mutex_exit(&xprt->xp_pool->p_req_lock);

We wake up all sleeping threads because we have no way to wake only the thread (or the few threads) attached to the transport we are closing.

We wake the threads for one reason only: so they can disconnect from the transport, allowing it to be properly closed and destroyed.

The problem with this wakeup is that several hundred threads are woken only to find they have nothing to do (they are not attached to the closing transport), after which they immediately go back to sleep.

It is well known that waking many threads for no reason with cv_broadcast() (the classic thundering herd problem) causes a significant performance penalty and should be avoided where possible.
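
The pattern is easy to demonstrate in userland (an assumed pthreads analogy, not the rpcmod code): many threads wait on a single condition variable, one broadcast wakes all of them, and all but one discover they have nothing to do:

/*
 * Userland illustration of the thundering herd: NTHREADS workers wait
 * on a single condition variable; one broadcast wakes all of them, yet
 * only one has work to do.  The predicate is checked after the wait on
 * purpose, to make the useless wakeups visible.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define	NTHREADS	8

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int work_for = -1;	/* id of the one thread with work */

static void *
worker(void *arg)
{
	int id = (int)(uintptr_t)arg;

	pthread_mutex_lock(&lock);
	for (;;) {
		pthread_cond_wait(&cv, &lock);
		if (work_for == id) {
			(void) printf("thread %d: got work\n", id);
			work_for = -1;
		} else {
			/* Woken for nothing; going back to sleep. */
			(void) printf("thread %d: nothing to do\n", id);
		}
	}
	/* NOTREACHED */
	return (NULL);
}

int
main(void)
{
	pthread_t tids[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		(void) pthread_create(&tids[i], NULL, worker,
		    (void *)(uintptr_t)i);

	(void) sleep(1);		/* let all workers block on cv */
	pthread_mutex_lock(&lock);
	work_for = 3;			/* work for exactly one thread */
	pthread_cond_broadcast(&cv);	/* yet all of them wake up */
	pthread_mutex_unlock(&lock);

	(void) sleep(1);
	return (0);
}

Scaled from eight threads to several hundred, these useless wakeups turn into the scheduler churn visible in the flame graphs referenced below.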

We saw this issue at a customer site (see the attached customer.svg flame graph), where more than 22% of the time is spent spinning in cv_reltimedwait_sig().

The problem is also visible in the crash dump we have from the customer. There are many runnable threads (waiting for a CPU) in cv_reltimedwait_sig() and a few others running on a CPU (plus 50 threads sleeping in cv_reltimedwait_sig(); these are threads that recently went back to sleep):

> ::stacks -m rpcmod -c cv_reltimedwait_sig
THREAD           STATE    SOBJ                COUNT
ffffff439b1c3080 RUN      <NONE>                283
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

ffffff439dc680c0 SLEEP    CV                     50
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

ffffff439f9c5060 ONPROC   <NONE>                  1
                 0
                 default_lock_delay+0x3f
                 mutex_vector_enter+0xcc
                 cv_timedwait_sig_hires+0x3c9
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

ffffff439e0fd500 ONPROC   <NONE>                  1
                 0xb
                 _resume_from_idle+0xec
                 cv_timedwait_sig_hires+0x39d
                 cv_reltimedwait_sig+0x4f
                 svc_poll+0x1f2
                 svc_run+0x16e
                 svc_do_run+0x8e
                 nfssys+0xf1
                 dtrace_systrace_syscall32+0xe4
                 _sys_sysenter_post_swapgs+0x149

>

We were also able to reproduce the issue in our lab (see attached lab.svg flame graph).


Files

customer.svg (177 KB) Marcel Telka, 2014-08-06 01:20 PM
lab.svg (466 KB) Marcel Telka, 2014-08-06 01:20 PM

History

#1

Updated by Marcel Telka over 5 years ago

There are basically two ways to fix this. The first is to wake up, in svc_queueclose(), only the threads attached to the transport being closed. The problem with this solution is that there is no simple way to wake threads selectively (remember: all of the threads wait on the same condition variable). It would require non-trivial changes to the thread synchronization in rpcmod, with high risk and questionable gain, so it is a no-go.

The second way is to not wake any threads in svc_queueclose() at all. The current implementation just wakes all threads in the hope that one (or a few) of them will disconnect from the transport in the near future and destroy the transport themselves. svc_queueclose() returns immediately, without waiting for the actual transport destruction. If we do not wake any threads in svc_queueclose(), we merely postpone the actual transport destruction a bit longer than before.

All threads wait on the condition variable with a timeout (5 seconds by default), so the transport should be destroyed no later than 5 seconds (in the worst case) after svc_queueclose() is called. Usually it happens sooner, for two reasons: a) the thread has very likely already been waiting on the condition variable for some time, so the full timeout need not elapse; b) if new work arrives for the pool (a new NFS request, for example), the thread may be woken before the timeout expires.
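
The shape of this approach can be sketched in userland as follows (assumed names and a pthreads analogue, not the actual rpcmod change; see the commit referenced in note #4 for the authoritative fix). The close path only marks state and wakes nobody; the waiter's bounded timed wait guarantees the detach happens within the timeout:

/*
 * Userland sketch of the chosen approach: the close path only marks
 * state.  Because the waiter uses a bounded timed wait, it notices the
 * closure within the timeout at the latest, or earlier if new work
 * wakes it up.
 */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t req_cv = PTHREAD_COND_INITIALIZER;
static bool closing = false;	/* transport is being closed */
static int pending = 0;		/* queued requests (unused in this demo) */

/* Close path: no cv_broadcast() equivalent, just mark the state. */
static void
queue_close(void)
{
	pthread_mutex_lock(&lock);
	closing = true;
	pthread_mutex_unlock(&lock);
}

/* Poll path: bounded wait, then re-check the transport state. */
static void *
worker(void *arg)
{
	struct timespec ts;

	(void) arg;
	pthread_mutex_lock(&lock);
	for (;;) {
		(void) clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_sec += 5;		/* pool->p_timeout analogue */

		while (pending == 0 && !closing) {
			if (pthread_cond_timedwait(&req_cv, &lock,
			    &ts) == ETIMEDOUT)
				break;	/* timeout: fall through, re-check */
		}
		if (closing) {
			pthread_mutex_unlock(&lock);
			(void) printf("worker: detached, transport "
			    "can now be destroyed\n");
			return (NULL);
		}
		if (pending > 0)
			pending--;	/* handle one request */
	}
}

int
main(void)
{
	pthread_t tid;

	(void) pthread_create(&tid, NULL, worker, NULL);
	(void) sleep(1);
	queue_close();	/* no wakeup; worker detaches within 5 seconds */
	(void) pthread_join(tid, NULL);
	return (0);
}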

#2

Updated by Marcel Telka over 5 years ago

  • Status changed from New to In Progress
#3

Updated by Marcel Telka over 5 years ago

  • Status changed from In Progress to Pending RTI
#4

Updated by Electric Monk over 5 years ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

git commit eaf32bf722e82dc74e90279e12e48f66891ef46a

commit  eaf32bf722e82dc74e90279e12e48f66891ef46a
Author: Marcel Telka <marcel.telka@nexenta.com>
Date:   2014-08-08T02:06:12.000Z

    5070 svc_queueclose() could wake up too much threads
    Reviewed by: Rick Mesta <rick.mesta@nexenta.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Reviewed by: Dan Fields <dan.fields@nexenta.com>
    Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
    Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@omniti.com>
