Bug #12681

Excess service times for NFS(v3) locking calls

Added by Gordon Ross 7 months ago. Updated 7 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: nfs - NFS server and client
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Customer reports slow access from Linux clients running Apache services.
Examination of network captures on both sides shows that "call-back" attempts all fail
(where the server tries to do an RPC bind and open a connection back to the client).
It turns out there's a firewall preventing calls back to the client. However,
the customer insists this all works with (another vendor's) server.

#1

Updated by Gordon Ross 7 months ago

Additional analysis of this problem (where we have a firewall blocking call-backs):
the NLM server gets into an unfortunate state with this configuration, apparently hung.

We forced a crash dump and found that we have 511 threads all blocked like this:

> ::stacks -m rpcmod                  
THREAD           STATE    SOBJ                COUNT
ffffd09c24848820 SLEEP    CV                    511
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_timedwait_sig+0x49
                 wait_for_lock+0x10c
                 flk_wait_execute_request+0x230
                 flk_process_request+0x4ed
                 reclock+0x5dd
                 fs_frlock+0x187
                 zfs_frlock+0xf8
                 fop_frlock+0x65
                 nlm_vop_frlock+0x51
                 nlm_block+0xeb
                 nlm_do_lock+0x2f5
                 nlm4_lock_4_svc+0x2a
                 nlm_dispatch+0xfe
                 nlm_prog_4+0x34
                 svc_getreq+0x1c1
                 svc_run+0x198
                 svc_do_run+0x8e
                 nfssys+0x111
                 _sys_sysenter_post_swapgs+0x149

Let's have a look at maxthreads...
> ffffd09c24848820 ::findstack -v
stack pointer for thread ffffd09c24848820: ffffd002e3781260
[ ffffd002e3781260 _resume_from_idle+0x112() ]
  ffffd002e3781290 swtch+0x141()
  ffffd002e3781330 cv_timedwait_sig_hires+0x39d(ffffd0c732c343d4, ffffd0940feccc80, 34630b8a000, 989680, 0)
  ffffd002e3781380 cv_timedwait_sig+0x49(ffffd0c732c343d4, ffffd0940feccc80, 11e88ee)
  ffffd002e37813c0 wait_for_lock+0x10c(ffffd0c732c34300)
  ffffd002e3781420 flk_wait_execute_request+0x230(ffffd0c732c34300)
  ffffd002e37814a0 flk_process_request+0x4ed(ffffd0c732c34300)
  ffffd002e3781630 reclock+0x5dd(ffffd09de5360580, ffffd002e3781a00, e, 10003, 0, ffffd002e3781650)
  ffffd002e37816d0 fs_frlock+0x187(ffffd09de5360580, 7, ffffd002e3781a00, 10003, 0, ffffd002e37818f0, ffffd09cfa33c760, 0)
  ffffd002e3781770 zfs_frlock+0xf8(ffffd09de5360580, 7, ffffd002e3781a00, 10003, 0, ffffd002e37818f0, ffffd09cfa33c760, 0)
  ffffd002e3781810 fop_frlock+0x65(ffffd09de5360580, 7, ffffd002e3781a00, 10003, 0, ffffd002e37818f0, ffffd09cfa33c760, 0)
  ffffd002e3781860 nlm_vop_frlock+0x51(ffffd09de5360580, 7, ffffd002e3781a00, 10003, 0, ffffd002e37818f0, ffffd09cfa33c760, 0)
  ffffd002e37819a0 nlm_block+0xeb(ffffd002e3781b60, ffffd09cb1d006b8, ffffd0a192ea4b10, ffffd09c39f5ddc8, ffffd002e3781a00, fffffffff868c330)
  ffffd002e3781ac0 nlm_do_lock+0x2f5(ffffd002e3781b60, ffffd002e3781b10, ffffd002e3781ca0, fffffffff868c300, 0, fffffffff868c330)
  ffffd002e3781af0 nlm4_lock_4_svc+0x2a(ffffd002e3781b60, ffffd002e3781b10, ffffd002e3781ca0)
  ffffd002e3781c20 nlm_dispatch+0xfe(ffffd002e3781ca0, ffffd0a146edea00, fffffffff8693610)
  ffffd002e3781c40 nlm_prog_4+0x34(ffffd002e3781ca0, ffffd0a146edea00)
  ffffd002e3781d20 svc_getreq+0x1c1(ffffd0a146edea00, ffffd0aaa5f29f00)
  ffffd002e3781d90 svc_run+0x198(ffffd0a02e2b0750)
  ffffd002e3781dd0 svc_do_run+0x8e(2)
  ffffd002e3781ec0 nfssys+0x111(e, febf0fbc)
  ffffd002e3781f10 _sys_sysenter_post_swapgs+0x149()
> ffffd0a02e2b0750 ::print struct __svcpool
{
    p_next = 0
    p_prev = 0xffffd09c425573f8
    p_id = 0x2
    p_threads = 0x2
    p_detached_threads = 0x1ff
    p_maxthreads = 0x200
    p_redline = 0x1
    p_reserved_threads = 0
    p_thread_lock = {
        _opaque = [ 0 ]
    }
    p_asleep = 0
    p_drowsy = 0
    p_req_cv = {
        _opaque = 0
    }
    p_timeout = 0x1f4
    p_req_lock = {
        _opaque = [ 0 ]
    }
    p_reqs = 0x300e97
    p_walkers = 0
    p_max_same_xprt = 0x8
    p_stksize = 0
    p_closing = 0
    p_creator_signaled = 0
    p_creator_exit = 0
    p_creator_cv = {
        _opaque = 0x1
    }
    p_creator_lock = {
        _opaque = [ 0 ]
    }
    p_lcount = 0x1a6
    p_lhead = 0xffffd0a4d8f831f0
    p_lrwlock = {
> 

Yup. We're simply out of RPC threads in the lockd service!
We're also blocking against locks held by clients that we've been unable to notify (with a lock call-back) that they got the lock they asked for.
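
To put the relevant pool counters in decimal (e.g. letting mdb do the conversion with its =D format):

> 0x1ff=D
                511
> 0x200=D
                512

So p_detached_threads (511) matches the 511 blocked threads in ::stacks above, and with p_maxthreads at 512 the pool has essentially no threads left to pick up new requests.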

#2

Updated by Gordon Ross 7 months ago

It turns out the dump shown above was misleading: it shows the state of things after the call-back attempts failed. That activity will eventually all time out, but that takes a while.

We had some other dumps (sorry, don't have the stacks handy now) where we can see that all the NLM RPC service threads are blocked in rpcbind_getaddr calls. Looking into that some more, I see that we're attempting the rpcbind_getaddr call earlier than we need to, and doing so even when we will not use the handle. This is part of what's causing us to run out of threads: at the point where we block trying to set up this handle, the RPC service thread has not yet "detached" itself from the RPC service thread pool, and RPC threads are not supposed to block (for a long time) until they have detached. That's why we run out of threads.

I'll move that RPC client handle setup (which does the rpcbind_getaddr call) later, doing it only when we know we need to call back the client (i.e. for a lock call that must block).
This should allow uncontested lock calls to proceed without attempting to set up the callback handle.
There were also problems with reporting errors when a call-back fails and with cleaning up lock state; those are addressed as well.

This also turns out to dramatically improve locking performance over NFS when the lock request is uncontested.
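
For illustration only, the reordering amounts to roughly the sketch below. It is not the actual change (see the review link for that): nlm_do_lock, nlm_block and rpcbind_getaddr are the functions named above, while nlm_host_get_rpc is assumed here as the helper that sets up the call-back RPC client handle, and the do_blocking flag, doreply label and error handling are placeholders.

	/*
	 * Before (sketch): the call-back handle was set up near the top of
	 * nlm_do_lock(), even for requests that are granted immediately and
	 * never need a call-back.  Behind this firewall that setup can sit
	 * in rpcbind_getaddr() for a long time, while the service thread is
	 * still attached to (and counted against) the svcpool.
	 */
	error = nlm_host_get_rpc(host, sr->rq_vers, &rpcp);
	if (error != 0) {
		status = nlm4_denied_nolocks;
		goto doreply;
	}
	/* ... then attempt the lock, possibly blocking in nlm_block() ... */

	/*
	 * After (sketch): attempt the lock first; only the blocking path,
	 * which will later need to send the NLM_GRANTED call-back, sets up
	 * the RPC client handle and so pays for the rpcbind_getaddr().
	 */
	/* ... try the lock without blocking; if granted, reply and return ... */
	if (do_blocking) {
		error = nlm_host_get_rpc(host, sr->rq_vers, &rpcp);
		if (error != 0) {
			/* Call-back setup failed: report it and undo the lock state. */
			status = nlm4_denied_nolocks;
			goto doreply;
		}
		/* Block for the lock; use rpcp to send NLM_GRANTED when we get it. */
	}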

Out for review:
https://illumos.org/rb/r/2545/

#3

Updated by Gordon Ross 7 months ago

Testing: Run a Linux NFS client from behind a firewall that prevents any call-back RPC.
Verified uncontested locks can be taken without call-back RPC (these used to fail).
Verified the call-back failures are reported, and lock state is cleaned up when that happens.
General regression tests via the Connectathon NFS lock tests.
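
For reference, an uncontested lock of the kind exercised here can be taken from a client with a small POSIX fcntl(F_SETLK) program like the one below. This is just an illustration of the client-side operation (not part of the test suite); over NFSv3 such a request goes through the NLM lock path shown in the stacks above.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct flock fl;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file-on-nfs-mount>\n", argv[0]);
		return (1);
	}

	fd = open(argv[1], O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return (1);
	}

	/* Request a whole-file exclusive lock, non-blocking (F_SETLK). */
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;

	if (fcntl(fd, F_SETLK, &fl) < 0) {
		perror("fcntl(F_SETLK)");
		return (1);
	}
	printf("lock acquired on %s\n", argv[1]);

	/* Hold the lock briefly, then release it by closing the file. */
	sleep(5);
	(void) close(fd);
	return (0);
}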

#4

Updated by Dan McDonald 7 months ago

At Joyent we had been seeing unusual NFS problems with an LX NFS client (using the SmartOS native-assistant lx_lockd, but the actual Linux rpcbind and/or rpc.statd); those problems seem to dissipate with the combination of a SmartOS lx-brand fix and this fix on the server end.

#5

Updated by Gordon Ross 7 months ago

  • Status changed from In Progress to Pending RTI
#6

Updated by Electric Monk 7 months ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

git commit 54b3584e66419f0f60689044f13d702660b4233f

commit  54b3584e66419f0f60689044f13d702660b4233f
Author: Gordon Ross <gwr@nexenta.com>
Date:   2020-05-09T23:08:44.000Z

    12681 Excess service times for NFS(v3) locking calls
    Reviewed by: Evan Layton <evan.layton@nexenta.com>
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
