Bug #7721


Poor nfs4_frlock() performance for blocking locks

Added by Marcel Telka over 4 years ago. Updated over 4 years ago.

Status: In Progress
Category: nfs - NFS server and client


The nfs4_frlock() implementation of blocking lock requests does not scale when many concurrent processes try to lock a single file.


test-7721.svg (278 KB) Marcel Telka, 2017-01-03 07:10 PM

Related issues

Related to illumos gate - Bug #7601: nfs_rwlock_t does not scale with cv_broadcast() (Closed, Marcel Telka, 2016-11-21)

Related to illumos gate - Bug #3629: NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429 (Closed, Marcel Telka, 2013-03-15)

Actions #1

Updated by Marcel Telka over 4 years ago

  • Related to Bug #7601: nfs_rwlock_t does not scale with cv_broadcast() added
Actions #2

Updated by Marcel Telka over 4 years ago

The nfs4_frlock() implementation for blocking lock requests is described in #7601-3. With many concurrent processes the NFSv4 LOCK operation is usually denied (because the lock is already held by some other process). In addition, the DENY reply is sometimes delayed, depending on the NFSv4 server implementation; for the illumos NFSv4 server the delay is about 70 milliseconds - see the setlock() implementation. This means that, for an already held lock, only about 14 processes per second (1000 / 70 ≈ 14.3) are able to try to lock the file and learn that the lock is already held.
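
The rate quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes that attempts are fully serialized and that every denied attempt costs the whole 70 ms delay.

```python
def denied_attempts_per_second(deny_delay_ms: float) -> float:
    """Upper bound on serialized LOCK attempts per second.

    Assumes (for illustration) that attempts never overlap and that each
    denied attempt costs exactly deny_delay_ms.
    """
    return 1000.0 / deny_delay_ms


rate = denied_attempts_per_second(70)  # 70 ms is the delay quoted for illumos
print(f"about {rate:.1f} denied attempts per second")
```
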

The problem is that unlock requests are serialized using the same nfs_rwlock_t (r_lkserlock held as WRITER) as the lock requests. Imagine a case where one process holds a lock on a file and many (let's say 2000) other processes are trying to lock the same file. While one process is actually talking to the NFSv4 server (only to learn that its lock request is denied), all the other processes are waiting to grab the r_lkserlock. Now assume the process that holds the lock on the file wants to unlock it (so that one of the 2000 waiters can lock the file). The unlock request needs to grab the r_lkserlock as WRITER first, so it has to wait until all of the 2000 processes that want the same r_lkserlock have grabbed it ahead of the unlocker. This takes about 2000 * 70 = 140000 milliseconds (140 seconds).
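
The 140 second estimate can be reproduced with a toy queue model. This is a simplified sketch, not the kernel code: it assumes the waiters are served strictly first-in first-out, that the unlocker arrives behind all 2000 lockers, and that no denied locker re-queues ahead of it. The function and the queue labels are hypothetical.

```python
from collections import deque


def time_until_unlock_ms(waiters: int, deny_delay_ms: int) -> int:
    """Time the unlocker waits for r_lkserlock in a strict FIFO model."""
    # Every locker ahead of the unlocker holds the serializing lock for one
    # denied round trip to the server.
    queue = deque(["locker"] * waiters)
    queue.append("unlocker")

    elapsed_ms = 0
    while queue:
        who = queue.popleft()
        if who == "unlocker":
            return elapsed_ms
        elapsed_ms += deny_delay_ms
    raise AssertionError("the unlocker is always in the queue")


wait_ms = time_until_unlock_ms(waiters=2000, deny_delay_ms=70)
print(f"unlock delayed by {wait_ms} ms ({wait_ms / 1000:.0f} s)")
```

In this model the delay grows linearly with the number of waiters, which is why the problem only becomes visible with many concurrent processes.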

Imagine a test as described in #7601-3: You have 2000 processes and one file. Each process tries to lock the file (using a blocking write lock), waits for one second, unlocks the file and waits again for one second, and does this in a loop. If you leave this test running for one hour (3600 seconds) you would expect to get about 3600 file lock/unlock cycles (in an ideal world). Once the one hour period ends, one random process should hold the file lock, while all the other processes should be waiting for it. If you then let the processes finish their lock/1s-sleep/unlock cycle and exit, the last process should be done in about 2000 seconds. So in an ideal world the test run time for the above scenario should be about 3600+2000=5600 seconds, and during this time the file lock should be grabbed about 5600 times.
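
The ideal figures can be written down as a small calculation. This is a sketch of the idealized model only: it assumes the lock is handed over with zero delay, that exactly one process holds it at any time, and that each hold lasts one second. The function and parameter names are invented for illustration.

```python
def ideal_test_run(timed_phase_s: int, processes: int, hold_s: int = 1):
    """Return (total run time in seconds, total lock grabs) for the ideal case."""
    # While the timed phase runs, the lock changes hands once per hold period.
    timed_grabs = timed_phase_s // hold_s
    # Afterwards every process completes one more lock/sleep/unlock cycle,
    # one after another.
    drain_s = processes * hold_s
    drain_grabs = processes
    return timed_phase_s + drain_s, timed_grabs + drain_grabs


total_s, total_grabs = ideal_test_run(timed_phase_s=3600, processes=2000)
print(total_s, "seconds,", total_grabs, "lock grabs")
```
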

If you run this test over NFSv4 with the current NFSv4 client implementation and the current NFSv4 server implementation you'll find that the test run time is about 24000 seconds (instead of the expected 5600 seconds) and the number of successful file locks is less than 2100 (instead of the expected 5600).

Actions #3

Updated by Marcel Telka over 4 years ago

To address the flaw described above the nfs4_frlock() implementation needs to be changed. The current implementation of the blocking lock request first consults the NFSv4 server to grab the lock. Once the lock is granted by the NFSv4 server, the lock is registered locally using the reclock() call (called from nfs4_register_lock_locally()). This is inefficient: why should I ask the NFSv4 server when there is some other local process already holding the lock? That information can be found far more easily and quickly by just consulting the local locking code using reclock().

The basic idea of the fix is to use the local locking code - the reclock() - as the primary source of the file locking data. Only when a lock is granted by the local locking code is the NFSv4 server consulted to get the global NFS-wide lock.

With many processes trying to lock a single file we will switch from "all processes are waiting for r_lkserlock" to "all processes are waiting in reclock()". This allows all file unlocks to be handled immediately, without the r_lkserlock bottleneck. It also avoids the fruitless attempts by the NFSv4 client to get the lock from the NFSv4 server when there is no chance of getting it (the file is locked by some other local process).
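
The local-first ordering can be illustrated with a small user-space model. This is a hedged sketch, not the kernel change itself: FakeServer, local_lock and the helper functions are hypothetical stand-ins, a plain threading.Lock plays the role of the local reclock() state, only a single NFS client is modelled, and the unlock order shown (server first, then local) is an assumption.

```python
import threading


class FakeServer:
    """Hypothetical NFSv4 server stand-in that only counts LOCK requests."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._held = False
        self.requests = 0
        self.denied = 0

    def try_lock(self):
        with self._mutex:
            self.requests += 1
            if self._held:
                self.denied += 1
                return False
            self._held = True
            return True

    def unlock(self):
        with self._mutex:
            self._held = False


local_lock = threading.Lock()  # stands in for the local reclock() state


def lock_local_first(server):
    local_lock.acquire()        # block locally, no network traffic
    if not server.try_lock():   # only the local winner talks to the server
        local_lock.release()
        raise RuntimeError("denied: held by another client (not modelled)")


def unlock_local_first(server):
    server.unlock()             # assumed order: release on the server first
    local_lock.release()        # then wake the next local waiter


def worker(server, rounds):
    for _ in range(rounds):
        lock_local_first(server)
        unlock_local_first(server)


server = FakeServer()
threads = [threading.Thread(target=worker, args=(server, 50)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("server requests:", server.requests, "denied:", server.denied)
```

Because a process reaches the server only while it holds the local lock, the model server never sees a request it has to deny: the number of requests equals the number of grants, and waiters queue up locally instead.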

Actions #4

Updated by Marcel Telka over 4 years ago

I tested the current (old) nfs4_frlock() implementation and the new one. Please see the results in the attached test-7721.svg file. For comparison I also ran the test on a local file system (ZFS). We can see that the results for the local file system and for the new nfs4_frlock() implementation are the same.

Here are some numbers not shown in the picture:

Test                              Phase 3 run time                Lock grabbed
Old nfs4_frlock() implementation  20580 seconds                   2057 times
New nfs4_frlock() implementation  2023 seconds                    5563 times
Testing on ZFS                    2035 seconds                    5566 times
Optimal value                     2000 seconds (lower is better)  5600 times (more is better)

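
To put the numbers above in proportion, the following sketch relates each measurement to the optimal value. The data is copied from the table; the variable names and the two derived ratios are only an illustration.

```python
# (Phase 3 run time in seconds, number of times the lock was grabbed)
results = {
    "Old nfs4_frlock() implementation": (20580, 2057),
    "New nfs4_frlock() implementation": (2023, 5563),
    "Testing on ZFS": (2035, 5566),
}
OPTIMAL_PHASE3_S = 2000
OPTIMAL_GRABS = 5600

ratios = {}
for name, (phase3_s, grabs) in results.items():
    slowdown = phase3_s / OPTIMAL_PHASE3_S  # 1.0 would be optimal
    grab_share = grabs / OPTIMAL_GRABS      # 1.0 would be optimal
    ratios[name] = (slowdown, grab_share)
    print(f"{name}: {slowdown:.2f}x optimal run time, "
          f"{grab_share:.1%} of optimal lock grabs")
```

The old implementation comes out at roughly ten times the optimal Phase 3 run time, while the new implementation and ZFS both stay within about 2% of it.
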
Actions #6

Updated by Marcel Telka over 4 years ago

  • Related to Bug #3629: NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429 added
