
Bug #7601

nfs_rwlock_t does not scale with cv_broadcast()

Added by Marcel Telka almost 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: nfs - NFS server and client
Start date: 2016-11-21
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

The nfs_rwlock_t implementation uses cv_broadcast(9f) to wake up the waiters. This is clearly visible in nfs_rw_exit().

In case there are several WRITERs waiting, all of them will wake up; one thread will then grab the lock, but all the others will just go back and wait again in cv_wait()/cv_wait_sig() - see nfs_rw_enter_sig(). This might cause significant performance degradation in some scenarios. To fix this, the nfs_rwlock_t implementation should be changed to use cv_signal(9f) instead of cv_broadcast(9f).
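
For illustration, here is a minimal sketch of the proposed change (illustrative structure and function names, not the actual nfs_rwlock_t source): on release, wake exactly one waiter with cv_signal(9f) instead of waking all of them with cv_broadcast(9f).

    #include <sys/mutex.h>
    #include <sys/condvar.h>

    /* Illustrative structure only; not the real nfs_rwlock_t layout. */
    typedef struct sketch_rwlock {
            kmutex_t        lock;
            kcondvar_t      cv;
            int             count;          /* >0 readers, -1 writer, 0 free */
            int             waiters;        /* threads blocked in cv_wait() */
    } sketch_rwlock_t;

    /*
     * Release the lock and wake a waiter.  With cv_broadcast() every
     * blocked thread is woken, yet only one of them can win the lock;
     * the rest re-enter cv_wait() after a useless trip through the
     * scheduler.  cv_signal() wakes exactly one waiter instead.
     */
    static void
    sketch_rw_exit(sketch_rwlock_t *l)
    {
            mutex_enter(&l->lock);
            if (l->count > 0)
                    l->count--;             /* a reader leaves */
            else
                    l->count = 0;           /* the writer leaves */
            if (l->count == 0 && l->waiters > 0)
                    cv_signal(&l->cv);      /* was: cv_broadcast(&l->cv) */
            mutex_exit(&l->lock);
    }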


Files

test.svg (291 KB) Marcel Telka, 2016-11-21 11:00 AM

Related issues

Related to illumos gate - Bug #7721: Poor nfs4_frlock() performance for blocking locks (In Progress, 2017-01-03)

Related to illumos gate - Bug #7912: nfs_rwlock readers are running wild waiting for a writer that cannot come (Closed, 2017-02-27)

History

#1

Updated by Marcel Telka almost 3 years ago

  • Description updated (diff)
#2

Updated by Marcel Telka almost 3 years ago

  • Description updated (diff)
#3

Updated by Marcel Telka almost 3 years ago

To easily and clearly demonstrate this issue we should find a case where there might be many threads trying to acquire one nfs_rwlock_t as a WRITER. Such a case is the nfs4_frlock() implementation, which does the following (simplified):

  1. call nfs_rw_enter_sig(RW_WRITER)
  2. send the NFSv4 LOCK operation to the server
  3. in case the NFSv4 LOCK operation is denied:

If we fire many threads trying to lock the same file over NFSv4 we should see the problem. I implemented a test that tries to lock one file in a cycle from 2000 processes for one hour.

The test is implemented to run in three phases. In phase 1, 2000 processes are forked and a file is opened (over NFSv4). Once phase 1 is completed, phase 2 begins. In phase 2 all processes do the following (a sketch of this loop is shown after the phase description below):

  1. if the test time passed (one hour) then return
  2. fcntl(file, F_SETLKW, F_WRLCK)
  3. sleep(1 second)
  4. fcntl(file, F_SETLKW, F_UNLCK)
  5. sleep(1 second)
  6. goto 1

Once the test time passes (one hour), phase 3 begins. Here all the processes complete their lock/unlock cycle (see above), close the file, and exit(). The test counts the successful NFSv4 LOCK operations done by all processes.
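
For reference, here is a minimal sketch of the phase-2 loop each process runs (illustrative only; the function and variable names are not taken from the actual test program):

    #include <fcntl.h>
    #include <unistd.h>
    #include <time.h>

    /*
     * One phase-2 worker: lock/unlock the NFSv4-backed file "fd" in a
     * cycle until the wall-clock time "end" is reached.  Returns the
     * number of successful lock/unlock cycles.
     */
    static long
    phase2_worker(int fd, time_t end)
    {
            struct flock fl;
            long cycles = 0;

            while (time(NULL) < end) {
                    fl.l_type = F_WRLCK;            /* blocking write lock */
                    fl.l_whence = SEEK_SET;
                    fl.l_start = 0;
                    fl.l_len = 0;                   /* whole file */
                    if (fcntl(fd, F_SETLKW, &fl) == 0)
                            cycles++;
                    sleep(1);

                    fl.l_type = F_UNLCK;            /* release the lock */
                    (void) fcntl(fd, F_SETLKW, &fl);
                    sleep(1);
            }
            return (cycles);
    }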

I ran the test for the current nfs_rwlock_t implementation and for the new implementation based on cv_signal(9f). The test was run on a (virtual) 4 core machine with 6GB of RAM. The results are in the attached test.svg file. The timestamps in the picture are relative, with 0:00 set to the beginning of phase 2.

Notes for the test results:
  • With the cv_broadcast() implementation the load in phase 2 peaks at about 200 (!). After about 30 minutes the load went down and stabilized at about 35; in phase 3 the load slowly goes down.
  • With the cv_signal() implementation the load in phase 2 is almost zero (actually 0.02). In phase 3 the load goes up a bit to about 2.
  • Both implementations ended with a similar "load hill" peaking at a load of about 15-20. Interesting, but out of scope for this work.
  • The overall test run takes about 7 hours.
  • The overall test run time for the cv_signal() implementation is a bit shorter; the difference is about 30 minutes. This is also good.
  • The number of successful NFSv4 lock/unlock cycles was similar for both implementations.
#4

Updated by Vitaliy Gusev almost 3 years ago

Marcel, is this only a client-side issue, i.e. not related to the nfs-server code?

#5

Updated by Marcel Telka almost 3 years ago

Vitaliy Gusev wrote:

Marcel, is this only a client-side issue, i.e. not related to the nfs-server code?

The nfs_rwlock_t is used by the NFS client only, so yes, this issue is related to the NFS client only.

#6

Updated by Marcel Telka almost 3 years ago

  • Status changed from In Progress to Pending RTI
#7

Updated by Electric Monk almost 3 years ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

git commit d1054fda727523967470132ee7266d3d934b5124

commit  d1054fda727523967470132ee7266d3d934b5124
Author: Marcel Telka <marcel@telka.sk>
Date:   2016-12-07T16:33:31.000Z

    7601 nfs_rwlock_t does not scale with cv_broadcast()
    Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

#8

Updated by Marcel Telka almost 3 years ago

FAQ

Q: If we have only readers (comment starting at line 4820), wouldn't it make sense to use cv_broadcast() there, so we can get readers to finish what they are doing and we are not serializing the reads, as cv_signal() would cause to happen?

A: Good question. A similar "problem" is at line 4790.

Actually, even with the cv_broadcast() call the waiting (and now woken up) readers are serialized, because they need to grab the mutex before they return from the cv_wait()/cv_wait_sig() call.

The problem with the cv_broadcast() call is that when we wake up a lot of readers it is possible that one (new) writer might grab the mutex before some of the readers actually return from cv_wait()/cv_wait_sig(). If that's the case, then the writer will take precedence and cause all of the following readers to spin and call cv_wait()/cv_wait_sig() again. I just tried to avoid that.

One might argue that several cv_signal() calls in a row are slower than one cv_broadcast() call, even with the risk that sometimes the threads woken up (by cv_broadcast()) might just return back to cv_wait()/cv_wait_sig() again. Yes, that's easily possible, but I do not know for sure; I didn't do any measurement for that.

But ...

... the line 4826 should be executed in rare cases only, so it doesn't matter much whether there is a cv_broadcast() or a cv_signal() call there. The performance difference would be almost zero. Also, if we wake up more than one reader here (at line 4826), all of them will call cv_signal() at line 4790. So the situation would very likely be worse with cv_broadcast() at line 4826 than it is with the cv_signal() call.

Line 4790 is a bit of a different story. If there were a cv_broadcast() call there, all of the woken up readers would call cv_broadcast() again (at line 4790). And this is definitely something we do not want to see.
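
To make the reader hand-off described above concrete, here is a minimal sketch (reusing the illustrative sketch_rwlock_t from the sketch in the description; this is not the actual nfs_rw_enter_sig() source). Each reader that gets in wakes at most one further waiter with cv_signal(), so a burst of readers still drains quickly without a thundering herd:

    /*
     * Acquire the sketch lock as a reader.  A reader that gets in passes
     * the wakeup on to the next waiter; if that waiter is also a reader,
     * the hand-off repeats and the whole queue of readers drains one by
     * one without ever waking more threads than can make progress.
     */
    static void
    sketch_rw_enter_reader(sketch_rwlock_t *l)
    {
            mutex_enter(&l->lock);
            while (l->count < 0) {          /* a writer holds the lock */
                    l->waiters++;
                    cv_wait(&l->cv, &l->lock);
                    l->waiters--;
            }
            l->count++;                     /* we are in as a reader */
            if (l->waiters > 0)
                    cv_signal(&l->cv);      /* pass the wakeup along */
            mutex_exit(&l->lock);
    }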

#9

Updated by Marcel Telka almost 3 years ago

  • Related to Bug #7721: Poor nfs4_frlock() performance for blocking locks added
#10

Updated by Marcel Telka over 2 years ago

  • Related to Bug #7912: nfs_rwlock readers are running wild waiting for a writer that cannot come added
