Project

General

Profile

Bug #11749

nfs4_end_*_seqid_sync() should call cv_signal()

Added by Marcel Telka 6 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
nfs - NFS server and client
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Bite-size
Tags:

Description

This is the nfs4_end_open_seqid_sync() implementation:

926 void
927 nfs4_end_open_seqid_sync(nfs4_open_owner_t *oop)
928 {
929     mutex_enter(&oop->oo_lock);
930     ASSERT(oop->oo_seqid_inuse);
931     oop->oo_seqid_inuse = 0;
932     cv_broadcast(&oop->oo_cv_seqid_sync);
933     mutex_exit(&oop->oo_lock);
934 }

And here is the other thread that is waiting for oo_cv_seqid_sync:

945 int
946 nfs4_start_open_seqid_sync(nfs4_open_owner_t *oop, mntinfo4_t *mi)
947 {

...

967     mutex_enter(&oop->oo_lock);
968
969     while (oop->oo_seqid_inuse) {
970         NFS4_DEBUG(nfs4_seqid_sync, (CE_NOTE,
971             "nfs4_start_open_seqid_sync waiting on cv"));
972 
973         cv_wait(&oop->oo_cv_seqid_sync, &oop->oo_lock);
974     }
975
976     oop->oo_seqid_inuse = 1;
977
978     mutex_exit(&oop->oo_lock);

...

994 }

There is no other user of neither oo_seqid_inuse nor oo_cv_seqid_sync.

The usage of cv_broadcast() in nfs4_end_open_seqid_sync() is overkill and could cause (on busy NFSv4 clients) unpredictable latencies for some file open operations. The problem is that cv_broadcast() wakes up all waiting threads and the winner of this race could easily be the waiter who came latest, while the oldest waiter could wait (almost) forever.

Since it is perfectly okay (and desired) to wake up one thread only (the oldest one), we should replace cv_broadcast() by cv_signal() in nfs4_end_open_seqid_sync().

History

#1

Updated by Marcel Telka 6 months ago

The funny thing is that the comment above the nfs4_end_open_seqid_sync() function says this (please note cv_signal there):

920 /*
921  * This ends our use of the open owner's open seqid by setting
922  * the appropiate flags and issuing a cv_signal to wake up another
923  * thread waiting to use the open seqid.
924  */
#2

Updated by Marcel Telka 6 months ago

  • Subject changed from nfs4_end_open_seqid_sync() should use cv_signal() to nfs4_end_open_seqid_sync() should call cv_signal()
#3

Updated by Marcel Telka 6 months ago

To test the change from cv_broadcast() to cv_signal() in nfs4_end_open_seqid_sync() we ran a test that opened/created, wrote and closed thousands of files as fast as possible. During that we measured the time spent in nfs4_start_open_seqid_sync().

In both cases (either cv_broadcast() or cv_signal()) the test completed in 104 seconds, so we did not saw any overall performance (throughput) improvement. We captured about 22k samples of time spent in nfs4_start_open_seqid_sync() for both test runs. In the table below there is a distribution of measured values with the worst case for cv_broadcast() at 2678ms and for cv_signal() at 1764ms.

We see that the distribution of time spent in nfs4_start_open_seqid_sync() is better for the cv_signal() case and also the worst case values are far better. This means that the time spent in nfs4_start_open_seqid_sync() is more deterministic with cv_signal() than with cv_broadcast().

In the table the column "#" contains the number of samples for the particular time span, while in column "sum" there is cumulative sum of samples for this particular time span + all worse time spans (above this line).

time [ms] cv_broadcast cv_signal
# sum # sum
2600 - 2699 2 2 - -
2500 - 2599 1 3 - -
2400 - 2499 0 3 - -
2300 - 2399 2 5 - -
2200 - 2299 2 7 - -
2100 - 2199 0 7 - -
2000 - 2099 1 8 - -
1900 - 1999 1 9 - -
1800 - 1899 2 11 - -
1700 - 1799 10 21 3 3
1600 - 1699 4 25 3 6
1500 - 1599 11 36 0 6
1400 - 1499 8 44 3 9
1300 - 1399 15 59 2 11
1200 - 1299 26 85 10 21
1100 - 1199 22 107 17 38
1000 - 1100 43 150 18 56
900 - 999 40 190 23 79
800 - 899 66 256 35 114
700 - 799 103 359 39 153
600 - 699 141 500 103 256
500 - 599 200 700 181 437
400 - 499 336 1036 304 741
300 - 399 576 1612 585 1326
200 - 299 954 2566 1027 2353
100 - 199 2162 4728 1784 4137
#5

Updated by Marcel Telka 6 months ago

  • Subject changed from nfs4_end_open_seqid_sync() should call cv_signal() to nfs4_end_*_seqid_sync() should call cv_signal()

During the code review it was found that the nfs4_start_lock_seqid_sync()/nfs4_end_lock_seqid_sync() pair uses the exactly same code pattern as nfs4_start_open_seqid_sync()/nfs4_end_open_seqid_sync() pair, so we will change nfs4_end_lock_seqid_sync() here too to use cv_signal() instead of cv_broadcast().

#6

Updated by Marcel Telka 6 months ago

  • Status changed from In Progress to Pending RTI
#7

Updated by Electric Monk 6 months ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

git commit f2211ffec9a7ac3c1efc6de9347072f816f10a60

commit  f2211ffec9a7ac3c1efc6de9347072f816f10a60
Author: Marcel Telka <marcel@telka.sk>
Date:   2019-09-27T16:40:05.000Z

    11749 nfs4_end_*_seqid_sync() should call cv_signal()
    Reviewed by: Jan Schlien <illumos-bugs@jan-o-sch.net>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Arne Jansen <arne@die-jansens.de>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@joyent.com>

Also available in: Atom PDF