Project

General

Profile

Bug #6225

NFSv4: setlock() can spin forever

Added by Marcel Telka over 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
nfs - NFS server and client
Start date:
2015-09-12
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

Here is the simplified setlock() implementation:

int
setlock(...)
{
        int error;

retry:
        for (...) {
                error = VOP_FRLOCK(F_SETLK, ...);
        }

        if (error) {
                VOP_FRLOCK(F_GETLK, ...);
                if (F_UNLCK)
                        goto retry;
        }

        return (error);
}

The setlock() just tries to get the lock in the for loop. If it is not successful it tries to find who is preventing us to get the lock. If it finds that there is no pending lock, it retries (goto retry). If we are extremely unlucky, some other process might always hold a lock when we do VOP_FRLOCK(F_SETLK), and it might always release such a lock just before we do VOP_FRLOCK(F_GETLK). This would cause the setlock() to spin forever. We should make sure this extremely unlucky scenario cannot happen.

In addition to the extremely unlucky (but possible) scenario described above the setlock() spin could be reproduced far easier:

The current core file locking implementation is write-preferring. This means that if there is a read lock active (process 1) and one other write lock is waiting (process 2) for the read lock, the additional read lock request (process 3) will fail (or block). In a case the process 3 will try to get the information about the lock preventing us from getting the read lock (using F_GETLK), the F_UNLCK is returned. This will cause that setlock() to spin forever too. Please see Bug #5957 for discussion about write-preferring vs. read-preferring locking and for an example about the described behavior.

There are several problems caused by the spinning setlock():

  1. The setlock() thread holds the exported_lock as reader - see rfs4_compound(). This means that during the setlock() spin all attempts to acquire the exported_lock as writer will block and wait until the setlock() thread stops to spin. For example every attempt to share or unshare a filesystem will hang.
  2. The NFSv4 client won't see the reply to its NFSv4 lock call (since the particular thread responsible for sending the reply is spinning in setlock()).

Steps to reproduce

You need one NFS server with a shared filesystem (let say /export) and one NFS client with the mounted share from the NFS server using the NFSv4 protocol (mount -o vers=4 SERVER:/export /mnt).

You also need to complie the attached lock.c source file in the /export directory:

# gcc -Wall -o lock lock.c

Do this locally on the NFS server in the /export directory:

# ./lock r file &
F_SETLK: F_RDLCK
sleeping
[1] 718
# sleep 2
# ./lock W file
F_SETLKW: F_WRLCK

Do this on the NFS client in the /mnt directory:

# ./lock R file 
F_SETLKW: F_RDLCK

Now we can find a thread on the NFS server cycling in setlock():

> ::stacks -c setlock
THREAD           STATE    SOBJ                COUNT
ffffff00d0fb38a0 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 delay_common+0xb8
                 delay+0x30
                 setlock+0x164
                 rfs4_do_lock+0x1b8
                 rfs4_op_lock+0x619
                 rfs4_compound+0x1e9
                 rfs4_dispatch+0x21d
                 common_dispatch+0x765
                 rfs_dispatch+0x2d
                 svc_getreq+0x1c1
                 svc_run+0x146
                 svc_do_run+0x8e
                 nfssys+0xf1
                 _sys_sysenter_post_swapgs+0x149

>

The thread calls in a cycle VOP_FRLOCK() with F_SETLK (or F_SETLK_NBMAND) and F_GETLK. The VOP_FRLOCK(F_SETLK) will always fail, but the VOP_FRLOCK(F_GETLK) will always succeed and return F_UNLCK. This will cause the infinite spin in setlock().

Now, when you'll try to unshare any exported filesystem, the unshare command will hang with this stack:

> ffffff00c8dd97a0::findstack -v
stack pointer for thread ffffff00c8dd97a0: ffffff0002ebcc80
[ ffffff0002ebcc80 _resume_from_idle+0xf4() ]
  ffffff0002ebccb0 swtch+0x141()
  ffffff0002ebcd50 turnstile_block+0x21a(0, 0, ffffffffc01afc88, fffffffffbc08bc0, 0, 0)
  ffffff0002ebcdc0 rw_enter_sleep+0x19b(ffffffffc01afc88, 0)
  ffffff0002ebd270 unexport+0x28(ffffff00d2943340)
  ffffff0002ebdd40 exportfs+0x25f(ffffff0002ebdda0, 100000, ffffff00d0d72470)
  ffffff0002ebdd80 stubs_common_code+0x51()
  ffffff0002ebddd0 nfs_export+0x9e(8046048)
  ffffff0002ebdec0 nfssys+0x388(2, 8046048)
  ffffff0002ebdf10 _sys_sysenter_post_swapgs+0x149()
>

Files

lock.c (1.41 KB) lock.c Marcel Telka, 2015-09-12 08:48 AM

Related issues

Related to illumos gate - Bug #5957: File locking should be read-preferringNew2015-05-26

Actions
Related to illumos gate - Bug #6253: F_GETLK doesn't always return lock ownerClosed2015-09-21

Actions

History

#1

Updated by Marcel Telka over 4 years ago

  • Related to Bug #5957: File locking should be read-preferring added
#3

Updated by Marcel Telka over 4 years ago

The solution

The NFSv4.0 LOCK operation does not support the blocking lock (at the NFSv4.0
protocol level) so the client needs to resend the LOCK request in a case the
lock is denied by the NFSv4 server. NFSv4.0 clients are prepared for that
(obviously); they are sending the LOCK requests with some delays between the
attempts. See nfs4frlock() and nfs4_block_and_wait() for the locking and delay
implementation at the client side.

To make the life of the clients easier, the illumos NFSv4 server tries to do
some fast retries on its own (the for loop in setlock()) in a hope the lock will be
available soon. And if not, the client won't need to resend the LOCK requests
so fast to check the lock availability. This basically saves some network
traffic and tries to make sure the client gets the lock ASAP.

So the for loop in setlock() makes perfect sense and it is good to have, but it
is completely wrong to allow the setlock() spin forever (goto retry).

The solution is easy: Having the above in mind, make sure the setlock() won't
spin forever, so the "goto retry" is going to be removed.

#4

Updated by Marcel Telka about 4 years ago

  • Status changed from In Progress to Pending RTI
#5

Updated by Arne Jansen about 4 years ago

  • Related to Bug #6253: F_GETLK doesn't always return lock owner added
#6

Updated by Marcel Telka about 4 years ago

Arne Jansen and Gordon Ross requested to preserve the retry loop and limit it to 10 instead. This is the explanation (written by Gordon):

There's a race inherent in the current VOP_FRLOCK design where:
a: "other guy" takes a lock that conflicts with a lock we want
b: we attempt to take our lock (non-blocking) and the attempt fails.
c: "other guy" releases the conflicting lock
d: we ask what lock conflicts with the lock we want, getting F_UNLCK
(no lock blocks us)

If we retry the non-blocking lock attempt in this case (restart at
step 'b') there's some possibility that many such attempts might fail.
However a test designed to actually provoke this race shows that the
vast majority of cases require no retry, and only a few took as many
as three retries.  Here's the test outcome:

        number of retries    how many times we needed that many retries
        0                    79461
        1                      862
        2                       49
        3                        5

Given those empirical results, we arbitrarily limit the retry count to ten.

If we actually make to ten retries and give up, nothing catastrophic
happens, but we're unable to return the information about the
conflicting lock to the NFS client.  That's an acceptable trade off
vs. letting this retry loop run forever.
#7

Updated by Electric Monk about 4 years ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

git commit 55a4551ddde8feed8f8e011458f992ea74257bd9

commit  55a4551ddde8feed8f8e011458f992ea74257bd9
Author: Marcel Telka <marcel.telka@nexenta.com>
Date:   2015-10-07T18:30:42.000Z

    6225 NFSv4: setlock() can spin forever
    Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Arne Jansen <arne@die-jansens.de>
    Approved by: Dan McDonald <danmcd@omniti.com>

Also available in: Atom PDF