Bug #3585

open

NFSv4 client violates "Client Retransmission Behavior"

Added by Marcel Telka over 8 years ago. Updated about 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: nfs - NFS server and client
Start date: 2013-02-20
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

The NFSv4 client violates the following section of RFC 7530:

3.1.1.  Client Retransmission Behavior

   When processing an NFSv4 request received over a reliable transport
   such as TCP, the NFSv4 server MUST NOT silently drop the request,
   except if the established transport connection has been broken.
   Given such a contract between NFSv4 clients and servers, clients MUST
   NOT retry a request unless one or both of the following are true:

   o  The transport connection has been broken

   o  The procedure being retried is the NULL procedure

   Since reliable transports, such as TCP, do not always synchronously
   inform a peer when the other peer has broken the connection (for
   example, when an NFS server reboots), the NFSv4 client may want to
   actively "probe" the connection to see if has been broken.  Use of
   the NULL procedure is one recommended way to do so.  So, when a
   client experiences a remote procedure call timeout (of some arbitrary
   implementation-specific amount), rather than retrying the remote
   procedure call, it could instead issue a NULL procedure call to the
   server.  If the server has died, the transport connection break will
   eventually be indicated to the NFSv4 client.  The client can then
   reconnect, and then retry the original request.  If the NULL
   procedure call gets a response, the connection has not broken.  The
   client can decide to wait longer for the original request's response,
   or it can break the transport connection and reconnect before
   re-sending the original request.

   For callbacks from the server to the client, the same rules apply,
   but the server doing the callback becomes the client, and the client
   receiving the callback becomes the server.

I reproduced the problem by slowing down the WRITE operation on the NFSv4 server: I added delay(SEC_TO_TICK(8)); to the rfs4_op_write() function. I then mounted the filesystem with the vers=4,timeo=2,forcedirectio mount options and finally ran this command:

# dd if=/dev/zero of=d bs=1 count=1

Here is the traffic between the NFSv4 client and the NFSv4 server, captured with snoop:

# snoop -ta -r -d e1000g1
Using device e1000g1 (promiscuous mode)
17:07:33.56060   10.0.100.2 -> 10.0.100.1   NFS C 4 (lookup valid) PUTFH FH=7E65 NVERIFY GETATTR 10011a b0a23a ACCESS rd,lk,mo,ext,dl LOOKUP d GETFH GETATTR 10011a ...
17:07:33.56076   10.0.100.1 -> 10.0.100.2   NFS R 4 (lookup valid) NFS4ERR_SAME PUTFH NFS4_OK NVERIFY NFS4ERR_SAME 
17:07:33.56108   10.0.100.2 -> 10.0.100.1   NFS C 4 (getattr     ) PUTFH FH=7E65 GETATTR 10111a b0a23a 
17:07:33.56119   10.0.100.1 -> 10.0.100.2   NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK 
17:07:33.56148   10.0.100.2 -> 10.0.100.1   NFS C 4 (lookup      ) PUTFH FH=7E65 SAVEFH LOOKUP d GETFH GETATTR 10011a b0a23a RESTOREFH NVERIFY GETATTR 10011a b0a23a...
17:07:33.56155   10.0.100.1 -> 10.0.100.2   NFS R 4 (lookup      ) NFS4ERR_SAME PUTFH NFS4_OK SAVEFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=882D GETATTR NFS4_OK RESTOREFH NFS4_...
17:07:33.56185   10.0.100.2 -> 10.0.100.1   NFS C 4 (open        ) PUTFH FH=7E65 SAVEFH OPEN d OT=CR(U) SQ=10 CT=N AC=W DN=N OO=0048 GETFH GETATTR 10011a b0a23a RES...
17:07:33.56295   10.0.100.1 -> 10.0.100.2   NFS R 4 (open        ) NFS4_OK PUTFH NFS4_OK SAVEFH NFS4_OK OPEN NFS4_OK ST=141B:1 RF=CF,PL DT=N GETFH NFS4_OK FH=882D GETATTR N...
17:07:33.56329   10.0.100.2 -> 10.0.100.1   NFS C 4 (open_confirm) PUTFH FH=882D OPEN_CONFIRM SQ=11 OST=141B:1 
17:07:33.56335   10.0.100.1 -> 10.0.100.2   NFS R 4 (open_confirm) NFS4_OK PUTFH NFS4_OK OPEN_CONFIRM NFS4_OK OST=141B:2 
17:07:33.56366   10.0.100.2 -> 10.0.100.1   NFS C 4 (access      ) PUTFH FH=882D ACCESS rd,mo,ext,exc 
17:07:33.56373   10.0.100.1 -> 10.0.100.2   NFS R 4 (access      ) NFS4_OK PUTFH NFS4_OK ACCESS NFS4_OK Supp=rd,mo,ext,exc Allow=rd,mo,ext 
17:07:33.56402   10.0.100.2 -> 10.0.100.1   NFS C 4 (delegreturn ) PUTFH FH=882D GETATTR 10011a b0a23a DELEGRETURN DST=1A28:0 
17:07:33.56409   10.0.100.1 -> 10.0.100.2   NFS R 4 (delegreturn ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK DELEGRETURN NFS4_OK 
17:07:33.56444   10.0.100.2 -> 10.0.100.1   NFS C 4 (write       ) PUTFH FH=882D WRITE ST=141B:2 at 0 for 1 
17:07:33.63616   10.0.100.1 -> 10.0.100.2   TCP D=1014 S=2049 Ack=1915632030 Seq=3953912707 Len=0 Win=32806 Options=<nop,nop,tstamp 6752079 7258513>
17:07:34.96837   10.0.100.2 -> 10.0.100.1   NFS C 4 (write       ) PUTFH FH=882D WRITE ST=141B:2 at 0 for 1  (retransmit)
17:07:35.03675   10.0.100.1 -> 10.0.100.2   TCP D=1014 S=2049 Ack=1915632250 Seq=3953912707 Len=0 Win=32806 Options=<nop,nop,tstamp 6752219 7258653>
17:07:39.76242   10.0.100.2 -> 10.0.100.1   NFS C 4 (write       ) PUTFH FH=882D WRITE ST=141B:2 at 0 for 1  (retransmit)
17:07:39.83070   10.0.100.1 -> 10.0.100.2   TCP D=1014 S=2049 Ack=1915632470 Seq=3953912707 Len=0 Win=32806 Options=<nop,nop,tstamp 6752699 7259133>
17:07:41.56473   10.0.100.1 -> 10.0.100.2   NFS R 4 (write       ) NFS4_OK PUTFH NFS4_OK WRITE NFS4_OK 1 (FSYNC) 
17:07:41.56657   10.0.100.2 -> 10.0.100.1   NFS C 4 (close       ) PUTFH FH=882D GETATTR 10011a b0a23a CLOSE SQ=12 OST=141B:2 
17:07:41.56696   10.0.100.1 -> 10.0.100.2   NFS R 4 (close       ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK CLOSE OST=141B:3 
17:07:41.63347   10.0.100.2 -> 10.0.100.1   TCP D=2049 S=1014 Ack=3953913035 Seq=1915632690 Len=0 Win=32806 Options=<nop,nop,tstamp 7259320 6752872>

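The trace shows that the client retransmits the WRITE request twice (at 17:07:34.96837 and 17:07:39.76242) on the same TCP connection (client port 1014, server port 2049) without first breaking the transport connection, and thus violates the RFC.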
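The check applied to the capture above can also be done mechanically. The C sketch below is hypothetical: the trace_ev event model and first_same_conn_retransmit() are invented for illustration. It assumes, as is usual for ONC RPC, that a retransmitted call reuses the XID of the original call. It reports the first call that repeats an XID on a connection with no disconnect in between. NULL calls are ignored because the RFC allows them to be retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified client-side trace event. */
enum ev_kind { EV_CALL, EV_DISCONNECT };

struct trace_ev {
    enum ev_kind kind;
    uint32_t conn;     /* connection id, e.g. derived from the TCP port pair */
    uint32_t xid;      /* RPC transaction id (EV_CALL only) */
    bool null_proc;    /* true if the call is the NULL procedure */
};

/* Return the index of the first non-NULL call that resends an XID already
 * sent on the same connection with no disconnect in between, or -1. */
long first_same_conn_retransmit(const struct trace_ev *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind != EV_CALL || ev[i].null_proc)
            continue;
        /* Scan earlier events on this connection back to its last disconnect. */
        for (size_t j = i; j-- > 0;) {
            if (ev[j].conn != ev[i].conn)
                continue;
            if (ev[j].kind == EV_DISCONNECT)
                break;
            if (ev[j].xid == ev[i].xid)
                return (long)i;
        }
    }
    return -1;
}
```

Given the three WRITE calls on client port 1014 (17:07:33.56444, 17:07:34.96837 and 17:07:39.76242), the function would flag the second call. The snoop summary does not show XIDs, so any XID values used to build such an input are placeholders.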

#1

Updated by Marcel Telka over 8 years ago

I tested a similar scenario with Linux (CentOS 6.3), and Linux behaves correctly: it does not retransmit on the same connection. After the specified timeo expired, Linux closed the connection.

#2

Updated by Marcel Telka about 5 years ago

  • Description updated