Bug #3585


NFSv4 client violates "Client Retransmission Behavior"

Added by Marcel Telka almost 9 years ago. Updated about 5 years ago.

nfs - NFS server and client


The NFSv4 client violates this part of RFC 7530:

3.1.1.  Client Retransmission Behavior

   When processing an NFSv4 request received over a reliable transport
   such as TCP, the NFSv4 server MUST NOT silently drop the request,
   except if the established transport connection has been broken.
   Given such a contract between NFSv4 clients and servers, clients MUST
   NOT retry a request unless one or both of the following are true:

   o  The transport connection has been broken

   o  The procedure being retried is the NULL procedure

   Since reliable transports, such as TCP, do not always synchronously
   inform a peer when the other peer has broken the connection (for
   example, when an NFS server reboots), the NFSv4 client may want to
   actively "probe" the connection to see if has been broken.  Use of
   the NULL procedure is one recommended way to do so.  So, when a
   client experiences a remote procedure call timeout (of some arbitrary
   implementation-specific amount), rather than retrying the remote
   procedure call, it could instead issue a NULL procedure call to the
   server.  If the server has died, the transport connection break will
   eventually be indicated to the NFSv4 client.  The client can then
   reconnect, and then retry the original request.  If the NULL
   procedure call gets a response, the connection has not broken.  The
   client can decide to wait longer for the original request's response,
   or it can break the transport connection and reconnect before
   re-sending the original request.

   For callbacks from the server to the client, the same rules apply,
   but the server doing the callback becomes the client, and the client
   receiving the callback becomes the server.
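
The retry rule quoted above can be sketched as a small decision function. This is only an illustration of the RFC text, not the illumos client code; the names may_retry, on_rpc_timeout and TimeoutAction are hypothetical.

```python
from enum import Enum


class TimeoutAction(Enum):
    """What a client may do after an RPC timeout (hypothetical names)."""
    RETRY = "retry the request"
    PROBE_OR_WAIT = "send a NULL probe or keep waiting; do not retry"


def may_retry(connection_broken: bool, is_null_procedure: bool) -> bool:
    # RFC 7530, section 3.1.1: a retry is allowed only if the transport
    # connection has been broken or the procedure is the NULL procedure.
    return connection_broken or is_null_procedure


def on_rpc_timeout(connection_broken: bool, is_null_procedure: bool) -> TimeoutAction:
    # On timeout the client either retries (when permitted) or falls back
    # to probing the connection / waiting longer for the original reply.
    if may_retry(connection_broken, is_null_procedure):
        return TimeoutAction.RETRY
    return TimeoutAction.PROBE_OR_WAIT


if __name__ == "__main__":
    # A WRITE that timed out on a connection that is still up.
    print(on_rpc_timeout(connection_broken=False, is_null_procedure=False).value)
```

Under this reading, a WRITE that times out on an intact connection must not be re-sent; the client would have to break the connection and reconnect first.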

I reproduced it by slowing down the WRITE operation on the NFSv4 server: I added delay(SEC_TO_TICK(8)); to the rfs4_op_write() function. Then I mounted the filesystem with the vers=4,timeo=2,forcedirectio mount options and ran this command:

# dd if=/dev/zero of=d bs=1 count=1

Here is the communication between the NFSv4 client and the NFSv4 server:

# snoop -ta -r -d e1000g1
Using device e1000g1 (promiscuous mode)
17:07:33.56060 ->   NFS C 4 (lookup valid) PUTFH FH=7E65 NVERIFY GETATTR 10011a b0a23a ACCESS rd,lk,mo,ext,dl LOOKUP d GETFH GETATTR 10011a ...
17:07:33.56076 ->   NFS R 4 (lookup valid) NFS4ERR_SAME PUTFH NFS4_OK NVERIFY NFS4ERR_SAME 
17:07:33.56108 ->   NFS C 4 (getattr     ) PUTFH FH=7E65 GETATTR 10111a b0a23a 
17:07:33.56119 ->   NFS R 4 (getattr     ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK 
17:07:33.56148 ->   NFS C 4 (lookup      ) PUTFH FH=7E65 SAVEFH LOOKUP d GETFH GETATTR 10011a b0a23a RESTOREFH NVERIFY GETATTR 10011a b0a23a...
17:07:33.56185 ->   NFS C 4 (open        ) PUTFH FH=7E65 SAVEFH OPEN d OT=CR(U) SQ=10 CT=N AC=W DN=N OO=0048 GETFH GETATTR 10011a b0a23a RES...
17:07:33.56295 ->   NFS R 4 (open        ) NFS4_OK PUTFH NFS4_OK SAVEFH NFS4_OK OPEN NFS4_OK ST=141B:1 RF=CF,PL DT=N GETFH NFS4_OK FH=882D GETATTR N...
17:07:33.56329 ->   NFS C 4 (open_confirm) PUTFH FH=882D OPEN_CONFIRM SQ=11 OST=141B:1 
17:07:33.56335 ->   NFS R 4 (open_confirm) NFS4_OK PUTFH NFS4_OK OPEN_CONFIRM NFS4_OK OST=141B:2 
17:07:33.56366 ->   NFS C 4 (access      ) PUTFH FH=882D ACCESS rd,mo,ext,exc 
17:07:33.56373 ->   NFS R 4 (access      ) NFS4_OK PUTFH NFS4_OK ACCESS NFS4_OK Supp=rd,mo,ext,exc Allow=rd,mo,ext 
17:07:33.56402 ->   NFS C 4 (delegreturn ) PUTFH FH=882D GETATTR 10011a b0a23a DELEGRETURN DST=1A28:0 
17:07:33.56409 ->   NFS R 4 (delegreturn ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK DELEGRETURN NFS4_OK 
17:07:33.56444 ->   NFS C 4 (write       ) PUTFH FH=882D WRITE ST=141B:2 at 0 for 1 
17:07:33.63616 ->   TCP D=1014 S=2049 Ack=1915632030 Seq=3953912707 Len=0 Win=32806 Options=<nop,nop,tstamp 6752079 7258513>
17:07:34.96837 ->   NFS C 4 (write       ) PUTFH FH=882D WRITE ST=141B:2 at 0 for 1  (retransmit)
17:07:35.03675 ->   TCP D=1014 S=2049 Ack=1915632250 Seq=3953912707 Len=0 Win=32806 Options=<nop,nop,tstamp 6752219 7258653>
17:07:39.76242 ->   NFS C 4 (write       ) PUTFH FH=882D WRITE ST=141B:2 at 0 for 1  (retransmit)
17:07:39.83070 ->   TCP D=1014 S=2049 Ack=1915632470 Seq=3953912707 Len=0 Win=32806 Options=<nop,nop,tstamp 6752699 7259133>
17:07:41.56473 ->   NFS R 4 (write       ) NFS4_OK PUTFH NFS4_OK WRITE NFS4_OK 1 (FSYNC) 
17:07:41.56657 ->   NFS C 4 (close       ) PUTFH FH=882D GETATTR 10011a b0a23a CLOSE SQ=12 OST=141B:2 
17:07:41.56696 ->   NFS R 4 (close       ) NFS4_OK PUTFH NFS4_OK GETATTR NFS4_OK CLOSE OST=141B:3 
17:07:41.63347 ->   TCP D=2049 S=1014 Ack=3953913035 Seq=1915632690 Len=0 Win=32806 Options=<nop,nop,tstamp 7259320 6752872>

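The trace shows the client retransmitting the WRITE request twice (at 17:07:34.96837 and 17:07:39.76242) on the same TCP connection: the server keeps acknowledging from port 2049 to port 1014 with increasing Ack numbers, and no reconnect appears in between. The client therefore retries without breaking the transport connection and thus violates the RFC.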
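
To make the timing in the trace easier to read, the gaps between the original WRITE call and the later events can be computed from the snoop timestamps. This is a minimal sketch; the timestamps are copied from the trace above, while the helper name seconds_between and the event labels are made up for illustration.

```python
from datetime import datetime

# snoop -ta prints wall-clock time as HH:MM:SS.fraction
FMT = "%H:%M:%S.%f"

# Timestamps copied from the trace above.
WRITE_CALL = "17:07:33.56444"
EVENTS = {
    "first retransmit": "17:07:34.96837",
    "second retransmit": "17:07:39.76242",
    "WRITE reply": "17:07:41.56473",
}


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two snoop timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


def gaps_from_write_call() -> dict:
    """Elapsed time from the original WRITE call to each later event."""
    return {
        name: round(seconds_between(WRITE_CALL, stamp), 5)
        for name, stamp in EVENTS.items()
    }


if __name__ == "__main__":
    for name, gap in gaps_from_write_call().items():
        print(f"{name}: {gap:.5f} s after the original WRITE")
```

The reply arrives about 8 seconds after the original call, which matches the delay injected into rfs4_op_write(); both retransmits fall inside that window.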

Actions #1

Updated by Marcel Telka almost 9 years ago

I tested a similar scenario with Linux (CentOS 6.3), and Linux behaves correctly: it does not retransmit on the same connection. After the specified timeo expired, Linux closed the connection.

Actions #2

Updated by Marcel Telka about 5 years ago

  • Description updated
