Multiple processes on a Linux client locking the same file over NFS can see cumulative 30-second delays if the NFS server machine has more than one IPv4 interface.
This issue occurs between a Linux client and a Solaris/SmartOS/OmniOS server, even when Sun's proprietary lockd is used. A Mac OS client works well with a Solaris/SmartOS/OmniOS server. I am wondering whether anything can be done now that we are using the open source nlockmgr.
Here is how I reproduced the issue:
A SmartOS server has the following settings:
hostname-nfs0 with IP address 172.30.192.194
hostname-nfs1 with IP address 172.30.192.191
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
aggr0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 172.30.192.220 netmask ffffff00 broadcast 172.30.192.255
admin0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> metric 2 mtu 1500 index 3
inet 172.30.192.194 netmask ffffff00 broadcast 172.30.192.255
admin1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> metric 2 mtu 1500 index 4
inet 172.30.192.191 netmask ffffff00 broadcast 172.30.192.255
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
On a Linux host (with IP address 172.28.146.50):
/mnt/Atnas4 is mounted to hostname-nfs0:/vmgr/Atnas4
/mnt/Atnas5 is mounted to hostname-nfs1:/vmgr/Atnas5
/mnt/Atnas4/scratch/file and /mnt/Atnas5/scratch/file are existing files writable by the user.
Then on the Linux host run the following commands (perl script "lockfile.pl" is attached):
perl lockfile.pl 3 /mnt/Atnas4/scratch/file 2
INFO: Successfully forked 8316
INFO: Successfully forked 8317
INFO: Successfully forked 8318
INFO: pid 8318 successfully locked file 1091 times in 2 seconds
INFO: pid 8317 successfully locked file 1116 times in 2 seconds
INFO: pid 8316 successfully locked file 1121 times in 2 seconds
perl lockfile.pl 3 /mnt/Atnas5/scratch/file 2
INFO: Successfully forked 8411
INFO: Successfully forked 8412
INFO: Successfully forked 8413
INFO: pid 8411 successfully locked file 1 times in 30 seconds
INFO: pid 8412 successfully locked file 1 times in 60 seconds
INFO: pid 8413 successfully locked file 2 times in 90 seconds
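The attached lockfile.pl is not reproduced in this report. For readers without the attachment, the following is a minimal Python sketch of an equivalent test. It assumes that the three arguments are the number of processes, the file to lock, and the minimum run time in seconds, and that each child repeatedly takes and releases an exclusive POSIX lock. The function names `lock_loop` and `run_lock_test` are hypothetical, and the behavior of the real script may differ.

```python
import fcntl
import os
import sys
import time


def lock_loop(path, duration):
    """Take and release an exclusive POSIX lock until `duration` seconds have passed."""
    count = 0
    start = time.time()
    with open(path, "r+") as fh:
        while True:
            # Blocking byte-range lock; on an NFSv3 mount this is handled via NLM.
            fcntl.lockf(fh, fcntl.LOCK_EX)
            count += 1
            fcntl.lockf(fh, fcntl.LOCK_UN)
            if time.time() - start >= duration:
                break
    return count, time.time() - start


def run_lock_test(path, nprocs, duration):
    """Fork `nprocs` children contending for the lock; return (pid, count, seconds) per child."""
    children = []
    for _ in range(nprocs):
        rfd, wfd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: run the lock loop and report the result through the pipe.
            os.close(rfd)
            count, elapsed = lock_loop(path, duration)
            os.write(wfd, f"{count} {elapsed:.0f}".encode())
            os._exit(0)
        os.close(wfd)
        print(f"INFO: Successfully forked {pid}")
        children.append((pid, rfd))

    results = []
    for pid, rfd in children:
        with os.fdopen(rfd) as pipe:
            count, elapsed = pipe.read().split()
        os.waitpid(pid, 0)
        results.append((pid, int(count), int(elapsed)))
        print(f"INFO: pid {pid} successfully locked file {count} times in {elapsed} seconds")
    return results


# Assumed argument order, mirroring the commands above: <nprocs> <file> <seconds>
if __name__ == "__main__" and len(sys.argv) == 4:
    run_lock_test(sys.argv[2], int(sys.argv[1]), int(sys.argv[3]))
```

Because each lock call blocks until the server answers, a child that waits 30 seconds for one lock reports a run time far beyond the requested 2 seconds, which would match the pattern in the second run above.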
The attached tcpdump file (atnas5.pcap) has the packets related to the last command. Frames 1 through 94 are all between the Linux host (172.28.146.50) and hostname-nfs1 (172.30.192.191), but starting with frame 95, hostname-nfs0 (172.30.192.194) is involved. I remember a Linux bug was filed for this issue (sorry, I cannot find it now), but the Linux developers refused to fix it, stating that the Solaris/SmartOS server should not switch to another IP address to communicate with the Linux client and that fixing it could cause security problems.
Updated by Youzhong Yang almost 10 years ago
Related Linux bug report:
Updated by Chip Schweiss almost 10 years ago
This bug became a showstopper for my NFSv3 clients. Users whose home directories were mounted over NFSv3 could not log in more than once with csh, as the .history file would get caught up in this.
I have been forced to move my clients to NFSv4.