Bug #3368
closedNFSv4 and slow OS X Mountain Lion (10.8.2) Client
0%
Description
Server:
--OpenIndiana oi_151a5
--ZFS pool shared over NFS
--LDAP setup for users and groups (working)
Client:
--OS X Mountain Lion 10.8.2
--LDAP setup for users and groups (working)
Linux clients are able to access mounts using both Kerberos, and sec=sys.
Mac Clients are able to mount, but the mount command take ~30sec to return. Afterwards simple commands like ls on a directory with 4 folders and 2 files takes over a min, and returns line-by-line with all files owned by nobody, uid and gid both show as 4294967294.
The Macs are able to mount NFSv4 from a linux server (CentOS 6.3) without issue.
Taking ZFS out of the mix changes nothing.
We opened an Enterprise support ticket with Apple (case #373004549), who has confirmed the issue, and are escalating it through their engineering department, for a possible work around.
They however are adamant the issue is with the NFS server in OI since it works fine with Linux , OS X and BSD servers (exact same export options).
Attached are opendirectoryd logs and a packet dump from the client while mounting and listing the share.
On the server, the directory in question looks like:
-bash-4.0$ sudo chown jmullen:csstaff matlab
-bash-4.0$ ls -lahn
total 41986013
drwxrwxrwx+ 5 0 0 7 Nov 7 16:35 .
drwxr-xr-x+ 15 0 0 15 Nov 7 08:47 ..
drwxrwxrwx+ 3 0 0 5 Oct 8 10:34 HDL_Designer
drwxrwxrwx+ 2 0 0 7 Oct 17 17:00 ModelSim
-rwxrwxrwx+ 1 0 0 1.1G Oct 3 16:57 cs_sandbox.iso.old
drwxrwxrwx+ 2 20154 291400163 4 Nov 7 16:35 matlab
-rwxrwxrwx+ 1 0 0 19G Sep 26 11:06 windows7.tar.gz
On the Mac you can see the LDAP user and group:
reddy:~ osadmin$ id jmullen
uid=20154(jmullen) gid=20(staff) groups=20(staff),70(symbiotic),5035(sacm),5055(sacm-admin),5057(wphost),5063(cses6),5064(mobilemanip2012),5065(dataminingsp12),5066(csadmin),291400000(admins),291400163(csstaff),12(everyone),62(netaccounts)
But then for some reason when you ls the directory again you get:
reddy:~ osadmin$ ls -lahn /private/tmp/mnt
total 41986008
drwxrwxrwx 5 4294967294 4294967294 7B Nov 7 16:35 .
drwxrwxrwt 5 0 0 170B Nov 14 10:25 ..
drwxrwxrwx 3 4294967294 4294967294 5B Oct 8 10:34 HDL_Designer
drwxrwxrwx 2 4294967294 4294967294 7B Oct 17 17:00 ModelSim
-rwxrwxrwx 1 4294967294 4294967294 1.1G Oct 3 16:57 cs_sandbox.iso.old
drwxrwxrwx 2 4294967294 4294967294 4B Nov 7 16:35 matlab
-rwxrwxrwx 1 4294967294 4294967294 19G Sep 26 11:06 windows7.tar.gz
<TERMINAL 1>
reddy:~ osadmin$ uname -a
Darwin reddy.cs.ou.edu 12.2.0 Darwin Kernel Version 12.2.0: Sat Aug 25 00:48:52 PDT 2012; root:xnu-2050.18.24~1/RELEASE_X86_64 x86_64
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=2b<RXCSUM,TXCSUM,VLAN_HWTAGGING,TSO4>
ether 00:23:32:9f:16:ca
inet6 fe80::223:32ff:fe9f:16ca%en0 prefixlen 64 scopeid 0x4
inet 129.15.78.217 netmask 0xffffff00 broadcast 129.15.78.255
media: autoselect (1000baseT <full-duplex,flow-control>)
reddy:~ osadmin$ showmount -e pearl.cs.ou.edu | grep install_files
/tank/install_files @129.15.78.0/24 @10.0.0.0/8
reddy:~ osadmin$ odutil set log debug
reddy:~ osadmin$ sudo tcpdump -vvv -K -n -s 0 -i en0 -w /Library/Logs/373004549.pcap
Password:
tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C1945 packets captured
1967 packets received by filter
0 packets dropped by kernel
reddy:~ osadmin$ odutil set log default
</TERMINAL 1>
<TERMINAL 2>
reddy:~ osadmin$ sudo mkdir /private/tmp/mnt
reddy:~ osadmin$ sudo mount_nfs -o vers=4 pearl.cs.ou.edu:/tank/install_files /private/tmp/mnt
reddy:~ osadmin$ ls -lahn /private/tmp/mnt
total 41986008
drwxrwxrwx 5 0 4294967294 7B Nov 7 16:35 .
drwxrwxrwt 7 0 0 238B Nov 14 10:21 ..
drwxrwxrwx 3 0 4294967294 5B Oct 8 10:34 HDL_Designer
drwxrwxrwx 2 0 4294967294 7B Oct 17 17:00 ModelSim
-rwxrwxrwx 1 0 4294967294 1.1G Oct 3 16:57 cs_sandbox.iso.old
drwxrwxrwx 2 4294967294 4294967294 4B Nov 7 16:35 matlab
-rwxrwxrwx 1 0 4294967294 19G Sep 26 11:06 windows7.tar.gz
</TERMINAL 2>
Files
Updated by Jonathan Mullen over 10 years ago
UPDATE (not a good one):
I have been getting more emails from other users (read LARGE federal and research institutions) suffering this issue, who apparently found me through this bug report. So I though this would be an appropriate place to put an update.
An enterprise support ticket was created with Apple. I spend about a week going back and forth with their engineers. They had their own OpenIndiana server setup, and were able to replicate the issue. I was told it is some compatibility issues with Apple's OpenDirectory and the Solaris NFS/idmap implementation, OS X mounts NFSv4 kerberos shares from Linux without an issue.
It got kicked up to the product engineers, who came back with the response that it would be addressed in SOME future release, they still would not say explicitly if it was on OpenDirectory bug, or not. OpenDirectory is obviously capable since it works with a Linux (and I presume BSD) kerberized NFS server.
As of 10.8.3, the issue was still not resolved. We have pretty much walked away from this after a few months of head banging, as we have a functional NFSv3 and autofs solution. IMHO, the shame for this falls on Apple, but I'll buy anyone, including Tim Cook a beer/wine/whatever if they can point me in the direction of a solution for this.
If you are suffering from this issue, please file a report with Apple, and reference our Apple case number (#373004549). As far as I know, we were the first to bring this issue to Apple. I'd like to think that if enough people speak up, Apple might spare an engineer and fix this bug.
Updated by Marcel Telka over 10 years ago
- Status changed from New to Feedback
Looking into the 373004549_apple_support.tar.gz file...
Assuming the 129.15.78.217 is the MacOS NFS client and 129.15.78.156 is the OI NFS server.
$ tshark -r Library/Logs/373004549.pcap "tcp.port == 2049" ... snip ... 1271 141.710615 129.15.78.217 -> 129.15.78.156 NFS V4 COMPOUND Call getattr PUTFH;GETATTR 1272 141.710918 129.15.78.156 -> 129.15.78.217 NFS V4 COMPOUND Reply (Call In 1271) getattr PUTFH;GETATTR 1273 141.710964 129.15.78.217 -> 129.15.78.156 TCP 49158 > nfs [ACK] Seq=3257 Ack=4333 Win=130832 Len=0 TSV=781420549 TSER=346516639 1279 143.213881 129.15.78.217 -> 129.15.78.156 NFS V4 COMPOUND Call statfs PUTFH;GETATTR 1280 143.214151 129.15.78.156 -> 129.15.78.217 NFS V4 COMPOUND Reply (Call In 1279) statfs PUTFH;GETATTR 1281 143.214191 129.15.78.217 -> 129.15.78.156 TCP 49158 > nfs [ACK] Seq=3429 Ack=4625 Win=130784 Len=0 TSV=781422044 TSER=346516790 1283 144.716503 129.15.78.217 -> 129.15.78.156 NFS V4 COMPOUND Call getattr PUTFH;GETATTR 1284 144.716736 129.15.78.156 -> 129.15.78.217 NFS V4 COMPOUND Reply (Call In 1283) getattr PUTFH;GETATTR 1285 144.716777 129.15.78.217 -> 129.15.78.156 TCP 49158 > nfs [ACK] Seq=3601 Ack=4869 Win=130832 Len=0 TSV=781423540 TSER=346516940 1289 146.218500 129.15.78.217 -> 129.15.78.156 NFS V4 COMPOUND Call readdir PUTFH;GETATTR;READDIR 1290 146.242739 129.15.78.156 -> 129.15.78.217 NFS V4 COMPOUND Reply (Call In 1289) readdir PUTFH;GETATTR;READDIR 1291 146.242775 129.15.78.217 -> 129.15.78.156 TCP 49158 > nfs [ACK] Seq=3813 Ack=5481 Win=130464 Len=0 TSV=781425061 TSER=346517093 ... snip ... 1592 192.357586 129.15.78.217 -> 129.15.78.156 NFS V4 COMPOUND Call renew RENEW 1593 192.357865 129.15.78.156 -> 129.15.78.217 NFS V4 COMPOUND Reply (Call In 1592) renew RENEW 1594 192.357901 129.15.78.217 -> 129.15.78.156 TCP 49158 > nfs [ACK] Seq=10833 Ack=14253 Win=131008 Len=0 TSV=781470933 TSER=346521704 1596 193.358768 129.15.78.217 -> 129.15.78.156 NFS V4 COMPOUND Call renew RENEW 1597 193.358967 129.15.78.156 -> 129.15.78.217 NFS V4 COMPOUND Reply (Call In 1596) renew RENEW 1598 193.359013 129.15.78.217 -> 129.15.78.156 TCP 49158 > nfs [ACK] Seq=10981 Ack=14313 Win=131008 Len=0 TSV=781471929 TSER=346521804 1603 194.027646 129.15.78.217 -> 129.15.78.156 NFS V4 COMPOUND Call openattr PUTFH;OPENATTR;GETATTR 1604 194.027904 129.15.78.156 -> 129.15.78.217 NFS V4 COMPOUND Reply (Call In 1603) openattr PUTFH;OPENATTR;GETATTR 1605 194.027943 129.15.78.217 -> 129.15.78.156 TCP 49158 > nfs [ACK] Seq=11161 Ack=14653 Win=130736 Len=0 TSV=781472596 TSER=346521871 ... snip ...
In the first block above we can see that the NFS server replied to every NFSv4 request immediately. Then there was a gap for few seconds and then the NFSv4 client issued another NFSv4 operation. And again, the server replied immediately. Apparently, the slow down is caused by the NFSv4 client.
I am afraid we cannot do anything with it now at the NFSv4 server side. Surely, it is possible that the client's slow turnaround is caused by some NFS server action, but we need an indication what it is. Once we know that we can try to fix it.
In the second block above we see the RENEW operation sent from the client every second. It is strange. I see no reason why the client wants to call RENEW so often. Usually, the gap between RENEWs is ca 40 seconds (depending on various things). So I'd say this is another bug at the NFSv4 client side.
If something is broken at the NFSv4 (OpenIndiana) side, we need to know from the Apple what exactly they see broken in the pcap file. (Un)fortunately, I am unable to find any evidence the NFSv4 server is doing something wrong.
Updated by Marcel Telka over 10 years ago
- Project changed from OpenIndiana Distribution to illumos gate
Updated by Marcel Telka over 10 years ago
- Subject changed from NFSv4 and OS X Mountain Lion (10.8.2) Client to NFSv4 and slow OS X Mountain Lion (10.8.2) Client
Updated by Jonathan Mullen over 10 years ago
Marcel, I just wanted to say thanks for looking into this from the OI side.
We no longer have an active ticket with Apple, but my understanding is they are still tracking this as a bug internally. So I will point them at the ticket, perhaps they will provide more information to aid your search from the OI side.
Thanks again for taking the time to look at this.
Updated by Marcel Telka over 10 years ago
- Category set to nfs - NFS server and client
- Status changed from Feedback to Closed
Okay. I'm closing this. Once you have some info showing what can we do from the OI/illumos side then please reopen this bug (or file a new one).
Thanks.
Updated by Jean EYMERIT about 10 years ago
Marcel Telka wrote:
Okay. I'm closing this. Once you have some info showing what can we do from the OI/illumos side then please reopen this bug (or file a new one).
Thanks.
Hello,
Do you have some news/updates since 4 months ?
Updated by Steffen Moser almost 10 years ago
Hello,
I'd also like to ask if anybody has new information about this Apple problem... I know that is not caused by OI/Illumos/Solaris (by the way - we experience it also with a Solaris 11.1 (x86-64) server and Mountain Lion Clients) as only our Apple clients seem to be affected. Our Linux clients don't show any problems.
Actually, we can observe two problems: NFSv4 access is unusable slow. But also the NFSv4 ID-Mapping fails totally. At first, the problem of the slowness seemed to have been solved after setting the NFSV4DOMAINNAME in Mountain Lion by using the "dscl" command (as given in "man opendirectoryd"). But it has been reoccurring in further tests from time to time - without any reproducible pattern. I never got IDMAP to work. The files/directories always seem to belong to "nobody".
I tried to write that down in [1], but I haven't got any replies.
We also contacted Apple and asked about Bug ID# 13630191. They told me that it is still under investigation and there is no new information. It would be quite interesting to know if the bug will be still present in the upcoming release of OS X (OS X 10.9 - Mavericks).
Kind regards,
Steffen
[1] https://discussions.apple.com/thread/4931123?start=0&tstart=0