nfsauth_cache_get() could spend a lot of time walking exi_cache
The exi_cache is part of the exportinfo structure. It holds the auth_cache entries. These entries allow the NFS server to cache the auth information and avoid expensive door calls to mountd in userland. The auth information (and thus the exi_cache) is consulted on every (or, to be exact, almost every) incoming NFS request.
The exi_cache is organized as a hash table with AUTH_TABLESIZE (32) buckets. The key for the hash function is the address of the NFS client. The collisions are handled with linked lists (via auth_next).
The nfsauth_cache_get() function is responsible for finding the particular auth_cache entry in the exi_cache.
This design works well when the load factor of the hash table (IOW, the average number of entries per bucket) is low, but lookups in exi_cache become very slow when the load factor is high.
During testing in the lab I found a case with more than 20k entries in the exi_cache. With 32 buckets this means a load factor of more than 600, so in the worst case nfsauth_cache_get() had to scan a linked list of more than 600 entries to find the entry. In this particular test run more than 7% of the CPU time was spent in nfsauth_cache_get(). This is visible in the flame graph (attached as lab.svg).
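The numbers above follow from the usual chained-hash arithmetic. This is a rough sketch assuming exactly 20,000 entries spread evenly over the 32 buckets (simple uniform hashing); an uneven spread, for example many uids from one client sharing a single bucket, only makes the longest chain worse:

```latex
\alpha = \frac{n}{m} = \frac{20000}{32} = 625

E[\text{entries examined, hit}] = 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} \approx 313.5,
\qquad
E[\text{entries examined, miss}] = \alpha = 625
```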
Based on the information above I constructed a special test case to exhibit the problem even more clearly. When I ran the test locally (with both the NFS client and the NFS server running on the same machine), nfsd was consuming about 60% of the CPU, while the NFS client used about 30% to 35%. More than 44% of the time was spent (and wasted) in nfsauth_cache_get(). See the attached testcase.svg flame graph for details.
To reproduce the problem follow these steps:
- Make sure you have a machine with #4943 implemented
- Run share /tmp
- Compile the attached nfsauth_perf.c file: gcc -Wall -o nfsauth_perf nfsauth_perf.c -lnsl -lrpcsvc
- Run the test: ./nfsauth_perf 1000 100
With the fix and the same test case (./nfsauth_perf 1000 100), the time spent in nfsauth_cache_get() is only about 4%, compared to about 44% without the fix (see the attached testcase-withfix.svg). The NFS server was able to execute about 14 million operations with the fix, compared to about 6.5 million without the fix on the same hardware.
The fix replaces the linked lists with a two-level AVL tree. The first-level AVL tree contains auth_cache_clnt structures, each representing one NFS client (identified by its network address). Each auth_cache_clnt contains another AVL tree (authc_tree) of auth_cache structures. The authc_tree AVL tree is searched by auth_flavor, auth_clnt_uid, and auth_clnt_gid (see nfsauth_cache_compar() for details).
To find a particular auth_cache entry, we first search the exportinfo->exi_cache AVL tree by the client's address to find the auth_cache_clnt structure, and then search that structure's authc_tree AVL tree by the auth_flavor, auth_clnt_uid, and auth_clnt_gid values.
Updated by Electric Monk about 5 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
commit 71da0c32412fe8362672d1dba2fa4c6b061445b6
Author: Marcel Telka <firstname.lastname@example.org>
Date: 2015-01-20T22:52:29.000Z

5509 nfsauth_cache_get() could spend a lot of time walking exi_cache
Reviewed by: Gordon Ross <email@example.com>
Reviewed by: Robert Mustacchi <firstname.lastname@example.org>
Approved by: Dan McDonald <email@example.com>