NLM server holds vnodes too long
The NLM server creates an nlm_vhold structure for every vnode it touches, to make sure the vnode is properly held while there are active locks on it. There is one nlm_vhold structure allocated for every host/vnode pair, even when the host holds more than one lock on that vnode. The nlm_vhold structure is not used to track the individual locks; its purpose is just to mark a vnode as "touched" and held for the particular host.
When a lock on a vnode is released, the NLM implementation does not free the nlm_vhold structure, because it cannot cheaply tell whether that lock was the last one the host held on the vnode. Determining this requires some expensive tests, so they are deferred to nlm_gc(), the NLM garbage collector.
nlm_gc() scans the nlm_vhold structures and frees those that are no longer needed. The problem is that nlm_gc() checks nlm_vhold structures only for hosts that are idle. Hosts that are not idle are rarely checked for unused nlm_vhold structures, so their vholds are released too late or never. As a result, the affected vnodes stay held for far too long, even after the files have been removed.
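The behaviour described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: struct vhold, struct host, host_lock(), host_unlock(), gc_idle_only(), held_vnodes() and vholds_left_after_gc() are hypothetical names, and the active_locks counter merely stands in for the expensive "does this host still lock the vnode?" test that the real nlm_vhold does not carry.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VHOLDS 8

/* Hypothetical, simplified stand-in for one host/vnode hold. */
struct vhold {
    int  vnode_id;      /* which vnode is held */
    int  active_locks;  /* model-only substitute for the expensive check */
    bool in_use;
};

struct host {
    bool idle;          /* true when the host has no recent NLM activity */
    struct vhold vholds[MAX_VHOLDS];
};

/* Take a lock: create the vhold on first touch, otherwise reuse it. */
struct vhold *host_lock(struct host *h, int vnode_id)
{
    struct vhold *slot = NULL;

    h->idle = false;    /* any lock request makes the host non-idle */
    for (size_t i = 0; i < MAX_VHOLDS; i++) {
        struct vhold *v = &h->vholds[i];
        if (v->in_use && v->vnode_id == vnode_id) {
            v->active_locks++;
            return v;
        }
        if (!v->in_use && slot == NULL)
            slot = v;
    }
    if (slot == NULL)
        return NULL;    /* model table is full */
    slot->in_use = true;
    slot->vnode_id = vnode_id;
    slot->active_locks = 1;
    return slot;
}

/* Release a lock: the vhold is deliberately left in place for the GC. */
void host_unlock(struct host *h, int vnode_id)
{
    for (size_t i = 0; i < MAX_VHOLDS; i++) {
        struct vhold *v = &h->vholds[i];
        if (v->in_use && v->vnode_id == vnode_id && v->active_locks > 0) {
            v->active_locks--;
            return;
        }
    }
}

/* GC pass as described in this report: non-idle hosts are skipped. */
int gc_idle_only(struct host *hosts, size_t nhosts)
{
    int freed = 0;

    for (size_t n = 0; n < nhosts; n++) {
        if (!hosts[n].idle)
            continue;   /* unused vholds of busy hosts survive here */
        for (size_t i = 0; i < MAX_VHOLDS; i++) {
            struct vhold *v = &hosts[n].vholds[i];
            if (v->in_use && v->active_locks == 0) {
                v->in_use = false;
                freed++;
            }
        }
    }
    return freed;
}

/* Number of vnodes the host still keeps held. */
int held_vnodes(const struct host *h)
{
    int count = 0;

    for (size_t i = 0; i < MAX_VHOLDS; i++)
        if (h->vholds[i].in_use)
            count++;
    return count;
}

/* Lock and unlock one vnode, run the GC, report vholds left behind. */
int vholds_left_after_gc(bool host_goes_idle)
{
    struct host h = {0};

    host_lock(&h, 1);
    host_unlock(&h, 1);
    h.idle = host_goes_idle;
    gc_idle_only(&h, 1);
    return held_vnodes(&h);
}
```

In this model, vholds_left_after_gc(false) leaves one vnode held although no lock remains on it, while vholds_left_after_gc(true) leaves none. That matches the symptom: a host that keeps issuing lock requests for any file never lets its stale vholds be collected.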
Steps to reproduce the bug
On the NFS server create a dataset with a big file and share it:
# zfs create rpool/export/TEST
# zfs set sharenfs=on rpool/export/TEST
# mkfile 100m /export/TEST/bigfile
# chmod 777 /export/TEST/bigfile
# chmod 777 /export/TEST
# ls -la /export/TEST
total 102416
drwxrwxrwx   2 root     root             3 aug 13 19:03 .
drwxr-xr-x   4 root     sys              4 aug 13 19:03 ..
-rwxrwxrwx   1 root     root     104857600 aug 13 19:03 bigfile
# zfs list rpool/export/TEST
NAME                USED  AVAIL  REFER  MOUNTPOINT
rpool/export/TEST   100M  52,9G   100M  /export/TEST
#
On the NFS client do the following (use the attached lock.c compiled using "gcc -Wall -o lock lock.c"):
# mount -o vers=3 SERVER:/export/TEST /mnt
# while true ; do ./lock lock r 0 0 /mnt/testfile ; sleep 5 ; done &
 3744
# ./lock lock r 0 0 /mnt/bigfile
# rm /mnt/bigfile
#
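The attached lock.c is not reproduced in this report. On an NFSv3 mount, an fcntl() record lock is what normally reaches the server's NLM, so a helper along the following lines would exercise the same path. This is a sketch under assumptions: take_lock() is a hypothetical name, and reading the arguments "r 0 0" as lock type, start offset and length is a guess, not something taken from the attachment.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Take a read ('r') or write ('w') lock on the byte range starting at
 * `start` with length `len` (len == 0 means "to end of file").
 * Creates the file if it does not exist. Returns the open descriptor,
 * which owns the lock until it is closed, or -1 on error.
 */
int take_lock(const char *path, char type, off_t start, off_t len)
{
    struct flock fl = {0};
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    fl.l_type = (type == 'w') ? F_WRLCK : F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    /* Non-blocking request; fails if a conflicting lock exists. */
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A program that calls take_lock(path, 'r', 0, 0) and then exits releases the lock again right away, since closing the descriptor drops it. That fits the steps above, where the lock on bigfile is already gone by the time the file is removed.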
Now, look back at the server:
# ls -la /export/TEST
total 2
drwxrwxrwx   2 root     root           3 aug 13 19:09 .
drwxr-xr-x   4 root     sys            4 aug 13 19:03 ..
-rwxr-xr-x   1 nobody   nobody         0 aug 13 19:08 testfile
# zfs list rpool/export/TEST
NAME                USED  AVAIL  REFER  MOUNTPOINT
rpool/export/TEST   100M  52,9G   100M  /export/TEST
#
You'll see that the space occupied by the now-deleted bigfile has not been reclaimed: zfs list still reports 100M used.
To see the space reclaimed, you need to kill the while loop on the NFS client (note that the while loop does not touch the deleted bigfile, only the small testfile). Alternatively, restart the nlockmgr service on the NFS server.