bound+reserved ports can be leaked (when NFS client reboots too quickly)
We found that after the server had been running for some time, fcntl locking over NFS would return 'no locks available', and we noticed that all reserved ports were bound on the server.
We traced this down to leaked entries in the connection manager list maintained in clnt_cots.c (struct cm_xprt *cm_hd).
We reproduced with this scenario:
1. A Linux (Debian 7.4) client acquires and releases a lock on a file on the server (via NFS).
2. Reboot the Linux client (but do so before the MIR_CLNT_IDLE_TIMEOUT interval fires on the server side).
3. When the Linux client comes back up, repeat step 1.
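For step 1, a minimal sketch of a lock/unlock cycle that could be run on the client against a file on the NFS mount. The helper name lock_cycle and the choice of a whole-file write lock are assumptions for illustration; the original report does not say which tool or lock type was used.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: take and then release a whole-file write lock on
 * `path` using fcntl record locking. Returns 0 on success, or the errno
 * value of the first call that failed (ENOLCK would be the symptom
 * described in this report).
 */
int
lock_cycle(const char *path)
{
	struct flock fl;
	int fd;
	int err = 0;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return (errno);

	memset(&fl, 0, sizeof (fl));
	fl.l_type = F_WRLCK;		/* exclusive lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* 0 means "through end of file" */

	if (fcntl(fd, F_SETLK, &fl) == -1) {
		err = errno;		/* acquire failed */
	} else {
		fl.l_type = F_UNLCK;	/* release the same range */
		if (fcntl(fd, F_SETLK, &fl) == -1)
			err = errno;
	}

	(void) close(fd);
	return (err);
}
```

Calling lock_cycle() on a path under the NFS mount before the reboot, and again once the client is back up, corresponds to steps 1 and 3.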
At this point, a cm_entry with only the ORDREL flag set in x_state_flags remains in the cm_entry linked list (cm_hd).
It appears that unless at least the DEAD flag is also set in x_state_flags, this cm_entry stays bound to the port and is never garbage collected.
We found two locations in clnt_cots.c where adding "cm_entry->x_dead = TRUE;" allowed these entries to be cleaned up and the bound/reserved ports to be released.
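To illustrate why the missing DEAD flag matters, here is a simplified, self-contained model of a connection list whose reaper only frees entries marked dead. This is not the clnt_cots.c code: struct conn, the CONN_* flags, and the conn_* functions are all hypothetical names, and the real garbage collection logic may check more than one condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state flags, loosely mirroring ORDREL and DEAD. */
#define	CONN_ORDREL	0x1u
#define	CONN_DEAD	0x2u

/* Hypothetical list entry; each entry holds one bound port. */
struct conn {
	struct conn *next;
	unsigned int flags;
	int port;
};

/* Push a new entry onto the list; returns NULL if allocation fails. */
struct conn *
conn_add(struct conn **head, int port, unsigned int flags)
{
	struct conn *c;

	assert(head != NULL);
	c = malloc(sizeof (*c));
	if (c == NULL)
		return (NULL);
	c->flags = flags;
	c->port = port;
	c->next = *head;
	*head = c;
	return (c);
}

/* Free every entry marked dead; returns how many ports were released. */
size_t
conn_reap(struct conn **head)
{
	struct conn **pp = head;
	size_t released = 0;

	assert(head != NULL);
	while (*pp != NULL) {
		struct conn *c = *pp;

		if (c->flags & CONN_DEAD) {
			*pp = c->next;	/* unlink, then free */
			free(c);
			released++;
		} else {
			pp = &c->next;	/* not dead: entry is kept */
		}
	}
	return (released);
}

/* Count the entries still holding a port. */
size_t
conn_count(const struct conn *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return (n);
}
```

In this model an entry carrying only CONN_ORDREL survives every call to conn_reap(), which matches the leak described above. Once CONN_DEAD is also set, the next pass releases it.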
(This was discovered about 3 years ago. In a recent discussion on illumos-zfs, this issue appears to have bitten again, so we'd like to offer a fix.)
Updated by Marcel Telka over 3 years ago
The problem happens when the NFS server tries to connect back to the client's NLM service. Because the client rebooted in the meantime, its NLM port has changed, so the NFS server is unable to connect to the client's original NLM port and the connection is effectively dead. Due to this bug, the connection is then leaked with the port left in the bound state.
This is actually very likely the same scenario as seen in bug #1616.