Bug #8955

bound+reserved ports can be leaked (when NFS client reboots too quickly)

Added by Ken Harris almost 3 years ago. Updated almost 3 years ago.

Status: In Progress
Priority: Normal
Assignee:
Category: nfs - NFS server and client
Start date: 2018-01-08
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

We found that, after a long period of operation, fcntl locking over NFS would fail with 'no locks available', and we noticed that all reserved ports were bound on the server.
We traced this to leaked entries in the connection manager list maintained in clnt_cots.c (struct cm_xprt *cm_hd).
We reproduced it with the following scenario:

1. A Linux (Debian 7.4) client acquires and releases a lock on a file on the server (via NFS).
2. Reboot the Linux client, and do so before the MIR_CLNT_IDLE_TIMEOUT interval fires on the server side.
3. When the Linux client comes back up, repeat step 1.
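
For reference, step 1 amounts to roughly the following on the client. This is a minimal sketch, not the program used in the original reproduction: the function name lock_and_unlock is made up for illustration, and the path passed in is assumed to live on the NFS mount. It takes an exclusive whole-file lock with fcntl(F_SETLK) and drops it again, returning the errno of whichever call failed.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative helper (hypothetical name): take an exclusive whole-file
 * lock on `path` and release it again.
 * Returns 0 on success, or the errno value of the call that failed.
 */
int
lock_and_unlock(const char *path)
{
	struct flock fl;
	int fd, err = 0;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return (errno);

	memset(&fl, 0, sizeof (fl));
	fl.l_type = F_WRLCK;		/* exclusive (write) lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* 0 means "through end of file" */

	if (fcntl(fd, F_SETLK, &fl) == -1) {
		/* ENOLCK here is the 'no locks available' failure */
		err = errno;
	} else {
		fl.l_type = F_UNLCK;	/* release the lock again */
		if (fcntl(fd, F_SETLK, &fl) == -1)
			err = errno;
	}

	(void) close(fd);
	return (err);
}
```

Calling lock_and_unlock() on a file under the NFS mount before and after the client reboot corresponds to steps 1 and 3. On a server that has run out of reserved ports, the expectation from the report is that the call returns ENOLCK.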

At this point, a cm_entry with only the ORDREL flag set in x_state_flags will remain in the cm_entry linked list (cm_hd).
It appears that without at least a DEAD flag set in x_state_flags, this cm_entry will remain bound to the port and will never be garbage collected.

We found two locations in clnt_cots.c where adding "cm_entry->x_dead = TRUE;" allows these entries to be cleaned up and the bound/reserved ports to be released.
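
To illustrate the failure mode described above, here is a toy model of the list. It is only a sketch under assumptions: struct model_xprt and the model_* functions are hypothetical stand-ins and do not reproduce the real struct cm_xprt layout or the actual reclaim logic in clnt_cots.c. The point it shows is that a reclaim pass that only frees entries marked dead will skip an entry that has nothing but ORDREL set, so its port stays bound until something sets the dead flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct cm_xprt (not the real layout). */
struct model_xprt {
	struct model_xprt *next;
	int port;		/* reserved port this entry keeps bound */
	bool ordrel;		/* orderly release was seen */
	bool dead;		/* entry is eligible for reclaim */
};

/*
 * Push a new entry onto the list.
 * Returns the new head, or the old head if allocation fails.
 */
struct model_xprt *
model_add(struct model_xprt *head, int port, bool ordrel, bool dead)
{
	struct model_xprt *e = malloc(sizeof (*e));

	if (e == NULL)
		return (head);
	e->next = head;
	e->port = port;
	e->ordrel = ordrel;
	e->dead = dead;
	return (e);
}

/* Count the entries (and therefore bound ports) still on the list. */
size_t
model_count(const struct model_xprt *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return (n);
}

/*
 * Reclaim pass: unlink and free every entry marked dead.
 * Entries with only ordrel set are left alone.
 * Returns the number of entries released.
 */
size_t
model_reclaim(struct model_xprt **headp)
{
	size_t freed = 0;

	while (*headp != NULL) {
		struct model_xprt *e = *headp;

		if (e->dead) {
			*headp = e->next;
			free(e);
			freed++;
		} else {
			headp = &e->next;
		}
	}
	return (freed);
}
```

In this model, an entry added with ordrel set and dead clear survives every call to model_reclaim(). Setting dead on it, which is what the proposed change does for the real entries, lets the next pass free it.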

(This was discovered about 3 years ago. In a recent discussion on illumos-zfs, this issue appears to have bitten again, so we'd like to offer a fix.)


Related issues

Related to illumos gate - Bug #1616: NFS client cannot reconnect to server after network connectivity loss | New | Marcel Telka | 2011-10-08

#1

Updated by Marcel Telka almost 3 years ago

  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee changed from Marcel Telka to Ken Harris
#2

Updated by Marcel Telka almost 3 years ago

  • Related to Bug #1616: NFS client cannot reconnect to server after network connectivity loss added
#3

Updated by Marcel Telka almost 3 years ago

The problem happens when the NFS server tries to connect back to the client's NLM service. Because the client rebooted in the meantime, its NLM port has changed, so the NFS server cannot connect to the client's original NLM port and the connection is effectively dead. Due to this bug, the connection is then leaked with its port left in the bound state.

This is actually very likely the same scenario as seen in bug #1616.

#4

Updated by Marcel Telka almost 3 years ago

Note: bug #1616 is still reproducible with the fix proposed for this bug (see Review Board).
