Bug #7282

closed

nlm_host leaks by nlm_host_findcreate() callers leads to hang in nlm_svc_stopping()

Added by Marcel Telka almost 5 years ago. Updated almost 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: nfs - NFS server and client
Start date: 2016-08-08
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Some nlm_host_findcreate() callers do not release the returned nlm_host structure by calling nlm_host_release() on it. This leaks a reference (the nh_refs counter is never decremented) and eventually leads to a hang in nlm_svc_stopping().

The leaks are:
  • at line 798 in nlm_do_cancel(), where the error path returns without releasing host:
    788    g = zone_getspecific(nlm_zone_key, curzone);
    789    host = nlm_host_findcreate(g, name, netid, addr);
    790    if (host == NULL) {
    791        resp->stat.stat = nlm4_denied_nolocks;
    792        return;
    793    }
    794    if (cb != NULL) {
    795        error = nlm_host_get_rpc(host, sr->rq_vers, &rpcp);
    796        if (error != 0) {
    797            resp->stat.stat = nlm4_denied_nolocks;
    798            return;
    799        }
    800    }
    
  • at line 2306 in nfs4_close(), where nlm_host_findcreate() is called via nfs4_find_sysid() and lm_get_sysid():
    2297        struct lm_sysid *lmsid;
    2298        lmsid = nfs4_find_sysid(VTOMI4(vp));
    2299        if (lmsid == NULL) {
    2300            DTRACE_PROBE2(unknown__sysid, int, e.error,
    2301                vnode_t *, vp);
    2302        } else {
    2303            cleanlocks(vp, ttoproc(curthread)->p_pid,
    2304                (lm_sysidt(lmsid) | LM_SYSID_CLIENT));
    2305        }
    2306        return (e.error);
    
  • in nlm_register_lock_locally(), when nlm_host_findcreate() is called at line 635:
    617 void
    618 nlm_register_lock_locally(struct vnode *vp, struct nlm_host *hostp,
    619     struct flock64 *flk, int flags, u_offset_t offset)
    620 {
    621     int sysid = 0;
    622
    623     if (hostp == NULL) {
    624         mntinfo_t *mi;
    625         servinfo_t *sv;
    626         const char *netid;
    627         struct nlm_globals *g;
    628
    629         mi = VTOMI(vp);
    630         sv = mi->mi_curr_serv;
    631         netid = nlm_knc_to_netid(sv->sv_knconf);
    632
    633         if (netid != NULL) {
    634             g = zone_getspecific(nlm_zone_key, curzone);
    635             hostp = nlm_host_findcreate(g, sv->sv_hostname,
    636                 netid, &sv->sv_addr);
    637         }
    638     }
    639
    640     if (hostp != NULL) {
    641         sysid = hostp->nh_sysid | LM_SYSID_CLIENT;
    642     }
    643
    644     flk->l_sysid = sysid;
    645     (void) convoff(vp, flk, 0, (offset_t)offset);
    646     (void) nlm_local_setlk(vp, flk, flags);
    647 }
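
The pattern behind all three leaks can be illustrated with a small user-space model. This is only a sketch, not the kernel code and not the committed fix: struct mock_host, mock_host_findcreate(), mock_host_release(), cancel_leaky(), cancel_fixed() and can_stop() are hypothetical stand-ins, and it assumes that shutdown can only complete once the reference counter has dropped back to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nlm_host: only the reference counter. */
struct mock_host {
	int refs;
};

/* Stand-in for nlm_host_findcreate(): returns the host with a hold on it. */
struct mock_host *
mock_host_findcreate(struct mock_host *h)
{
	h->refs++;
	return (h);
}

/* Stand-in for nlm_host_release(): drops one hold. */
void
mock_host_release(struct mock_host *h)
{
	assert(h->refs > 0);
	h->refs--;
}

/* Leaky shape: the error path returns without releasing the host. */
int
cancel_leaky(struct mock_host *h, int rpc_error)
{
	struct mock_host *host = mock_host_findcreate(h);

	if (rpc_error != 0)
		return (-1);	/* hold is leaked here */

	mock_host_release(host);
	return (0);
}

/* Balanced shape: every exit path drops the hold taken above. */
int
cancel_fixed(struct mock_host *h, int rpc_error)
{
	struct mock_host *host = mock_host_findcreate(h);
	int ret = (rpc_error != 0) ? -1 : 0;

	mock_host_release(host);
	return (ret);
}

/* Assumed shutdown condition: no outstanding holds on the host. */
int
can_stop(const struct mock_host *h)
{
	return (h->refs == 0);
}
```

In this model, one call to cancel_leaky() with a non-zero rpc_error leaves refs at 1, so can_stop() never becomes true. The same call through cancel_fixed() leaves refs at 0.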
    

Steps to reproduce

Note: These steps reproduce the hang via the leak in nlm_register_lock_locally().

  1. Mount something using NFSv3:
    # mount -o vers=3 SERVER:/export /mnt
    
  2. Lock some file over NFSv3 and leave it locked. You can use the attached locker.c, for example:
    # ./locker
    > lock W 0 0 /mnt/test
    >
    
  3. Unmount the mounted filesystem forcibly:
    # umount -f /mnt
    
  4. Terminate the application holding the lock:
    # pkill locker
    
  5. Try to restart lockd:
    # svcadm restart nlockmgr
    
  6. Now you'll find that the system hung in nlm_svc_stopping():
    # svcs nlockmgr
    STATE          STIME    FMRI
    online*        12:27:55 svc:/network/nfs/nlockmgr:default
    # echo ::stacks -m klmmod | mdb -k
    THREAD           STATE    SOBJ                COUNT
    ffffff00c8631520 SLEEP    CV                      1
                     swtch+0x141
                     cv_wait+0x70
                     delay_common+0xb8
                     delay+0x30
                     nlm_svc_stopping+0x149
                     lm_shutdown+0x7b
                     nfssys+0x425
                     _sys_sysenter_post_swapgs+0x149
    
    #
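
If the attached locker.c is not at hand, step 2 only needs a process that takes a lock on a file under /mnt and keeps it. The following is a minimal sketch of such a helper, not the attached program: lock_whole_file() is a hypothetical name, and it assumes that "lock W 0 0" means a write lock starting at offset 0 with length 0 (that is, to the end of the file).

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Open (or create) the file and take a non-blocking write lock over
 * the whole file. Returns the open descriptor on success, -1 on error.
 * The lock is held for as long as the descriptor stays open.
 */
int
lock_whole_file(const char *path)
{
	struct flock fl;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return (-1);

	memset(&fl, 0, sizeof (fl));
	fl.l_type = F_WRLCK;	/* write (exclusive) lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;		/* from the start of the file */
	fl.l_len = 0;		/* zero length means "to end of file" */

	if (fcntl(fd, F_SETLK, &fl) != 0) {
		(void) close(fd);
		return (-1);
	}

	return (fd);
}
```

A caller would invoke lock_whole_file("/mnt/test") and then block (for example in pause()) without closing the descriptor, so that the lock is still held when the filesystem is forcibly unmounted in step 3.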
    

Files

locker.c (3.57 KB) Marcel Telka, 2016-08-08 10:51 AM

Related issues

Related to illumos gate - Bug #4994: nlm_register_lock_locally() fails to unlock held locks, Closed, Bryan Cantrill, 2014-07-11

Actions #1

Updated by Marcel Telka almost 5 years ago

  • Subject changed from nlm_host leaks by nlm_host_findcreate() callers leads to hangs in nlm_svc_stopping() to nlm_host leaks by nlm_host_findcreate() callers leads to hang in nlm_svc_stopping()
Actions #2

Updated by Marcel Telka almost 5 years ago

  • Related to Bug #4994: nlm_register_lock_locally() fails to unlock held locks added
Actions #3

Updated by Marcel Telka almost 5 years ago

  • Description updated (diff)
Actions #4

Updated by Marcel Telka almost 5 years ago

  • Description updated (diff)
Actions #5

Updated by Marcel Telka almost 5 years ago

  • Description updated (diff)
Actions #6

Updated by Marcel Telka almost 5 years ago

  • Status changed from In Progress to Pending RTI
Actions #7

Updated by Electric Monk almost 5 years ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

commit  745caa4284ab4cda9ab810405f99417d7cc6e8e6
Author: Marcel Telka <marcel@telka.sk>
Date:   2016-08-10T15:01:18.000Z

    7282 nlm_host leaks by nlm_host_findcreate() callers leads to hang in nlm_svc_stopping()
    Reviewed by: Albert Lee <trisk@omniti.com>
    Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
