Bug #5436

Panic due to bad mutex, from auth_cache being previously freed

Added by Marcel Telka almost 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: High
Assignee:
Category: nfs - NFS server and client
Start date: 2014-12-15
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

With #5296 integrated we encountered the following panic:

panic[cpu12]/thread=ffffff227422a8c0: 
mutex_enter: bad mutex, lp=ffffff23621060a8 owner=ffffff227ae4eb60 thread=ffffff227422a8c0 

ffffff00f8061520 unix:mutex_panic+73 () 
ffffff00f8061590 unix:mutex_vector_enter+367 () 
ffffff00f80616b0 nfssrv:nfsauth_cache_get+1a6 () 
ffffff00f8061770 nfssrv:nfsauth_access+1b0 () 
ffffff00f8061860 nfssrv:checkauth+15e () 
ffffff00f8061b80 nfssrv:common_dispatch+28d () 
ffffff00f8061ba0 nfssrv:rfs_dispatch+2d () 
ffffff00f8061c80 rpcmod:svc_getreq+1c1 () 
ffffff00f8061cf0 rpcmod:svc_run+e0 () 
ffffff00f8061d30 rpcmod:svc_do_run+8e () 
ffffff00f8061e20 nfs:nfssys+f1 () 
ffffff00f8061ec0 genunix:dtrace_systrace_syscall32+e4 () 
ffffff00f8061f10 unix:brand_sys_sysenter+1c9 () 

syncing file systems... 
done 
dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc 

The problem is that some auth_cache entries are left in the exi_cache even though they are marked as invalid (NFS_AUTH_INVALID). Such entries cause two kinds of problems, depending on when the entry is marked as INVALID:

  1. The entries are later freed but left in the exi_cache. Those entries cause panics like the one above.
  2. The entries linger (almost) forever in the exi_cache. Such entries are never refreshed, so the server keeps using outdated information (I saw such entries in the crash dump file).

The problem is here in nfsauth_cache_get():

        auth_state_t prev_state = p->auth_state;

        p->auth_state = NFS_AUTH_INVALID;
        mutex_exit(&p->auth_lock);

        if (prev_state == NFS_AUTH_FRESH) {
            if (rw_tryupgrade(&exi->exi_cache_lock) == 0) {

If the previous state of the auth_cache entry was anything other than FRESH (for example STALE or REFRESHING), the entry is marked INVALID but is not removed from the exi_cache.
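The effect of that condition can be reproduced in a small user-space model. This is only a sketch: struct exi_model, the single-linked list, and every helper name below are hypothetical stand-ins, all locking (auth_lock, exi_cache_lock, rw_tryupgrade) is left out, and invalidate_always_unlink() is an assumed remedy shown for contrast, not necessarily what the committed fix does.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
	NFS_AUTH_FRESH,
	NFS_AUTH_STALE,
	NFS_AUTH_REFRESHING,
	NFS_AUTH_INVALID
} auth_state_t;

/* Hypothetical, heavily simplified stand-in for the kernel's auth_cache. */
struct auth_cache {
	auth_state_t auth_state;
	struct auth_cache *next;
};

/* Hypothetical stand-in for the per-export cache (exi_cache). */
struct exi_model {
	struct auth_cache *head;
};

static void
cache_insert(struct exi_model *exi, struct auth_cache *p)
{
	assert(p != NULL);
	p->next = exi->head;
	exi->head = p;
}

/* Remove p from the list if it is linked; otherwise do nothing. */
static void
cache_unlink(struct exi_model *exi, struct auth_cache *p)
{
	struct auth_cache **pp;

	for (pp = &exi->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			p->next = NULL;
			return;
		}
	}
}

/* Mirrors the reported logic: unlink only if the previous state was FRESH. */
static void
invalidate_as_reported(struct exi_model *exi, struct auth_cache *p)
{
	auth_state_t prev_state = p->auth_state;

	p->auth_state = NFS_AUTH_INVALID;
	if (prev_state == NFS_AUTH_FRESH)
		cache_unlink(exi, p);
}

/* Assumed remedy (not taken from the commit): unlink whatever the state was. */
static void
invalidate_always_unlink(struct exi_model *exi, struct auth_cache *p)
{
	p->auth_state = NFS_AUTH_INVALID;
	cache_unlink(exi, p);
}

/* Count INVALID entries that are still reachable from the cache. */
static int
count_invalid_linked(const struct exi_model *exi)
{
	const struct auth_cache *p;
	int n = 0;

	for (p = exi->head; p != NULL; p = p->next) {
		if (p->auth_state == NFS_AUTH_INVALID)
			n++;
	}
	return (n);
}

/* Build a one-entry cache in state `initial`, invalidate it, report leftovers. */
static int
invalid_left_in_cache(auth_state_t initial, int always_unlink)
{
	struct exi_model exi = { NULL };
	struct auth_cache entry = { initial, NULL };

	cache_insert(&exi, &entry);
	if (always_unlink)
		invalidate_always_unlink(&exi, &entry);
	else
		invalidate_as_reported(&exi, &entry);
	return (count_invalid_linked(&exi));
}
```

In this model, invalid_left_in_cache(NFS_AUTH_STALE, 0) returns 1: the entry is INVALID yet still reachable from the cache, which is the precondition for both failure modes described above. With a FRESH entry, or with the always-unlink variant, the function returns 0.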


Related issues

Related to illumos gate - Feature #5296: Support for more than 16 groups with AUTH_SYS (Closed, Marcel Telka, 2014-11-07)

#1

Updated by Marcel Telka almost 6 years ago

  • Status changed from In Progress to Pending RTI
#2

Updated by Electric Monk almost 6 years ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

git commit 3d1d816f0f1e405a11a5871fb93eef11577c5fc8

commit  3d1d816f0f1e405a11a5871fb93eef11577c5fc8
Author: Marcel Telka <marcel.telka@nexenta.com>
Date:   2014-12-16T16:11:15.000Z

    5436 Panic due to bad mutex, from auth_cache being previously freed
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@omniti.com>
