Bug #5436

closed

Panic due to bad mutex, from auth_cache being previously freed

Added by Marcel Telka almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Assignee:
Category: nfs - NFS server and client
Start date: 2014-12-15
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

With #5296 integrated we encountered the following panic:

panic[cpu12]/thread=ffffff227422a8c0: 
mutex_enter: bad mutex, lp=ffffff23621060a8 owner=ffffff227ae4eb60 thread=ffffff227422a8c0 

ffffff00f8061520 unix:mutex_panic+73 () 
ffffff00f8061590 unix:mutex_vector_enter+367 () 
ffffff00f80616b0 nfssrv:nfsauth_cache_get+1a6 () 
ffffff00f8061770 nfssrv:nfsauth_access+1b0 () 
ffffff00f8061860 nfssrv:checkauth+15e () 
ffffff00f8061b80 nfssrv:common_dispatch+28d () 
ffffff00f8061ba0 nfssrv:rfs_dispatch+2d () 
ffffff00f8061c80 rpcmod:svc_getreq+1c1 () 
ffffff00f8061cf0 rpcmod:svc_run+e0 () 
ffffff00f8061d30 rpcmod:svc_do_run+8e () 
ffffff00f8061e20 nfs:nfssys+f1 () 
ffffff00f8061ec0 genunix:dtrace_systrace_syscall32+e4 () 
ffffff00f8061f10 unix:brand_sys_sysenter+1c9 () 

syncing file systems... 
done 
dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc 

The problem is that some auth_cache entries are left in the exi_cache even though they are marked as invalid (NFS_AUTH_INVALID). Such entries cause two types of problems (depending on when the entry is marked as INVALID):

  1. The entries are later freed, but left in exi_cache. Those entries will cause panics like the one above.
  2. The entries linger (almost) forever in the exi_cache. Such entries are never refreshed, so the server uses the outdated information (I saw such entries in the crash dump file).

The problem is here in nfsauth_cache_get():

    auth_state_t prev_state = p->auth_state;

    p->auth_state = NFS_AUTH_INVALID;
    mutex_exit(&p->auth_lock);

    if (prev_state == NFS_AUTH_FRESH) {
        if (rw_tryupgrade(&exi->exi_cache_lock) == 0) {

If the previous state of the auth_cache entry was anything other than FRESH (for example STALE or REFRESHING), we won't remove the entry from the exi_cache, even though it has just been marked INVALID.
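
To make the failing condition concrete, here is a minimal user-space sketch of the check quoted above. It is a model only, not the nfssrv code: the struct layout, the in_exi_cache flag and the function names are hypothetical, the enum lists only the states named in this report, and locking (including the case where rw_tryupgrade() fails) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the states named in this report; the real enum may differ. */
typedef enum {
    NFS_AUTH_FRESH,
    NFS_AUTH_STALE,
    NFS_AUTH_REFRESHING,
    NFS_AUTH_INVALID
} auth_state_t;

/* Hypothetical stand-in for an auth_cache entry. */
struct auth_cache_model {
    auth_state_t auth_state;
    bool in_exi_cache;      /* hypothetical: entry is still linked in exi_cache */
};

/*
 * Models the quoted snippet: the entry is always marked INVALID, but it
 * is unlinked from the cache only when its previous state was FRESH.
 */
static void
invalidate_as_quoted(struct auth_cache_model *p)
{
    auth_state_t prev_state = p->auth_state;

    p->auth_state = NFS_AUTH_INVALID;

    if (prev_state == NFS_AUTH_FRESH)
        p->in_exi_cache = false;
}

/* Returns true if an entry ends up INVALID but still linked in the cache. */
static bool
leaves_invalid_entry(auth_state_t initial)
{
    struct auth_cache_model e = { initial, true };

    invalidate_as_quoted(&e);
    return (e.auth_state == NFS_AUTH_INVALID && e.in_exi_cache);
}

/* Small self-check of the model; call it from any test harness. */
static void
check_model(void)
{
    assert(!leaves_invalid_entry(NFS_AUTH_FRESH));
    assert(leaves_invalid_entry(NFS_AUTH_STALE));
    assert(leaves_invalid_entry(NFS_AUTH_REFRESHING));
}
```

In this model only a previously FRESH entry is unlinked. STALE and REFRESHING entries stay in the cache in the INVALID state, which matches the two failure modes described above.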


Related issues

Related to illumos gate - Feature #5296: Support for more than 16 groups with AUTH_SYS (Closed, Marcel Telka, 2014-11-07)
