Bug #5436
Panic due to bad mutex, from auth_cache being previously freed
Start date:
2014-12-15
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
Description
With #5296 integrated we encountered the following panic:
panic[cpu12]/thread=ffffff227422a8c0: mutex_enter: bad mutex, lp=ffffff23621060a8 owner=ffffff227ae4eb60 thread=ffffff227422a8c0

ffffff00f8061520 unix:mutex_panic+73 ()
ffffff00f8061590 unix:mutex_vector_enter+367 ()
ffffff00f80616b0 nfssrv:nfsauth_cache_get+1a6 ()
ffffff00f8061770 nfssrv:nfsauth_access+1b0 ()
ffffff00f8061860 nfssrv:checkauth+15e ()
ffffff00f8061b80 nfssrv:common_dispatch+28d ()
ffffff00f8061ba0 nfssrv:rfs_dispatch+2d ()
ffffff00f8061c80 rpcmod:svc_getreq+1c1 ()
ffffff00f8061cf0 rpcmod:svc_run+e0 ()
ffffff00f8061d30 rpcmod:svc_do_run+8e ()
ffffff00f8061e20 nfs:nfssys+f1 ()
ffffff00f8061ec0 genunix:dtrace_systrace_syscall32+e4 ()
ffffff00f8061f10 unix:brand_sys_sysenter+1c9 ()

syncing file systems... done
dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc
The problem is that some auth_cache entries are left in the exi_cache even though they are marked as invalid (NFS_AUTH_INVALID). Such entries cause two types of problems, depending on when the entry is marked as INVALID:
- The entries are later freed but left in the exi_cache. Those entries cause panics like the one above.
- The entries linger (almost) forever in the exi_cache. Such entries are never refreshed, so the server keeps using outdated information (I saw such entries in the crash dump file).
The problem is here in nfsauth_cache_get():
auth_state_t prev_state = p->auth_state;

p->auth_state = NFS_AUTH_INVALID;
mutex_exit(&p->auth_lock);

if (prev_state == NFS_AUTH_FRESH) {
	if (rw_tryupgrade(&exi->exi_cache_lock) == 0) {
If the previous state of the auth_cache entry was anything other than FRESH (for example STALE or REFRESHING), the entry is marked INVALID but is not removed from the exi_cache.
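The effect of that state check can be illustrated with a small userland model. This is only a sketch under stated assumptions: the types and functions below (state_t, struct entry, struct cache, invalidate_fresh_only, invalidate_always) are hypothetical stand-ins rather than the kernel's definitions, locking is omitted entirely, and the "always unlink" variant is one possible remedy, not necessarily what the committed fix does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's auth states. */
typedef enum { AUTH_FRESH, AUTH_STALE, AUTH_REFRESHING, AUTH_INVALID } state_t;

/* Hypothetical cache entry, linked into a singly linked list. */
struct entry {
	state_t state;
	struct entry *next;
};

/* Hypothetical cache, standing in for exi_cache. */
struct cache {
	struct entry *head;
};

static void
cache_insert(struct cache *c, struct entry *e)
{
	assert(c != NULL && e != NULL);
	e->next = c->head;
	c->head = e;
}

static bool
cache_contains(const struct cache *c, const struct entry *e)
{
	for (const struct entry *p = c->head; p != NULL; p = p->next) {
		if (p == e)
			return (true);
	}
	return (false);
}

static void
cache_unlink(struct cache *c, struct entry *e)
{
	for (struct entry **pp = &c->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return;
		}
	}
}

/*
 * Pattern described in the report: the entry is always marked
 * INVALID, but it is unlinked only if it was previously FRESH.
 */
static void
invalidate_fresh_only(struct cache *c, struct entry *e)
{
	state_t prev = e->state;

	e->state = AUTH_INVALID;
	if (prev == AUTH_FRESH)
		cache_unlink(c, e);
}

/*
 * One possible remedy (an assumption, not the committed fix):
 * unlink the entry whatever its previous state was.
 */
static void
invalidate_always(struct cache *c, struct entry *e)
{
	e->state = AUTH_INVALID;
	cache_unlink(c, e);
}
```

In this model, an entry inserted in the STALE state and passed to invalidate_fresh_only() ends up INVALID yet still reachable through the cache, which corresponds to the lingering (and later dangling) entries described above. Passing the same entry to invalidate_always() removes it from the list.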
Updated by Electric Monk about 6 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
git commit 3d1d816f0f1e405a11a5871fb93eef11577c5fc8
commit 3d1d816f0f1e405a11a5871fb93eef11577c5fc8
Author: Marcel Telka <marcel.telka@nexenta.com>
Date: 2014-12-16T16:11:15.000Z

5436 Panic due to bad mutex, from auth_cache being previously freed
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Dan McDonald <danmcd@omniti.com>