Bug #4872

closed

system crash after nlm_gc hits bogus mutex

Added by Paul Dagnelie over 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: nfs - NFS server and client
Start date: 2014-05-20
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

While preparing to start an upgrade, we discovered that the DE was rebooting.
After the reboot we found a crash dump, which shows that nlm_gc
encountered a bogus mutex:

panic[cpu1]/thread=ffffff00b9695c40:
mutex_enter: bad mutex, lp=ffffff19c1813d28 owner=ffffff00b9695c40
thread=ffffff00b9695c40

ffffff00b9695b70 unix:mutex_panic+73 ()
ffffff00b9695bd0 unix:mutex_vector_enter+446 ()
ffffff00b9695c20 klmmod:nlm_gc+ab ()
ffffff00b9695c30 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel

> $c
vpanic()
mutex_panic+0x73(fffffffffb961b90, ffffff19c1813d28)
mutex_vector_enter+0x446(ffffff19c1813d28)
nlm_gc+0xab(ffffff19a746b8c0)
thread_start+8()

> ffffff19c1813d28::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
mdb: 0xffffff19c1813d28: invalid adaptive mutex (-f to dump anyway)

The diagnosis and fix are described in detail at http://blog.delphix.com/pdagnelie/2014/05/19/nlms-garbage-collection-race/
