Bug #15311

ipd_detach() should be more careful about non-empty ipd_nsl

Added by Dan McDonald 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Category: networking
% Done: 0%
Difficulty: Medium
Description

I received a kernel panic from a customer. It died here:

> $C
fffff946236048b0 vpanic()
fffff946236048d0 mutex_panic+0x4a(fffffffffb9633fb, ffffffffc0001db0)
fffff94623604940 mutex_vector_enter+0x307(ffffffffc0001db0)
fffff94623604970 ipd_nin_destroy+0x1a(11bd, fffffef69db8f6c0)
fffff946236049b0 neti_stack_apply_destroy+0x108(fffffd6b2fbbff68, fffffd6a660d78c8)
fffff946236049f0 neti_apply_all_instances+0x2b(fffffd6b2fbbff68, fffffffff7ab8620)
fffff94623604a20 neti_stack_fini+0x95(11bd, fffffd6b2fbbff68)
fffff94623604aa0 netstack_apply_destroy+0x104(fffffffffbccfb88, fffffd6b2f834c80, 4)
fffff94623604ae0 apply_all_modules_reverse+0x3f(fffffd6b2f834c80, fffffffffba315d0)
fffff94623604b20 netstack_stack_inactive+0x106(fffffd6b2f834c80)
fffff94623604b60 netstack_reap+0x1a(fffffd6b2f834c80)
fffff94623604c00 taskq_d_thread+0xbc(ffffff5e8a85c7b0)
fffff94623604c10 thread_start+0xb()
> 

A quick inspection of the mutex in question, ipd_nsl_lock, shows it has been mutex_destroy()'ed:

> ffffffffc0001db0::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
mdb: ipd_nsl_lock: invalid adaptive mutex (-f to dump anyway)
> ffffffffc0001db0::mutex -f
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffffffc0001db0 adapt fffff94625e01c20      -      -      no
> ffffffffc0001db0/P        
ipd_nsl_lock:
ipd_nsl_lock:   0xfffff94625e01c26 
> 

The 0x6 at the end is an indicator of a destroyed mutex, and masking that 0x6 out shows the thread that destroyed the mutex:

> fffff94625e01c20::findstack -v
stack pointer for thread fffff94625e01c20 (TS_FREE) (mt_config_thread()): fffff94625e01bb0
[ fffff94625e01bb0 resume_from_zombie+0x95() ]
  fffff94625e01be0 swtch_from_zombie+0x85()
  fffff94625e01c00 thread_exit+0xeb()
  fffff94625e01c10 thread_splitstack_run()
> 
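
To spell out where that fffff94625e01c20 came from: it is just the lock contents with the 0x6 cleared. A trivial standalone C snippet, using only the values from this dump and the masking rule described above:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* Contents of ipd_nsl_lock, per the ::mutex -f and /P output above. */
	uint64_t owner_word = 0xfffff94625e01c26ULL;

	/* Clear the 0x6 "destroyed" bits to recover the thread pointer. */
	uint64_t thread = owner_word & ~(uint64_t)0x6;

	/* Prints 0xfffff94625e01c20, the address handed to ::findstack. */
	printf("destroying thread = 0x%llx\n", (unsigned long long)thread);
	return (0);
}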

The ::findstack output shows it to be a freed thread (TS_FREE). Lucky for us, ipd_detach() is the ONLY function that can mutex_destroy(&ipd_nsl_lock). Let's take a look...

static int
ipd_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
        if (cmd != DDI_DETACH)
                return (DDI_FAILURE);

        mutex_enter(&ipd_nactive_lock);
        if (ipd_nactive > 0) {
                mutex_exit(&ipd_nactive_lock);
                return (EBUSY);
        }
        mutex_exit(&ipd_nactive_lock);
        ASSERT(dip == ipd_devi);
        ddi_remove_minor_node(dip, NULL);
        ipd_devi = NULL;

        if (ipd_neti != NULL) {
                VERIFY(net_instance_unregister(ipd_neti) == 0);
                net_instance_free(ipd_neti);
        }

        mutex_destroy(&ipd_nsl_lock);
        mutex_destroy(&ipd_nactive_lock);
        list_destroy(&ipd_nsl);  /* XXX KEBE SAYS WHOA! */

Hmmm, the dump I have seems to indicate that ipd_detach() can race against a netstack memory-reclamation thread and beat it to destroying both ipd_nsl_lock AND the list head.

More MDB inspection shows the disconnect between the ipd netstack instance (in the reclamation thread) and the list head:

> ipd_nsl::print list_t
{
    list_size = 0x110
    list_offset = 0
    list_head = {
        list_next = 0
        list_prev = 0
    }
}
> fffffef69db8f6c0::print ipd_netstack_t
{
    ipdn_link = {
        list_next = ipd_nsl+0x10
        list_prev = ipd_nsl+0x10
    }

...
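
So the list head has been zeroed out from under an instance that still believes it is linked. To make that picture concrete, here is a hypothetical userland sketch (NOT illumos kernel code; the structs are made-up miniatures of the list(9F) list_t/list_node_t layout) that ends in exactly the state the two ::print commands show: a zeroed head, and a node whose next/prev still point at where the head used to be.

#include <stdio.h>
#include <stddef.h>

typedef struct node {
	struct node	*next;
	struct node	*prev;
} node_t;

typedef struct fakelist {
	size_t	size;		/* stands in for list_size */
	size_t	offset;		/* stands in for list_offset */
	node_t	head;		/* stands in for list_head (at +0x10) */
} fakelist_t;

int
main(void)
{
	fakelist_t nsl = { 0x110, 0, { &nsl.head, &nsl.head } };
	node_t instance;

	/* Link a single instance onto the list, as the create callback would. */
	instance.next = &nsl.head;
	instance.prev = nsl.head.prev;
	nsl.head.prev->next = &instance;
	nsl.head.prev = &instance;

	/* Zero the head: the state the premature list_destroy() leaves behind. */
	nsl.head.next = NULL;
	nsl.head.prev = NULL;

	/* Same picture as the MDB output above. */
	printf("head:     next=%p prev=%p\n",
	    (void *)nsl.head.next, (void *)nsl.head.prev);
	printf("instance: next=%p prev=%p (head lives at %p)\n",
	    (void *)instance.next, (void *)instance.prev, (void *)&nsl.head);
	return (0);
}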

At first glance, I think acquiring ipd_nsl_lock in ipd_detach() and checking for a non-empty ipd_nsl (returning EBUSY in that case) would be the first correct thing to do.
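
For concreteness, a minimal sketch of that guard, placed ahead of the mutex_destroy()/list_destroy() calls and modeled on the existing ipd_nactive check (an untested illustration of the idea, not a reviewed fix):

	mutex_enter(&ipd_nsl_lock);
	if (!list_is_empty(&ipd_nsl)) {
		/* ipd netstack instances still exist; refuse to detach. */
		mutex_exit(&ipd_nsl_lock);
		return (EBUSY);
	}
	mutex_exit(&ipd_nsl_lock);

That would return EBUSY while any ipd netstack instance is still on ipd_nsl, instead of destroying the lock and list head out from under the reclamation thread.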
