Bug #14688 (open)

nvme blkdev attach/detach trips assertion in ndi_devi_unconfig_one()

Added by Hans Rosenfeld 15 days ago. Updated 11 days ago.

Status: New
Priority: Normal
Category: driver - device drivers
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

There's a nasty issue with the way nvme attaches blkdev children that can cause an assertion to trip deep down in the devcfg code. This happens when nvme_attach() runs into an error after at least one blkdev has been attached.

During nvme_attach(), all namespaces are enumerated, and unless they have properties that cause nvme to ignore them, they each get a blkdev instance attached by means of bd_attach_handle(). This means that child nodes are linked to the nvme device node even though the nvme node itself is not in a fully attached state yet. This hasn't caused any problems so far, as nothing in devcfg checks for it, and it isn't a problem as long as nvme_attach() eventually succeeds.

If nvme_attach() runs into a problem after at least one blkdev has been attached successfully, we're in trouble: the cleanup code calls into nvme_detach(), which tries to detach the blkdev child, and that is when we hit the assertion. A child node apparently can't be detached while its parent node isn't in the attached state yet.
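In condensed form, the failing flow looks roughly like this (a sketch only; identifiers such as n_namespace_count and ns_bd_hdl are abridged and may not match the driver source exactly):

static int
nvme_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    nvme_t *nvme;
    int i;

    /* ... controller setup, soft state (nvme), namespace enumeration ... */

    for (i = 0; i != nvme->n_namespace_count; i++) {
        /*
         * Attaches a blkdev child under dip while dip itself is
         * still only DS_PROBED, not DS_ATTACHED.
         */
        if (bd_attach_handle(dip, nvme->n_ns[i].ns_bd_hdl) !=
            DDI_SUCCESS)
            goto fail;    /* in this report: fails for the 2nd namespace */
    }

    return (DDI_SUCCESS);

fail:
    /*
     * nvme_detach() -> bd_detach_handle() ->
     * ndi_devi_unconfig_one() -> ASSERT(i_ddi_devi_attached(pdip))
     */
    (void) nvme_detach(dip, DDI_DETACH);
    return (DDI_FAILURE);
}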

One solution would be to defer attaching the blkdev children to a taskq that runs only after nvme is properly attached, as sketched below.
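A minimal sketch of that approach using the DDI taskq interfaces; nvme_bd_attach_task(), the n_attach_tq field, and the ns_nvme back-pointer are hypothetical names, not existing driver code:

static void
nvme_bd_attach_task(void *arg)
{
    nvme_namespace_t *ns = arg;

    /* Intended to run only once the parent nvme node is attached. */
    (void) bd_attach_handle(ns->ns_nvme->n_dip, ns->ns_bd_hdl);
}

    /*
     * At the end of nvme_attach(), once nothing can fail anymore,
     * queue one task per namespace instead of calling
     * bd_attach_handle() inline. Note that a dispatched task can
     * still race with the framework marking the node attached, so
     * a real fix would need additional synchronization.
     */
    nvme->n_attach_tq = ddi_taskq_create(dip, "nvme_attach_tq", 1,
        TASKQ_DEFAULTPRI, 0);
    for (i = 0; i != nvme->n_namespace_count; i++)
        (void) ddi_taskq_dispatch(nvme->n_attach_tq,
            nvme_bd_attach_task, &nvme->n_ns[i], DDI_SLEEP);
    return (DDI_SUCCESS);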

#1

Updated by Hans Rosenfeld 15 days ago

  • Subject changed from blkdev attach/detach trips assertion in ndi_devi_unconfig_one() to nvme blkdev attach/detach trips assertion in ndi_devi_unconfig_one()
#2

Updated by Electric Monk 15 days ago

  • Gerrit CR set to 2149
#3

Updated by Hans Rosenfeld 11 days ago

Panic info:

[20]> ::msgbuf
...
Block device: blkdev@w344339304E402658,0, blkdev9
blkdev9 is /pci@85,0/pci8086,2030@0/pci11f8,8546@0/pci11f8,8546@16/pci144d,a80b@0/blkdev@w344339304E402658,0
/pci@85,0/pci8086,2030@0/pci11f8,8546@0/pci11f8,8546@16/pci144d,a80b@0/blkdev@w344339304E402658,0 (blkdev9) online
WARNING: nvme24: failed bringing node blkdev@w344339304E402658,0 online

panic[cpu20]/thread=fffffae7858b4c20: 
assertion failed: i_ddi_devi_attached(pdip), file: ../../common/os/devcfg.c, line: 6537

fffffae7858b46e0 fffffffffbddd221 ()
fffffae7858b4770 genunix:ndi_devi_unconfig_one+2e8 ()
fffffae7858b47d0 blkdev:bd_detach_handle+96 ()
fffffae7858b4830 nvme:nvme_detach+e6 ()
fffffae7858b4910 nvme:nvme_attach+73a ()
fffffae7858b4990 genunix:devi_attach+ad ()
fffffae7858b49d0 genunix:attach_node+149 ()
fffffae7858b4a20 genunix:i_ndi_config_node+d5 ()
fffffae7858b4a50 genunix:i_ddi_attachchild+60 ()
fffffae7858b4a90 genunix:devi_attach_node+8b ()
fffffae7858b4b10 genunix:config_immediate_children+e0 ()
fffffae7858b4b60 genunix:devi_config_common+69 ()
fffffae7858b4c00 genunix:mt_config_thread+216 ()
fffffae7858b4c10 unix:thread_start+b ()

[20]> $C
fffffffffbcaddd0 kmdb_enter+0xb()
fffffffffbcade00 debug_enter+0x75(fffffffffb97a080)
fffffffffbcadef0 panicsys+0x544(fffffffffbf34758, fffffae7858b4638, fffffffffbcadf00, 1)
fffffae7858b4620 vpanic+0x15c()
fffffae7858b4690 0xfffffffffb8c0c91()
fffffae7858b46e0 0xfffffffffbddd221()
fffffae7858b4770 ndi_devi_unconfig_one+0x2e8(fffffdf4350e4d10, fffffdf43b73d401, 0, 11)
fffffae7858b47d0 blkdev`bd_detach_handle+0x96(fffffdf43bac8028)
fffffae7858b4830 nvme`nvme_detach+0xe6(fffffdf4350e4d10, 0)
fffffae7858b4910 nvme`nvme_attach+0x73a(fffffdf4350e4d10, 0)
fffffae7858b4990 devi_attach+0xad(fffffdf4350e4d10, 0)
fffffae7858b49d0 attach_node+0x149(fffffdf4350e4d10)
fffffae7858b4a20 i_ndi_config_node+0xd5(fffffdf4350e4d10, 6, 0)
fffffae7858b4a50 i_ddi_attachchild+0x60(fffffdf4350e4d10)
fffffae7858b4a90 devi_attach_node+0x8b(fffffdf4350e4d10, 4004048)
fffffae7858b4b10 config_immediate_children+0xe0(fffffdf4350e6a30, 4004048, ffffffff)
fffffae7858b4b60 devi_config_common+0x69(fffffdf4350e6a30, 4004048, ffffffff)
fffffae7858b4c00 mt_config_thread+0x216(fffffdf43a320800)
fffffae7858b4c10 thread_start+0xb()

[20]> fffffdf4350e4d10::print -t struct dev_info devi_node_state
ddi_node_state_t devi_node_state = 4 (DS_PROBED)
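DS_PROBED precedes DS_ATTACHED in the node-state ordering, so i_ddi_devi_attached() on the parent returns false. The check is roughly (a sketch modeled on its definition in devcfg.c):

int
i_ddi_devi_attached(dev_info_t *dip)
{
    /* DS_PROBED sorts below DS_ATTACHED in ddi_node_state_t. */
    return (DEVI(dip)->devi_node_state >= DS_ATTACHED);
}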

The assertion is right at the top of ndi_devi_unconfig_one():

int
ndi_devi_unconfig_one(
    dev_info_t *pdip,
    char *devnm,
    dev_info_t **dipp,
    int flags)
{
    int        (*f)();
    int        circ, rv;
    int        pm_cookie;
    dev_info_t    *child;
    dev_info_t    *vdip = NULL;
    int        v_circ;
    struct brevq_node *brevq = NULL;

    ASSERT(i_ddi_devi_attached(pdip));

We get here because bd_attach_handle() for the 2nd namespace failed in nvme_attach(), so nvme_attach() errors out and calls nvme_detach() to clean up. As the failure is for the 2nd namespace, the first one had already attached successfully despite the parent node (nvme24) not being in an attached state:

[20]> fffffdf4350e4d10::prtconf                                 
DEVINFO          NAME                                              
fffffdf434f24d18 i86pc (driver name: rootnex)
    fffffdf435038458 pciex_root_complex, instance #5 (driver name: npe)
        fffffdf435038170 pciexclass,060400, instance #35 (driver name: pcieb)
            fffffdf43502b000 pciexclass,060400, instance #36 (driver name: pcieb)
                fffffdf4350e6a30 pciexclass,060400, instance #59 (driver name: pcieb)
                    fffffdf4350e4d10 pciexclass,010802, instance #24 (driver not attached)
                        fffffdf436f22d18 blkdev, instance #9 (driver name: blkdev)

Calling bd_detach_handle() on the attached blkdev as part of the cleanup trips the assertion, as it is apparently not allowed to detach children from nodes that aren't attached yet. Other drivers using blkdev have the same problem in principle, but since they attach only one blkdev, and do so as pretty much the last step of their attach(9E), it has never caused any problems for them. That would be different if ndi_devi_alloc() or ndi_devi_online() checked the parent node state as well; a sketch of such a guard follows.
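For illustration only, such a guard in ndi_devi_online() might look like this (hypothetical; no such check exists today):

int
ndi_devi_online(dev_info_t *dip, uint_t flags)
{
    dev_info_t *pdip = ddi_get_parent(dip);

    /* Hypothetical: refuse to online a child of an unattached parent. */
    if (pdip == NULL || !i_ddi_devi_attached(pdip))
        return (NDI_FAILURE);

    /* ... existing implementation ... */
}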
