Project

General

Profile

Bug #12924

blkdev needs to be better at handling attach failures

Added by Jason King about 1 month ago. Updated 30 days ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
driver - device drivers
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

I stumbled on this (somewhat indirectly) while testing #12506.

From bd_attach, we have the following key sequence of events:

    if (bd->d_ksp != NULL) {
        bd->d_ksp->ks_lock = &bd->d_ksmutex;
        kstat_install(bd->d_ksp);
        bd->d_kiop = bd->d_ksp->ks_data;
    } else {
        /*
         * Even if we cannot create the kstat, we create a
         * scratch kstat.  The reason for this is to ensure
         * that we can update the kstat all of the time,
         * without adding an extra branch instruction.
         */
        bd->d_kiop = kmem_zalloc(sizeof (kstat_io_t), KM_SLEEP);
    }

...
    rv = cmlb_attach(dip, &bd_tg_ops, DTYPE_DIRECT,
        bd->d_removable, bd->d_hotpluggable,
        /*LINTED: E_BAD_PTR_CAST_ALIGN*/
        *(uint64_t *)drive.d_eui64 != 0 ? DDI_NT_BLOCK_BLKDEV :
        drive.d_lun >= 0 ? DDI_NT_BLOCK_CHAN : DDI_NT_BLOCK,
        CMLB_FAKE_LABEL_ONE_PARTITION, bd->d_cmlbh, 0);
    if (rv != 0) {
        cmlb_free_handle(&bd->d_cmlbh);
        kmem_cache_destroy(bd->d_cache);
        mutex_destroy(&bd->d_ksmutex);
        mutex_destroy(&bd->d_ocmutex);
        mutex_destroy(&bd->d_statemutex);
        cv_destroy(&bd->d_statecv);
        bd_queues_free(bd);
        if (bd->d_ksp != NULL) {
            kstat_delete(bd->d_ksp);
            bd->d_ksp = NULL;
        } else {
            kmem_free(bd->d_kiop, sizeof (kstat_io_t));
        }
        ddi_soft_state_free(bd_state, inst);
        return (DDI_FAILURE);

If the cmlb_attach fails, the mutex (which is also used by bd->d_ksp) is destroyed before the kstat. This can lead to the following panic:

panic[cpu6]/thread=fffffe16e56cfc00: kstat_delete(fffffe16e6313000): caller holds data lock fffffe16e561a598

The fix is to delete the kstat prior to deleting d_ksmutex -- or just have the cleanup do things in reverse of the setup.

Additionally, if bd_attach fails, the following different panic can also occur:

WARNING: vioblk0: failed bringing node blkdev@0,0 online
WARNING: vioblk0: Failed to attach blkdev

panic[cpu0]/thread=fffffffffbc57060: BAD TRAP: type=d (#gp General protection) rp=fffffffffbc965d0 addr=fffffe16fed28218

#gp General protection
addr=0xfffffe16fed28218
pid=0, pc=0xfffffffffbd1030c, sp=0xfffffffffbc966c0, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 606b8<osxsav,pcide,xmme,fxsr,pge,pae,pse,de>
cr2: 0  cr3: 7400000  cr8: c

        rdi: deadbeefdeadbeef rsi: fffffffffbc96704 rdx: fffffffffbc57060
        rcx:                0  r8: baddcafebaddcafe  r9:                0
        rax: deadbeefdeadbeef rbx: deadbeefdeadbeef rbp: fffffffffbc966f0
        r10: baddcafebaddcafe r11:                1 r12: deadbeefdeadbeef
        r13: fffffffffbc96704 r14: fffffe16fed28230 r15: fffffe16fed28218
        fsb:        200000000 gsb: fffffffffbc59000  ds:                0
         es:                0  fs:                0  gs:                0
        trp:                d err:                0 rip: fffffffffbd1030c
         cs:               30 rfl:            10282 rsp: fffffffffbc966c0
         ss:               38

CPU          ADDRESS    TIMESTAMP TYPE  VC HANDLER          PC
  0 fffffffffbc38b30   379411506d trap   d #gp              ndi_devi_enter+1c
  0 fffffffffbc389b8   37941115de intr   - (fakesoftint)    fakesoftint+23
  0 fffffffffbc38840   3793fbf40c intr  d1 cbe_fire         outb+7
  0 fffffffffbc386c8   3793f26e80 intr   - (fakesoftint)    fakesoftint+23
  0 fffffffffbc38550   3793dbf114 intr  d1 cbe_fire         inb+6
  0 fffffffffbc383d8   3793bf42fa intr  d1 cbe_fire         cmlb_attach+5b
  0 fffffffffbc38260   21ed81d882 intr  21 vioblk_int_handl ddi_io_put16+10
  0 fffffffffbc380e8   21ed6c5626 intr  d1 cbe_fire         vioblk_alloc_reqs+e8
  0 fffffffffbc37f70   21ed4c54de intr  d1 cbe_fire         copy_pattern+1c
  0 fffffffffbc37df8   21ed2c56a8 intr  d1 cbe_fire         copy_pattern+1c

Warning - stack not written to the dump buffer
fffffffffbc964d0 unix:rewrite_syscall+0 ()
fffffffffbc965c0 unix:trap+d6a ()
fffffffffbc965d0 unix:cmntrap+1cd ()
fffffffffbc966f0 genunix:ndi_devi_enter+1c ()
fffffffffbc96740 genunix:ddi_remove_child+49 ()
fffffffffbc967a0 blkdev:bd_detach_handle+f3 ()
fffffffffbc96850 vioblk:vioblk_attach+50e ()
fffffffffbc968d0 genunix:devi_attach+ad ()
fffffffffbc96910 genunix:attach_node+149 ()
fffffffffbc96960 genunix:i_ndi_config_node+16c ()
fffffffffbc96990 genunix:i_ddi_attachchild+60 ()
fffffffffbc969d0 genunix:devi_attach_node+8b ()
fffffffffbc96aa0 genunix:devi_config_one+3ae ()
fffffffffbc96b20 genunix:ndi_devi_config_one+d3 ()
fffffffffbc96c00 genunix:resolve_pathname+1f1 ()
fffffffffbc96c40 genunix:e_ddi_hold_devi_by_path+30 ()
fffffffffbc96d10 genunix:e_devid_cache_to_devt_list+363 ()
fffffffffbc96d70 genunix:ddi_lyr_devid_to_devlist+56 ()
fffffffffbc96e10 genunix:ldi_vp_from_devid+60 ()
fffffffffbc96ea0 genunix:ldi_open_by_devid+84 ()
fffffffffbc96f50 zfs:vdev_disk_read_rootlabel+1d3 ()
fffffffffbc96fc0 zfs:spa_generate_rootconf+2b ()
fffffffffbc97050 zfs:spa_import_rootpool+3d ()
fffffffffbc970b0 zfs:zfs_mountroot+d5 ()
fffffffffbc970d0 genunix:fsop_mountroot+13 ()
fffffffffbc97110 genunix:rootconf+170 ()
fffffffffbc97150 genunix:vfs_mountroot+6e ()
fffffffffbc97190 genunix:main+197 ()
fffffffffbc971a0 unix:_locore_start+90 ()

The issue here is:

    ddi_set_parent_data(child, hdl);
    hdl->h_child = child;

    if (ndi_devi_online(child, 0) == NDI_FAILURE) {
        cmn_err(CE_WARN, "%s%d: failed bringing node %s@%s online",
            ddi_driver_name(dip), ddi_get_instance(dip),
            hdl->h_name, hdl->h_addr);
        (void) ndi_devi_free(child);
        return (DDI_FAILURE);
    }

if ndi_devi_online fails, it frees child while still referencing it in h_child. A driver that uses blkdev (e.g. vioblk) may then call bd_detach_handle which will then crash trying to using h_child.

(ht to rm for spotting


Related issues

Related to illumos gate - Feature #12506: Add support to vioblk for DISCARD operationClosed

Actions
Related to illumos gate - Bug #12925: Recursive mutex enter panic in virtio_fini()New

Actions

History

#1

Updated by Jason King about 1 month ago

This can mostly easily be recreated by booting into kmdb, and set a breakpoint in cmlb_attach:
::bp cmlb`cmlb_attach+0x47 -- this is far enough into cmlb_attach that the stack frame is setup, and ::stack will be able to see the address of the second argument e.g.

[6]> ::stack
cmlb`cmlb_attach+0x47(fffffe16e239b000, fffffffffbf1f440, 0, 0, 1, fffffffff7adb89d)
blkdev`bd_attach+0x4de(fffffe16e239b000, 0)
devi_attach+0xa1(fffffe16e239b000, 0)
attach_node+0x8b(fffffe16e239b000)

Then set the value of tg_version to zero to force a failure with EINVAL:

[0]> fffffffffbf1f440::print -ta cmlb_tg_ops_t
fffffffffbf1f440 cmlb_tg_ops_t {
    fffffffffbf1f440 int tg_version = 0x1
    fffffffffbf1f448 int (*)() tg_rdwr = blkdev`bd_tg_rdwr
    fffffffffbf1f450 int (*)() tg_getinfo = blkdev`bd_tg_getinfo
}
[0]> fffffffffbf1f440/W 0
blkdev`bd_tg_ops:               0x1             =       0x0
[0]> :c

(you may need to continue a second time)

Shortly thereafter, you should panic.

#2

Updated by Jason King about 1 month ago

  • Description updated (diff)
#3

Updated by Jason King about 1 month ago

  • Subject changed from cmlb_attach failure in blkdev can cause a panic to blkdev needs to be better at handling attach failures
#4

Updated by Electric Monk about 1 month ago

  • Gerrit CR set to 774
#5

Updated by Jason King about 1 month ago

For testing, I tried to reproduce the error similar to before:

Booting...
Loading kmdb...

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> ::bp cmlb`cmlb_attach
[0]> ::bp blkdev`bd_destroy_errstats
[0]> :c
illumos Version gate-blkdev-dbcc66bd02 64-bit
DEBUG enabled
Loaded modules: [ scsi_vhci mac uppc zfs apix specfs pcplusmp cpu.generic ]
kmdb: stop at cmlb`cmlb_attach
kmdb: target stopped at:
cmlb`cmlb_attach:       pushq  %rbp
[0]> ::regs
%rax = 0xfffffe1705c65000                 %r9  = 0xfffffffff7af096c
%rbx = 0xfffffe16fe99a658                 %r10 = 0xfffffffffb882a48 bcopy_altentry+0x1b5
%rcx = 0x0000000000000000                 %r11 = 0x0000000000000000
%rdx = 0x0000000000000000                 %r12 = 0x0000000000000000
%rsi = 0xffffffffc0095960 blkdev`bd_tg_ops %r13 = 0xfffffe170235b3c8
%rdi = 0xfffffe16fe99a658                 %r14 = 0x0000000000000001
%r8  = 0x0000000000000001                 %r15 = 0xfffffe16fe9ab340

%rip = 0xfffffffff7af1570 cmlb`cmlb_attach
%rbp = 0xfffffffffbc964b0
%rsp = 0xfffffffffbc96368
%rflags = 0x00000246
  id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0
  status=<of,df,IF,tf,sf,ZF,af,PF,cf>

%cs = 0x0030    %ds = 0x0000    %es = 0x0000    %fs = 0x0000
%gs = 0x0000    %gsbase = 0xfffffffffbc59000    %kgsbase = 0x200000000
%trapno = 0x3   %err = 0x0      %cr2 = 0x0      %cr3 = 0x7400000

[0]> 0xffffffffc0095960/W 0
blkdev`bd_tg_ops:               0x1             =       0x0
[0]> 0xffffffffc0095960::print cmlb_tg_ops_t
{
    tg_version = 0
    tg_rdwr = blkdev`bd_tg_rdwr
    tg_getinfo = blkdev`bd_tg_getinfo
}
[0]> :c
kmdb: stop at blkdev`bd_destroy_errstats
kmdb: target stopped at:
blkdev`bd_destroy_errstats:     pushq  %rbp
[0]> :c
WARNING: vioblk0: failed bringing node blkdev@0,0 online
WARNING: vioblk0: Failed to attach blkdev

panic[cpu0]/thread=fffffffffbc57060: recursive mutex_enter, lp=fffffe17016fba30 owner=fffffffffbc57060 thread=fffffffffbc57060

Warning - stack not written to the dump buffer
fffffffffbc966b0 unix:mutex_panic+4a ()
fffffffffbc96720 unix:mutex_vector_enter+317 ()

With the change, we now successfully cleanup, and deallocate the kstats, and now hit #12925 (since this is a vioblk device).

#6

Updated by Dan McDonald 30 days ago

  • Related to Feature #12506: Add support to vioblk for DISCARD operation added
#7

Updated by Dan McDonald 30 days ago

  • Related to Bug #12925: Recursive mutex enter panic in virtio_fini() added
#8

Updated by Electric Monk 30 days ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 33f84ecfada5880d94e9bfc5af7954d41e5664d5

commit  33f84ecfada5880d94e9bfc5af7954d41e5664d5
Author: Jason King <jason.king@joyent.com>
Date:   2020-07-08T14:21:42.000Z

    12924 blkdev needs to be better at handling attach failures
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Paul Winder <paul@winders.demon.co.uk>
    Approved by: Dan McDonald <danmcd@joyent.com>

Also available in: Atom PDF