Bug #12924
closedblkdev needs to be better at handling attach failures
100%
Description
I stumbled on this (somewhat indirectly) while testing #12506.
From bd_attach
, we have the following key sequence of events:
if (bd->d_ksp != NULL) { bd->d_ksp->ks_lock = &bd->d_ksmutex; kstat_install(bd->d_ksp); bd->d_kiop = bd->d_ksp->ks_data; } else { /* * Even if we cannot create the kstat, we create a * scratch kstat. The reason for this is to ensure * that we can update the kstat all of the time, * without adding an extra branch instruction. */ bd->d_kiop = kmem_zalloc(sizeof (kstat_io_t), KM_SLEEP); } ... rv = cmlb_attach(dip, &bd_tg_ops, DTYPE_DIRECT, bd->d_removable, bd->d_hotpluggable, /*LINTED: E_BAD_PTR_CAST_ALIGN*/ *(uint64_t *)drive.d_eui64 != 0 ? DDI_NT_BLOCK_BLKDEV : drive.d_lun >= 0 ? DDI_NT_BLOCK_CHAN : DDI_NT_BLOCK, CMLB_FAKE_LABEL_ONE_PARTITION, bd->d_cmlbh, 0); if (rv != 0) { cmlb_free_handle(&bd->d_cmlbh); kmem_cache_destroy(bd->d_cache); mutex_destroy(&bd->d_ksmutex); mutex_destroy(&bd->d_ocmutex); mutex_destroy(&bd->d_statemutex); cv_destroy(&bd->d_statecv); bd_queues_free(bd); if (bd->d_ksp != NULL) { kstat_delete(bd->d_ksp); bd->d_ksp = NULL; } else { kmem_free(bd->d_kiop, sizeof (kstat_io_t)); } ddi_soft_state_free(bd_state, inst); return (DDI_FAILURE);
If the cmlb_attach
fails, the mutex (which is also used by bd->d_ksp) is destroyed before the kstat. This can lead to the following panic:
panic[cpu6]/thread=fffffe16e56cfc00: kstat_delete(fffffe16e6313000): caller holds data lock fffffe16e561a598
The fix is to delete the kstat prior to deleting d_ksmutex
-- or just have the cleanup do things in reverse of the setup.
Additionally, if bd_attach
fails, the following different panic can also occur:
WARNING: vioblk0: failed bringing node blkdev@0,0 online WARNING: vioblk0: Failed to attach blkdev panic[cpu0]/thread=fffffffffbc57060: BAD TRAP: type=d (#gp General protection) rp=fffffffffbc965d0 addr=fffffe16fed28218 #gp General protection addr=0xfffffe16fed28218 pid=0, pc=0xfffffffffbd1030c, sp=0xfffffffffbc966c0, eflags=0x10282 cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 606b8<osxsav,pcide,xmme,fxsr,pge,pae,pse,de> cr2: 0 cr3: 7400000 cr8: c rdi: deadbeefdeadbeef rsi: fffffffffbc96704 rdx: fffffffffbc57060 rcx: 0 r8: baddcafebaddcafe r9: 0 rax: deadbeefdeadbeef rbx: deadbeefdeadbeef rbp: fffffffffbc966f0 r10: baddcafebaddcafe r11: 1 r12: deadbeefdeadbeef r13: fffffffffbc96704 r14: fffffe16fed28230 r15: fffffe16fed28218 fsb: 200000000 gsb: fffffffffbc59000 ds: 0 es: 0 fs: 0 gs: 0 trp: d err: 0 rip: fffffffffbd1030c cs: 30 rfl: 10282 rsp: fffffffffbc966c0 ss: 38 CPU ADDRESS TIMESTAMP TYPE VC HANDLER PC 0 fffffffffbc38b30 379411506d trap d #gp ndi_devi_enter+1c 0 fffffffffbc389b8 37941115de intr - (fakesoftint) fakesoftint+23 0 fffffffffbc38840 3793fbf40c intr d1 cbe_fire outb+7 0 fffffffffbc386c8 3793f26e80 intr - (fakesoftint) fakesoftint+23 0 fffffffffbc38550 3793dbf114 intr d1 cbe_fire inb+6 0 fffffffffbc383d8 3793bf42fa intr d1 cbe_fire cmlb_attach+5b 0 fffffffffbc38260 21ed81d882 intr 21 vioblk_int_handl ddi_io_put16+10 0 fffffffffbc380e8 21ed6c5626 intr d1 cbe_fire vioblk_alloc_reqs+e8 0 fffffffffbc37f70 21ed4c54de intr d1 cbe_fire copy_pattern+1c 0 fffffffffbc37df8 21ed2c56a8 intr d1 cbe_fire copy_pattern+1c Warning - stack not written to the dump buffer fffffffffbc964d0 unix:rewrite_syscall+0 () fffffffffbc965c0 unix:trap+d6a () fffffffffbc965d0 unix:cmntrap+1cd () fffffffffbc966f0 genunix:ndi_devi_enter+1c () fffffffffbc96740 genunix:ddi_remove_child+49 () fffffffffbc967a0 blkdev:bd_detach_handle+f3 () fffffffffbc96850 vioblk:vioblk_attach+50e () fffffffffbc968d0 genunix:devi_attach+ad () fffffffffbc96910 genunix:attach_node+149 () fffffffffbc96960 genunix:i_ndi_config_node+16c () fffffffffbc96990 genunix:i_ddi_attachchild+60 () fffffffffbc969d0 genunix:devi_attach_node+8b () fffffffffbc96aa0 genunix:devi_config_one+3ae () fffffffffbc96b20 genunix:ndi_devi_config_one+d3 () fffffffffbc96c00 genunix:resolve_pathname+1f1 () fffffffffbc96c40 genunix:e_ddi_hold_devi_by_path+30 () fffffffffbc96d10 genunix:e_devid_cache_to_devt_list+363 () fffffffffbc96d70 genunix:ddi_lyr_devid_to_devlist+56 () fffffffffbc96e10 genunix:ldi_vp_from_devid+60 () fffffffffbc96ea0 genunix:ldi_open_by_devid+84 () fffffffffbc96f50 zfs:vdev_disk_read_rootlabel+1d3 () fffffffffbc96fc0 zfs:spa_generate_rootconf+2b () fffffffffbc97050 zfs:spa_import_rootpool+3d () fffffffffbc970b0 zfs:zfs_mountroot+d5 () fffffffffbc970d0 genunix:fsop_mountroot+13 () fffffffffbc97110 genunix:rootconf+170 () fffffffffbc97150 genunix:vfs_mountroot+6e () fffffffffbc97190 genunix:main+197 () fffffffffbc971a0 unix:_locore_start+90 ()
The issue here is:
ddi_set_parent_data(child, hdl); hdl->h_child = child; if (ndi_devi_online(child, 0) == NDI_FAILURE) { cmn_err(CE_WARN, "%s%d: failed bringing node %s@%s online", ddi_driver_name(dip), ddi_get_instance(dip), hdl->h_name, hdl->h_addr); (void) ndi_devi_free(child); return (DDI_FAILURE); }
if ndi_devi_online
fails, it frees child
while still referencing it in h_child
. A driver that uses blkdev (e.g. vioblk) may then call bd_detach_handle
which will then crash trying to using h_child
.
(ht to rm for spotting
Related issues
Updated by Jason King about 3 years ago
This can mostly easily be recreated by booting into kmdb, and set a breakpoint in cmlb_attach:::bp cmlb`cmlb_attach+0x47
-- this is far enough into cmlb_attach
that the stack frame is setup, and ::stack
will be able to see the address of the second argument e.g.
[6]> ::stack cmlb`cmlb_attach+0x47(fffffe16e239b000, fffffffffbf1f440, 0, 0, 1, fffffffff7adb89d) blkdev`bd_attach+0x4de(fffffe16e239b000, 0) devi_attach+0xa1(fffffe16e239b000, 0) attach_node+0x8b(fffffe16e239b000)
Then set the value of tg_version
to zero to force a failure with EINVAL
:
[0]> fffffffffbf1f440::print -ta cmlb_tg_ops_t fffffffffbf1f440 cmlb_tg_ops_t { fffffffffbf1f440 int tg_version = 0x1 fffffffffbf1f448 int (*)() tg_rdwr = blkdev`bd_tg_rdwr fffffffffbf1f450 int (*)() tg_getinfo = blkdev`bd_tg_getinfo } [0]> fffffffffbf1f440/W 0 blkdev`bd_tg_ops: 0x1 = 0x0 [0]> :c
(you may need to continue a second time)
Shortly thereafter, you should panic.
Updated by Jason King about 3 years ago
- Subject changed from cmlb_attach failure in blkdev can cause a panic to blkdev needs to be better at handling attach failures
Updated by Jason King about 3 years ago
For testing, I tried to reproduce the error similar to before:
Booting... Loading kmdb... Welcome to kmdb kmdb: unable to determine terminal type: assuming `vt100' Loaded modules: [ unix krtld genunix ] [0]> ::bp cmlb`cmlb_attach [0]> ::bp blkdev`bd_destroy_errstats [0]> :c illumos Version gate-blkdev-dbcc66bd02 64-bit DEBUG enabled Loaded modules: [ scsi_vhci mac uppc zfs apix specfs pcplusmp cpu.generic ] kmdb: stop at cmlb`cmlb_attach kmdb: target stopped at: cmlb`cmlb_attach: pushq %rbp [0]> ::regs %rax = 0xfffffe1705c65000 %r9 = 0xfffffffff7af096c %rbx = 0xfffffe16fe99a658 %r10 = 0xfffffffffb882a48 bcopy_altentry+0x1b5 %rcx = 0x0000000000000000 %r11 = 0x0000000000000000 %rdx = 0x0000000000000000 %r12 = 0x0000000000000000 %rsi = 0xffffffffc0095960 blkdev`bd_tg_ops %r13 = 0xfffffe170235b3c8 %rdi = 0xfffffe16fe99a658 %r14 = 0x0000000000000001 %r8 = 0x0000000000000001 %r15 = 0xfffffe16fe9ab340 %rip = 0xfffffffff7af1570 cmlb`cmlb_attach %rbp = 0xfffffffffbc964b0 %rsp = 0xfffffffffbc96368 %rflags = 0x00000246 id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0 status=<of,df,IF,tf,sf,ZF,af,PF,cf> %cs = 0x0030 %ds = 0x0000 %es = 0x0000 %fs = 0x0000 %gs = 0x0000 %gsbase = 0xfffffffffbc59000 %kgsbase = 0x200000000 %trapno = 0x3 %err = 0x0 %cr2 = 0x0 %cr3 = 0x7400000 [0]> 0xffffffffc0095960/W 0 blkdev`bd_tg_ops: 0x1 = 0x0 [0]> 0xffffffffc0095960::print cmlb_tg_ops_t { tg_version = 0 tg_rdwr = blkdev`bd_tg_rdwr tg_getinfo = blkdev`bd_tg_getinfo } [0]> :c kmdb: stop at blkdev`bd_destroy_errstats kmdb: target stopped at: blkdev`bd_destroy_errstats: pushq %rbp [0]> :c WARNING: vioblk0: failed bringing node blkdev@0,0 online WARNING: vioblk0: Failed to attach blkdev panic[cpu0]/thread=fffffffffbc57060: recursive mutex_enter, lp=fffffe17016fba30 owner=fffffffffbc57060 thread=fffffffffbc57060 Warning - stack not written to the dump buffer fffffffffbc966b0 unix:mutex_panic+4a () fffffffffbc96720 unix:mutex_vector_enter+317 ()
With the change, we now successfully cleanup, and deallocate the kstats, and now hit #12925 (since this is a vioblk device).
Updated by Dan McDonald about 3 years ago
- Related to Feature #12506: Add support to vioblk for DISCARD operation added
Updated by Dan McDonald about 3 years ago
- Related to Bug #12925: Recursive mutex enter panic in virtio_fini() added
Updated by Electric Monk about 3 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 33f84ecfada5880d94e9bfc5af7954d41e5664d5
commit 33f84ecfada5880d94e9bfc5af7954d41e5664d5 Author: Jason King <jason.king@joyent.com> Date: 2020-07-08T14:21:42.000Z 12924 blkdev needs to be better at handling attach failures Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: Paul Winder <paul@winders.demon.co.uk> Approved by: Dan McDonald <danmcd@joyent.com>