Bug #12924
Updated by Jason King almost 2 years ago
I stumbled on this (somewhat indirectly) while testing #12506.
From @bd_attach@, we have the following key sequence of events:
<pre>
if (bd->d_ksp != NULL) {
	bd->d_ksp->ks_lock = &bd->d_ksmutex;
	kstat_install(bd->d_ksp);
	bd->d_kiop = bd->d_ksp->ks_data;
} else {
	/*
	 * Even if we cannot create the kstat, we create a
	 * scratch kstat.  The reason for this is to ensure
	 * that we can update the kstat all of the time,
	 * without adding an extra branch instruction.
	 */
	bd->d_kiop = kmem_zalloc(sizeof (kstat_io_t), KM_SLEEP);
}
...
rv = cmlb_attach(dip, &bd_tg_ops, DTYPE_DIRECT,
    bd->d_removable, bd->d_hotpluggable,
    /*LINTED: E_BAD_PTR_CAST_ALIGN*/
    *(uint64_t *)drive.d_eui64 != 0 ? DDI_NT_BLOCK_BLKDEV :
    drive.d_lun >= 0 ? DDI_NT_BLOCK_CHAN : DDI_NT_BLOCK,
    CMLB_FAKE_LABEL_ONE_PARTITION, bd->d_cmlbh, 0);
if (rv != 0) {
	cmlb_free_handle(&bd->d_cmlbh);
	kmem_cache_destroy(bd->d_cache);
	mutex_destroy(&bd->d_ksmutex);
	mutex_destroy(&bd->d_ocmutex);
	mutex_destroy(&bd->d_statemutex);
	cv_destroy(&bd->d_statecv);
	bd_queues_free(bd);
	if (bd->d_ksp != NULL) {
		kstat_delete(bd->d_ksp);
		bd->d_ksp = NULL;
	} else {
		kmem_free(bd->d_kiop, sizeof (kstat_io_t));
	}
	ddi_soft_state_free(bd_state, inst);
	return (DDI_FAILURE);
}
</pre>
If the @cmlb_attach@ call fails, @d_ksmutex@ (which the kstat uses as its @ks_lock@) is destroyed before the kstat is deleted. This can lead to the following panic:
<pre>
panic[cpu6]/thread=fffffe16e56cfc00: kstat_delete(fffffe16e6313000): caller holds data lock fffffe16e561a598
</pre>
The fix is to delete the kstat before destroying @d_ksmutex@, or, more robustly, to have the cleanup path undo everything in the reverse order of setup.
Additionally, if @bd_attach@ fails, a different panic can occur:
<pre>
WARNING: vioblk0: failed bringing node blkdev@0,0 online
WARNING: vioblk0: Failed to attach blkdev
panic[cpu0]/thread=fffffffffbc57060: BAD TRAP: type=d (#gp General protection) rp=fffffffffbc965d0 addr=fffffe16fed28218
#gp General protection
addr=0xfffffe16fed28218
pid=0, pc=0xfffffffffbd1030c, sp=0xfffffffffbc966c0, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 606b8<osxsav,pcide,xmme,fxsr,pge,pae,pse,de>
cr2: 0 cr3: 7400000 cr8: c
rdi: deadbeefdeadbeef rsi: fffffffffbc96704 rdx: fffffffffbc57060
rcx: 0 r8: baddcafebaddcafe r9: 0
rax: deadbeefdeadbeef rbx: deadbeefdeadbeef rbp: fffffffffbc966f0
r10: baddcafebaddcafe r11: 1 r12: deadbeefdeadbeef
r13: fffffffffbc96704 r14: fffffe16fed28230 r15: fffffe16fed28218
fsb: 200000000 gsb: fffffffffbc59000 ds: 0
es: 0 fs: 0 gs: 0
trp: d err: 0 rip: fffffffffbd1030c
cs: 30 rfl: 10282 rsp: fffffffffbc966c0
ss: 38
CPU ADDRESS TIMESTAMP TYPE VC HANDLER PC
0 fffffffffbc38b30 379411506d trap d #gp ndi_devi_enter+1c
0 fffffffffbc389b8 37941115de intr - (fakesoftint) fakesoftint+23
0 fffffffffbc38840 3793fbf40c intr d1 cbe_fire outb+7
0 fffffffffbc386c8 3793f26e80 intr - (fakesoftint) fakesoftint+23
0 fffffffffbc38550 3793dbf114 intr d1 cbe_fire inb+6
0 fffffffffbc383d8 3793bf42fa intr d1 cbe_fire cmlb_attach+5b
0 fffffffffbc38260 21ed81d882 intr 21 vioblk_int_handl ddi_io_put16+10
0 fffffffffbc380e8 21ed6c5626 intr d1 cbe_fire vioblk_alloc_reqs+e8
0 fffffffffbc37f70 21ed4c54de intr d1 cbe_fire copy_pattern+1c
0 fffffffffbc37df8 21ed2c56a8 intr d1 cbe_fire copy_pattern+1c
Warning - stack not written to the dump buffer
fffffffffbc964d0 unix:rewrite_syscall+0 ()
fffffffffbc965c0 unix:trap+d6a ()
fffffffffbc965d0 unix:cmntrap+1cd ()
fffffffffbc966f0 genunix:ndi_devi_enter+1c ()
fffffffffbc96740 genunix:ddi_remove_child+49 ()
fffffffffbc967a0 blkdev:bd_detach_handle+f3 ()
fffffffffbc96850 vioblk:vioblk_attach+50e ()
fffffffffbc968d0 genunix:devi_attach+ad ()
fffffffffbc96910 genunix:attach_node+149 ()
fffffffffbc96960 genunix:i_ndi_config_node+16c ()
fffffffffbc96990 genunix:i_ddi_attachchild+60 ()
fffffffffbc969d0 genunix:devi_attach_node+8b ()
fffffffffbc96aa0 genunix:devi_config_one+3ae ()
fffffffffbc96b20 genunix:ndi_devi_config_one+d3 ()
fffffffffbc96c00 genunix:resolve_pathname+1f1 ()
fffffffffbc96c40 genunix:e_ddi_hold_devi_by_path+30 ()
fffffffffbc96d10 genunix:e_devid_cache_to_devt_list+363 ()
fffffffffbc96d70 genunix:ddi_lyr_devid_to_devlist+56 ()
fffffffffbc96e10 genunix:ldi_vp_from_devid+60 ()
fffffffffbc96ea0 genunix:ldi_open_by_devid+84 ()
fffffffffbc96f50 zfs:vdev_disk_read_rootlabel+1d3 ()
fffffffffbc96fc0 zfs:spa_generate_rootconf+2b ()
fffffffffbc97050 zfs:spa_import_rootpool+3d ()
fffffffffbc970b0 zfs:zfs_mountroot+d5 ()
fffffffffbc970d0 genunix:fsop_mountroot+13 ()
fffffffffbc97110 genunix:rootconf+170 ()
fffffffffbc97150 genunix:vfs_mountroot+6e ()
fffffffffbc97190 genunix:main+197 ()
fffffffffbc971a0 unix:_locore_start+90 ()
</pre>
The issue here is:
<pre>
ddi_set_parent_data(child, hdl);
hdl->h_child = child;
if (ndi_devi_online(child, 0) == NDI_FAILURE) {
	cmn_err(CE_WARN, "%s%d: failed bringing node %s@%s online",
	    ddi_driver_name(dip), ddi_get_instance(dip),
	    hdl->h_name, hdl->h_addr);
	(void) ndi_devi_free(child);
	return (DDI_FAILURE);
}
</pre>
If @ndi_devi_online@ fails, @child@ is freed while @h_child@ still points to it. A driver that uses blkdev (e.g. vioblk) may then call @bd_detach_handle@, which will crash trying to use the stale @h_child@ pointer.
(ht to rm for spotting this)