Bug #12981


vdev_hold prone to NULL dereference

Added by Patrick Mooney almost 2 years ago. Updated almost 2 years ago.

zfs - Zettabyte File System

I went to scrub my root pool which had been subject to testing as part of #12894. (It was a root pool that had undergone SLOG addition and removal.) Immediately upon starting the scrub, the machine panicked in ZFS:

#pf Page fault
Bad kernel fault at addr=0x0
pid=499, pc=0x0, sp=0xfffffe007aedea68, eflags=0x10202
cr0: 80050033<pg,wp,ne,et,mp,pe>  cr4: 1626f8<smep,osxsav,pcide,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0
cr3: 866674000
cr8: 0

        rdi: fffffe594f495000 rsi:                6 rdx:                1
        rcx:                0  r8:                0  r9: fffffe5953fde000
        rax:                0 rbx:                2 rbp: fffffe007aedea90
        r10:               6c r11:                0 r12: fffffe594f495000
        r13:               16 r14:           100003 r15:             5a07
        fsb:                0 gsb: fffffe59523ce000  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                e err:               10 rip:                0
         cs:               30 rfl:            10202 rsp: fffffe007aedea68
         ss:               38

fffffe007aede870 unix:die+c6 ()
fffffe007aede960 unix:trap+11c1 ()
fffffe007aede970 unix:cmntrap+e9 ()
fffffe007aedea90 0 ()
fffffe007aedeac0 zfs:vdev_hold+3c ()
fffffe007aedeb00 zfs:spa_vdev_state_enter+81 ()
fffffe007aedeb40 zfs:dsl_scan+24 ()
fffffe007aedeb70 zfs:spa_scan+39 ()
fffffe007aedebc0 zfs:zfs_ioc_pool_scan+52 ()
fffffe007aedec70 zfs:zfsdev_ioctl+1b8 ()
fffffe007aedecb0 genunix:cdev_ioctl+2b ()
fffffe007aeded00 specfs:spec_ioctl+45 ()
fffffe007aeded90 genunix:fop_ioctl+5b ()
fffffe007aedeeb0 genunix:ioctl+153 ()
fffffe007aedef10 unix:brand_sys_sysenter+1dc ()

This is the vdev layout of the pool in question:

ADDR             STATE     AUX          DESCRIPTION
fffffe594f4a1000 HEALTHY   -            root
fffffe594f49e000 HEALTHY   -              mirror
fffffe594f49b000 HEALTHY   -                <mirror A>
fffffe594f498000 HEALTHY   -                <mirror B>
fffffe594f495000 HEALTHY   -              hole
fffffe594f620000 HEALTHY   -              hole
fffffe594f61d000 HEALTHY   -              <SLOG device>

The hole devices were of immediate interest, given how poorly other parts of the system have dealt with them (see #12901).

Looking at the source for vdev_hold() and vdev_hole_ops makes the issue apparent:

void
vdev_hold(vdev_t *vd)
{
        spa_t *spa = vd->vdev_spa;

        ASSERT(spa_is_root(spa));
        if (spa->spa_state == POOL_STATE_UNINITIALIZED)
                return;

        for (int c = 0; c < vd->vdev_children; c++)
                vdev_hold(vd->vdev_child[c]);

        if (vd->vdev_ops->vdev_op_leaf)
                vd->vdev_ops->vdev_op_hold(vd);
}

vdev_ops_t vdev_hole_ops = {
        .vdev_op_open = vdev_missing_open,
        .vdev_op_close = vdev_missing_close,
        .vdev_op_asize = vdev_default_asize,
        .vdev_op_io_start = vdev_missing_io_start,
        .vdev_op_io_done = vdev_missing_io_done,
        .vdev_op_state_change = NULL,
        .vdev_op_need_resilver = NULL,
        .vdev_op_hold = NULL,
        .vdev_op_rele = NULL,
        .vdev_op_remap = NULL,
        .vdev_op_xlate = NULL,
        .vdev_op_dumpio = NULL,
        .vdev_op_type = VDEV_TYPE_HOLE,         /* name of this vdev type */
        .vdev_op_leaf = B_TRUE                  /* leaf vdev */
};
Since vdev_hole_ops reports being a leaf, vdev_hold will unconditionally attempt to call vdev_op_hold. This made it seem like device removal (resulting in the hole vdevs) would have made scrubs impossible. Further investigation into the situation reveals that there's only the one vdev_hold caller:

void
spa_vdev_state_enter(spa_t *spa, int oplocks)
{
        int locks = SCL_STATE_ALL | oplocks;

        /*
         * Root pools may need to read of the underlying devfs filesystem
         * when opening up a vdev.  Unfortunately if we're holding the
         * SCL_ZIO lock it will result in a deadlock when we try to issue
         * the read from the root filesystem.  Instead we "prefetch"
         * the associated vnodes that we need prior to opening the
         * underlying devices and cache them so that we can prevent
         * any I/O when we are doing the actual open.
         */
        if (spa_is_root(spa)) {
                int low = locks & ~(SCL_ZIO - 1);
                int high = locks & ~low;

                spa_config_enter(spa, high, spa, RW_WRITER);
                vdev_hold(spa->spa_root_vdev);
                spa_config_enter(spa, low, spa, RW_WRITER);
        } else {
                spa_config_enter(spa, locks, spa, RW_WRITER);
        }

        spa->spa_vdev_locks = locks;
}

So again, a problem only for root pools. It seems reasonable for vdev_hold (and vdev_rele) to call those respective vdev_ops handlers if they are non-NULL.

Related issues

Related to illumos gate - Bug #12894: root zpool with SLOG should be bootable (Closed, assignee: Patrick Mooney)

#1

Updated by Patrick Mooney almost 2 years ago

  • Related to Bug #12894: root zpool with SLOG should be bootable added
#2

Updated by Electric Monk almost 2 years ago

  • Gerrit CR set to 810
#3

Updated by Patrick Mooney almost 2 years ago

Booting up a BE with this patch on the pool in question, I'm able to run a zpool scrub without the machine panicking. (The scrub completed without issue.)

#4

Updated by Electric Monk almost 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

commit  c65bd18728f347251dbeb256af17dbac88a50e8c
Author: Patrick Mooney <>
Date:   2020-07-27T17:39:07.000Z

    12981 vdev_hold prone to NULL dereference
    Reviewed by: Andy Fiddaman <>
    Reviewed by: Toomas Soome <>
    Reviewed by: Igor Kozhukhov <>
    Approved by: Dan McDonald <>

