Bug #12981

vdev_hold prone to NULL dereference

Added by Patrick Mooney 3 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
% Done: 100%
Difficulty: Medium
Gerrit CR: 810

Description

I went to scrub my root pool, which had been subject to testing as part of #12894 (it had undergone SLOG addition and removal). Immediately upon starting the scrub, the machine panicked in ZFS:

zpool:
#pf Page fault
Bad kernel fault at addr=0x0
pid=499, pc=0x0, sp=0xfffffe007aedea68, eflags=0x10202
cr0: 80050033<pg,wp,ne,et,mp,pe>  cr4: 1626f8<smep,osxsav,pcide,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0
cr3: 866674000
cr8: 0

        rdi: fffffe594f495000 rsi:                6 rdx:                1
        rcx:                0  r8:                0  r9: fffffe5953fde000
        rax:                0 rbx:                2 rbp: fffffe007aedea90
        r10:               6c r11:                0 r12: fffffe594f495000
        r13:               16 r14:           100003 r15:             5a07
        fsb:                0 gsb: fffffe59523ce000  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                e err:               10 rip:                0
         cs:               30 rfl:            10202 rsp: fffffe007aedea68
         ss:               38

fffffe007aede870 unix:die+c6 ()
fffffe007aede960 unix:trap+11c1 ()
fffffe007aede970 unix:cmntrap+e9 ()
fffffe007aedea90 0 ()
fffffe007aedeac0 zfs:vdev_hold+3c ()
fffffe007aedeb00 zfs:spa_vdev_state_enter+81 ()
fffffe007aedeb40 zfs:dsl_scan+24 ()
fffffe007aedeb70 zfs:spa_scan+39 ()
fffffe007aedebc0 zfs:zfs_ioc_pool_scan+52 ()
fffffe007aedec70 zfs:zfsdev_ioctl+1b8 ()
fffffe007aedecb0 genunix:cdev_ioctl+2b ()
fffffe007aeded00 specfs:spec_ioctl+45 ()
fffffe007aeded90 genunix:fop_ioctl+5b ()
fffffe007aedeeb0 genunix:ioctl+153 ()
fffffe007aedef10 unix:brand_sys_sysenter+1dc ()

Note the pc and rip of 0, and the bare 0 () frame above vdev_hold: the machine made a call through a NULL function pointer. This is the vdev layout of the pool in question:

ADDR             STATE     AUX          DESCRIPTION
fffffe594f4a1000 HEALTHY   -            root
fffffe594f49e000 HEALTHY   -              mirror
fffffe594f49b000 HEALTHY   -                <mirror A>
fffffe594f498000 HEALTHY   -                <mirror B>
fffffe594f495000 HEALTHY   -              hole
fffffe594f620000 HEALTHY   -              hole
fffffe594f61d000 HEALTHY   -              <SLOG device>

The hole devices (left behind by the earlier device removal) were of immediate interest, given how poorly other parts of the system have dealt with them (see #12901).

Looking at the source for vdev_hold() and vdev_hole_ops, the issue becomes apparent:

void
vdev_hold(vdev_t *vd)
{
        spa_t *spa = vd->vdev_spa;

        ASSERT(spa_is_root(spa));
        if (spa->spa_state == POOL_STATE_UNINITIALIZED)
                return;

        for (int c = 0; c < vd->vdev_children; c++)
                vdev_hold(vd->vdev_child[c]);

        if (vd->vdev_ops->vdev_op_leaf)
                vd->vdev_ops->vdev_op_hold(vd); /* NULL for hole vdevs */
}

vdev_ops_t vdev_hole_ops = {
        .vdev_op_open = vdev_missing_open,
        .vdev_op_close = vdev_missing_close,
        .vdev_op_asize = vdev_default_asize,
        .vdev_op_io_start = vdev_missing_io_start,
        .vdev_op_io_done = vdev_missing_io_done,
        .vdev_op_state_change = NULL,
        .vdev_op_need_resilver = NULL,
        .vdev_op_hold = NULL,                   /* called by vdev_hold() for leaves */
        .vdev_op_rele = NULL,
        .vdev_op_remap = NULL,
        .vdev_op_xlate = NULL,
        .vdev_op_dumpio = NULL,
        .vdev_op_type = VDEV_TYPE_HOLE,         /* name of this vdev type */
        .vdev_op_leaf = B_TRUE                  /* leaf vdev */
};

Since vdev_hole_ops reports being a leaf, vdev_hold will unconditionally attempt to call its vdev_op_hold, which is NULL. At first glance this suggests that device removal (which leaves hole vdevs behind) would make scrubs impossible. Further investigation reveals that vdev_hold has only one caller:

void
spa_vdev_state_enter(spa_t *spa, int oplocks)
{
        int locks = SCL_STATE_ALL | oplocks;

        /*
         * Root pools may need to read from the underlying devfs filesystem
         * when opening up a vdev.  Unfortunately if we're holding the
         * SCL_ZIO lock it will result in a deadlock when we try to issue
         * the read from the root filesystem.  Instead we "prefetch" 
         * the associated vnodes that we need prior to opening the
         * underlying devices and cache them so that we can prevent
         * any I/O when we are doing the actual open.
         */
        if (spa_is_root(spa)) {
                int low = locks & ~(SCL_ZIO - 1);
                int high = locks & ~low;

                spa_config_enter(spa, high, spa, RW_WRITER);
                vdev_hold(spa->spa_root_vdev);
                spa_config_enter(spa, low, spa, RW_WRITER);
        } else {
                spa_config_enter(spa, locks, spa, RW_WRITER);
        }
        spa->spa_vdev_locks = locks;
}

So, again, this is a problem only for root pools. It seems reasonable for vdev_hold (and vdev_rele) to call their respective vdev_ops handlers only if those handlers are non-NULL.
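For illustration, here is a minimal sketch of that guard. The vdev_hold body follows the excerpt above; the vdev_rele body is reconstructed from the same pattern and may differ in detail from the change actually committed under Gerrit CR 810:

void
vdev_hold(vdev_t *vd)
{
        spa_t *spa = vd->vdev_spa;

        ASSERT(spa_is_root(spa));
        if (spa->spa_state == POOL_STATE_UNINITIALIZED)
                return;

        for (int c = 0; c < vd->vdev_children; c++)
                vdev_hold(vd->vdev_child[c]);

        /* Skip leaves (such as holes) that provide no hold handler. */
        if (vd->vdev_ops->vdev_op_leaf && vd->vdev_ops->vdev_op_hold != NULL)
                vd->vdev_ops->vdev_op_hold(vd);
}

void
vdev_rele(vdev_t *vd)
{
        ASSERT(spa_is_root(vd->vdev_spa));

        for (int c = 0; c < vd->vdev_children; c++)
                vdev_rele(vd->vdev_child[c]);

        /* Likewise, only release through a handler that exists. */
        if (vd->vdev_ops->vdev_op_leaf && vd->vdev_ops->vdev_op_rele != NULL)
                vd->vdev_ops->vdev_op_rele(vd);
}

With the NULL checks in place, hole vdevs simply fall through, and spa_vdev_state_enter can take its locks on a root pool without tripping over them.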


Related issues

Related to illumos gate - Bug #12894: root zpool with SLOG should be bootable (Closed)

History

#1

Updated by Patrick Mooney 3 months ago

  • Related to Bug #12894: root zpool with SLOG should be bootable added
#2

Updated by Electric Monk 3 months ago

  • Gerrit CR set to 810
#3

Updated by Patrick Mooney 3 months ago

Booting up a BE with this patch on the pool in question, I'm able to run a zpool scrub without the machine panicking. (The scrub completed without issue.)

#4

Updated by Electric Monk 3 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit c65bd18728f347251dbeb256af17dbac88a50e8c

commit  c65bd18728f347251dbeb256af17dbac88a50e8c
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2020-07-27T17:39:07.000Z

    12981 vdev_hold prone to NULL dereference
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Dan McDonald <danmcd@joyent.com>
