Project

General

Profile

Actions

Bug #14593

closed

zpool online -e hangs forever after 14022

Added by Joshua M. Clulow 5 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Immediate
Assignee:
-
Category:
driver - device drivers
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

#14022 added a DKIOSTATE ioctl to the zpool online -e path, which appears now to reliably hang at least on AWS systems using the xdf driver. Looking at a stuck zpool online -e process:

# pargs 548
548:    zpool online -e rpool c1t0d0
argv[0]: zpool
argv[1]: online
argv[2]: -e
argv[3]: rpool
argv[4]: c1t0d0

# mdb -p 548
Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1 libnvpair.so.1 ]
> $C
fffffc7fffdf2ca0 libc.so.1`ioctl+0xa()
fffffc7fffdf4310 libzfs.so.1`zpool_vdev_online+0x2d0(c3bdd0, fffffc7fffdf864e, 8, fffffc7fffdf4334)
fffffc7fffdf4380 zpool_do_online+0x135(4, fffffc7fffdf8440)
fffffc7fffdf83e0 main+0xdf(5, fffffc7fffdf8438)
fffffc7fffdf8410 _start_crt+0x87()
fffffc7fffdf8420 _start+0x18()
> libzfs.so.1`zpool_vdev_online+0x2d0::dis -n 1
libzfs.so.1`zpool_vdev_online+0x2cb:    call   -0x460   <libzfs.so.1`zpool_relabel_disk>
libzfs.so.1`zpool_vdev_online+0x2d0:    jmp    -0x18e   <libzfs.so.1`zpool_vdev_online+0x147>
libzfs.so.1`zpool_vdev_online+0x2d5:    nopl   (%rax)

In the kernel:

> 0t548::pid2proc | ::walk thread | ::findstack -v
stack pointer for thread fffffe0389a2e060 (zpool/1): fffffe0006af8af0
[ fffffe0006af8af0 _resume_from_idle+0x12b() ]
  fffffe0006af8b20 swtch+0x133()
  fffffe0006af8b90 cv_wait_sig+0x171(fffffe03860feef8, fffffe03860fee40)
  fffffe0006af8bd0 xdf_dkstate+0x73(fffffe03860feb80, 0)
  fffffe0006af8cc0 xdf_ioctl+0x4f6(f700000000, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18)
  fffffe0006af8d00 cdev_ioctl+0x2b(f700000000, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18)
  fffffe0006af8d50 spec_ioctl+0x45(fffffe0391d45e40, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18, 0)
  fffffe0006af8de0 fop_ioctl+0x5b(fffffe0391d45e40, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18, 0)
  fffffe0006af8f00 ioctl+0x153(7, 40d, fffffc7fffdf244c)
  fffffe0006af8f10 sys_syscall+0x1a8()

> ::prtconf -d xdf | ::devinfo -d | ::print xdf_t xdf_state xdf_xdev_nblocks xdf_xdev_secsize xdf_mstate xdf_dinfo
xdf_state = 3 (XD_READY)
xdf_xdev_nblocks = 0x6400000
xdf_xdev_secsize = 0x200
xdf_mstate = 0 (DKIO_NONE)
xdf_dinfo = 0

It would seem that, for a non-removal drive (xdf_dinfo has neither VDISK_CDROM nor VDISK_REMOVABLE) the xdf_mstate value remains DKIO_NONE forever.

It is not clear to me, at least from reading dkio(4I), what DKIOCSTATE is supposed to mean for drives that are not removable:

     DKIOCSTATE

        This ioctl(2) blocks until the state of the drive, inserted or
        ejected, is changed.  The argument is a pointer to a dkio_state, enum,
        whose possible enumerations are listed below.  The initial value
        should be either the last reported state of the drive, or DKIO_NONE.
        Upon return, the enum pointed to by the argument is updated with the
        current state of the drive.

          enum dkio_state {
              DKIO_NONE,          /* Return disk's current state */
              DKIO_EJECTED,       /* Disk state is 'ejected' */
              DKIO_INSERTED       /* Disk state is 'inserted' */
          };

Note that this documentation is missing the DKIO_DEV_GONE variant which appears in uts/common/sys/dkio.h. In that header, DKIOCSTATE is explicitly mentioned as being for removable devices:

/*
 * The following ioctl's are removable media support
 */
#define DKIOCLOCK       (DKIOC|7)       /* Generic 'lock' */
#define DKIOCUNLOCK     (DKIOC|8)       /* Generic 'unlock' */
#define DKIOCSTATE      (DKIOC|13)      /* Inquire insert/eject state */
#define DKIOCREMOVABLE  (DKIOC|16)      /* is media removable */

It seems that a common pattern for disk drivers is initialising their insertion state to DKIO_NONE, then shortly afterwards changing to DKIO_INSERTED, at least for some types of media. This is what blkdev does (and thus nvme and vioblk), and roughly what sd does as well. In other cases:

  • lofi is willing to return DKIO_INSERTED or DKIO_DEV_GONE depending on whether the device has been forcefully unmapped since having been installed, and appears to correctly block waiting for change, and correctly handle DKIO_NONE
  • ramdisk always returns DKIO_INSERTED no matter what is going on, which does not seem to be correct -- it should block forever if one passes in DKIO_INSERTED
  • xdf, at issue here, does not correctly handle returning the current state on DKIO_NONE, at least on account of never moving away from that state for non-removable drives
  • cmdk (IDE) seems like it will just fail a DKIOCSTATE request with ENXIO if the drive is not removable (dad_rmb), but if the drive is removable and does not report no media, it seems like ata_disk_state() might leave it with DKIO_NONE as well
  • zvols probably always fail with ENOTTY

Suffice it to say, outside of blkdev, implementations of DKIOCSTATE are a mixed bag and I'm not even sure exactly how it is intended to work -- should it fail when things are not removable, or should it always pretend that the disk is DKIO_INSERTED? Is there not a more explicit way we can refresh the label if that's what is needed for #14022 ?


Related issues

Related to illumos gate - Bug #14022: zpool online -e breaks access to poolClosedAndy Fiddaman

Actions
Related to illumos gate - Feature #14600: dkio(4I) needs an ioctl to re-read the disk labelNew

Actions
Actions #1

Updated by Joshua M. Clulow 5 months ago

  • Related to Bug #14022: zpool online -e breaks access to pool added
Actions #2

Updated by Joshua M. Clulow 5 months ago

Hunting around a bit more, it seems like it would be more appropriate to introduce a new ioctl here, e.g., DKIOCHECKLABEL, instead of trying to rely on the side effects of DKIOCSTATE.

Disk drivers would generally just pass this new ioctl on to cmlb_ioctl() as they would with DKIOCGETEFI etc. Once there, we would call into cmlb_validate_geometry() which should get us to cmlb_use_efi() which will eventually set cl->cl_f_geometry_is_valid = B_TRUE. If geometry validation succeeds, the ioctl would succeed without taking further action. On disk drivers that do not know to pass the ioctl through to cmlb, it should generally fail as any new ioctl would, with ENOTTY.

Actions #3

Updated by Electric Monk 5 months ago

  • Gerrit CR set to 2079
Actions #4

Updated by Joshua M. Clulow 5 months ago

I've decided to do something more surgical that shouldn't require extensive testing first, to stop the immediate breakage. I'll file a follow-up bug for the new ioctl, etc.

Actions #5

Updated by Joshua M. Clulow 5 months ago

Testing Notes

I did an expansion in a QEMU guest, with a disk I had recently expanded by 200MB, with a vioblk disk:

root@unknown:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  21.6G  4.11G  17.5G        -      128M      -    19%  1.00x  ONLINE  -
root@unknown:~# truss -o /var/tmp/try1 -f zpool online -e rpool c1t0d0
root@unknown:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  21.9G  4.11G  17.8G        -         -      -    18%  1.00x  ONLINE  -
root@unknown:~# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0    ONLINE       0     0     0

errors: No known data errors

You can see that it checked for an acceptable driver and performed the DKIOCSTATE ioctl on cue:

967:    ioctl(7, DKIOCGETEFI, 0xFFFFFC7FFFDF9D00)       = 0
967:    ioctl(7, DKIOCSETEFI, 0xFFFFFC7FFFDF9D00)       = 0
967:    fstat(7, 0xFFFFFC7FFFDFA0D0)                    = 0
967:    modctl(MODGETNAME, 0xFFFFFC7FFFDFA150, 0x00000020, 0xFFFFFC7FFFDFA0C8, 0x00000540) = 0
967:    ioctl(7, DKIOCSTATE, 0xFFFFFC7FFFDFA0CC)        = 0
967:    close(7)                                        = 0
967:    ioctl(3, ZFS_IOC_VDEV_SET_STATE, 0xFFFFFC7FFFDFA620) = 0
967:    ioctl(3, ZFS_IOC_LOG_HISTORY, 0xFFFFFC7FFFDFAAC0) = 0
967:    close(3)                                        = 0
967:    lseek(4, 0, SEEK_CUR)                           = 0
967:    close(4)                                        = 0
967:    lseek(5, 0, SEEK_CUR)                           = 0
967:    close(5)                                        = 0
967:    close(6)                                        = 0
967:    _exit(0)

I put the patched libzfs.so.1 into the AWS guest that was previously hanging, and one can see that it now does the modctl() and gives up:

...
626:    ioctl(7, DKIOCGETEFI, 0xFFFFFC7FFFDEFBB0)       = 0
626:    ioctl(7, DKIOCGETEFI, 0xFFFFFC7FFFDEFBB0)       = 0
626:    fstat(7, 0xFFFFFC7FFFDEFD40)                    = 0
626:    modctl(MODGETNAME, 0xFFFFFC7FFFDEFDC0, 0x00000020, 0xFFFFFC7FFFDEFD38, 0x00000000) = 0
626:    close(7)                                        = 0
626:    ioctl(3, ZFS_IOC_VDEV_SET_STATE, 0xFFFFFC7FFFDF0290) = 0
626:    ioctl(3, ZFS_IOC_LOG_HISTORY, 0xFFFFFC7FFFDF0730) = 0
626:    close(3)                                        = 0
626:    lseek(4, 0, SEEK_CUR)                           = 0
626:    close(4)                                        = 0
626:    lseek(5, 0, SEEK_CUR)                           = 0
626:    close(5)                                        = 0
626:    close(6)                                        = 0
626:    _exit(0)
Actions #6

Updated by Joshua M. Clulow 5 months ago

  • Related to Feature #14600: dkio(4I) needs an ioctl to re-read the disk label added
Actions #7

Updated by Electric Monk 5 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 7d6cab3f0d6a4cea17d062d3a93e223b15de705c

commit  7d6cab3f0d6a4cea17d062d3a93e223b15de705c
Author: Joshua M. Clulow <josh@sysmgr.org>
Date:   2022-03-28T23:44:42.000Z

    14593 zpool online -e hangs forever after 14022
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF