Bug #14593
closedzpool online -e hangs forever after 14022
100%
Description
#14022 added a DKIOSTATE
ioctl to the zpool online -e
path, which appears now to reliably hang at least on AWS systems using the xdf
driver. Looking at a stuck zpool online -e
process:
# pargs 548 548: zpool online -e rpool c1t0d0 argv[0]: zpool argv[1]: online argv[2]: -e argv[3]: rpool argv[4]: c1t0d0 # mdb -p 548 Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1 libnvpair.so.1 ] > $C fffffc7fffdf2ca0 libc.so.1`ioctl+0xa() fffffc7fffdf4310 libzfs.so.1`zpool_vdev_online+0x2d0(c3bdd0, fffffc7fffdf864e, 8, fffffc7fffdf4334) fffffc7fffdf4380 zpool_do_online+0x135(4, fffffc7fffdf8440) fffffc7fffdf83e0 main+0xdf(5, fffffc7fffdf8438) fffffc7fffdf8410 _start_crt+0x87() fffffc7fffdf8420 _start+0x18() > libzfs.so.1`zpool_vdev_online+0x2d0::dis -n 1 libzfs.so.1`zpool_vdev_online+0x2cb: call -0x460 <libzfs.so.1`zpool_relabel_disk> libzfs.so.1`zpool_vdev_online+0x2d0: jmp -0x18e <libzfs.so.1`zpool_vdev_online+0x147> libzfs.so.1`zpool_vdev_online+0x2d5: nopl (%rax)
In the kernel:
> 0t548::pid2proc | ::walk thread | ::findstack -v stack pointer for thread fffffe0389a2e060 (zpool/1): fffffe0006af8af0 [ fffffe0006af8af0 _resume_from_idle+0x12b() ] fffffe0006af8b20 swtch+0x133() fffffe0006af8b90 cv_wait_sig+0x171(fffffe03860feef8, fffffe03860fee40) fffffe0006af8bd0 xdf_dkstate+0x73(fffffe03860feb80, 0) fffffe0006af8cc0 xdf_ioctl+0x4f6(f700000000, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18) fffffe0006af8d00 cdev_ioctl+0x2b(f700000000, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18) fffffe0006af8d50 spec_ioctl+0x45(fffffe0391d45e40, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18, 0) fffffe0006af8de0 fop_ioctl+0x5b(fffffe0391d45e40, 40d, fffffc7fffdf244c, 202007, fffffe0390bcf970, fffffe0006af8e18, 0) fffffe0006af8f00 ioctl+0x153(7, 40d, fffffc7fffdf244c) fffffe0006af8f10 sys_syscall+0x1a8() > ::prtconf -d xdf | ::devinfo -d | ::print xdf_t xdf_state xdf_xdev_nblocks xdf_xdev_secsize xdf_mstate xdf_dinfo xdf_state = 3 (XD_READY) xdf_xdev_nblocks = 0x6400000 xdf_xdev_secsize = 0x200 xdf_mstate = 0 (DKIO_NONE) xdf_dinfo = 0
It would seem that, for a non-removal drive (xdf_dinfo
has neither VDISK_CDROM
nor VDISK_REMOVABLE
) the xdf_mstate
value remains DKIO_NONE
forever.
It is not clear to me, at least from reading dkio(4I)
, what DKIOCSTATE
is supposed to mean for drives that are not removable:
DKIOCSTATE This ioctl(2) blocks until the state of the drive, inserted or ejected, is changed. The argument is a pointer to a dkio_state, enum, whose possible enumerations are listed below. The initial value should be either the last reported state of the drive, or DKIO_NONE. Upon return, the enum pointed to by the argument is updated with the current state of the drive. enum dkio_state { DKIO_NONE, /* Return disk's current state */ DKIO_EJECTED, /* Disk state is 'ejected' */ DKIO_INSERTED /* Disk state is 'inserted' */ };
Note that this documentation is missing the DKIO_DEV_GONE
variant which appears in uts/common/sys/dkio.h
. In that header, DKIOCSTATE
is explicitly mentioned as being for removable devices:
/* * The following ioctl's are removable media support */ #define DKIOCLOCK (DKIOC|7) /* Generic 'lock' */ #define DKIOCUNLOCK (DKIOC|8) /* Generic 'unlock' */ #define DKIOCSTATE (DKIOC|13) /* Inquire insert/eject state */ #define DKIOCREMOVABLE (DKIOC|16) /* is media removable */
It seems that a common pattern for disk drivers is initialising their insertion state to DKIO_NONE
, then shortly afterwards changing to DKIO_INSERTED
, at least for some types of media. This is what blkdev
does (and thus nvme
and vioblk
), and roughly what sd
does as well. In other cases:
lofi
is willing to returnDKIO_INSERTED
orDKIO_DEV_GONE
depending on whether the device has been forcefully unmapped since having been installed, and appears to correctly block waiting for change, and correctly handleDKIO_NONE
ramdisk
always returnsDKIO_INSERTED
no matter what is going on, which does not seem to be correct -- it should block forever if one passes inDKIO_INSERTED
xdf
, at issue here, does not correctly handle returning the current state onDKIO_NONE
, at least on account of never moving away from that state for non-removable drivescmdk
(IDE) seems like it will just fail aDKIOCSTATE
request withENXIO
if the drive is not removable (dad_rmb
), but if the drive is removable and does not report no media, it seems likeata_disk_state()
might leave it withDKIO_NONE
as well- zvols probably always fail with
ENOTTY
Suffice it to say, outside of blkdev
, implementations of DKIOCSTATE
are a mixed bag and I'm not even sure exactly how it is intended to work -- should it fail when things are not removable, or should it always pretend that the disk is DKIO_INSERTED
? Is there not a more explicit way we can refresh the label if that's what is needed for #14022 ?
Related issues
Updated by Joshua M. Clulow 5 months ago
- Related to Bug #14022: zpool online -e breaks access to pool added
Updated by Joshua M. Clulow 5 months ago
Hunting around a bit more, it seems like it would be more appropriate to introduce a new ioctl here, e.g., DKIOCHECKLABEL
, instead of trying to rely on the side effects of DKIOCSTATE
.
Disk drivers would generally just pass this new ioctl on to cmlb_ioctl()
as they would with DKIOCGETEFI
etc. Once there, we would call into cmlb_validate_geometry()
which should get us to cmlb_use_efi()
which will eventually set cl->cl_f_geometry_is_valid = B_TRUE
. If geometry validation succeeds, the ioctl would succeed without taking further action. On disk drivers that do not know to pass the ioctl through to cmlb
, it should generally fail as any new ioctl would, with ENOTTY
.
Updated by Joshua M. Clulow 5 months ago
I've decided to do something more surgical that shouldn't require extensive testing first, to stop the immediate breakage. I'll file a follow-up bug for the new ioctl, etc.
Updated by Joshua M. Clulow 5 months ago
Testing Notes¶
I did an expansion in a QEMU guest, with a disk I had recently expanded by 200MB, with a vioblk
disk:
root@unknown:~# zpool list NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT rpool 21.6G 4.11G 17.5G - 128M - 19% 1.00x ONLINE - root@unknown:~# truss -o /var/tmp/try1 -f zpool online -e rpool c1t0d0 root@unknown:~# zpool list NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT rpool 21.9G 4.11G 17.8G - - - 18% 1.00x ONLINE - root@unknown:~# zpool status pool: rpool state: ONLINE status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details. scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c1t0d0 ONLINE 0 0 0 errors: No known data errors
You can see that it checked for an acceptable driver and performed the DKIOCSTATE
ioctl on cue:
967: ioctl(7, DKIOCGETEFI, 0xFFFFFC7FFFDF9D00) = 0 967: ioctl(7, DKIOCSETEFI, 0xFFFFFC7FFFDF9D00) = 0 967: fstat(7, 0xFFFFFC7FFFDFA0D0) = 0 967: modctl(MODGETNAME, 0xFFFFFC7FFFDFA150, 0x00000020, 0xFFFFFC7FFFDFA0C8, 0x00000540) = 0 967: ioctl(7, DKIOCSTATE, 0xFFFFFC7FFFDFA0CC) = 0 967: close(7) = 0 967: ioctl(3, ZFS_IOC_VDEV_SET_STATE, 0xFFFFFC7FFFDFA620) = 0 967: ioctl(3, ZFS_IOC_LOG_HISTORY, 0xFFFFFC7FFFDFAAC0) = 0 967: close(3) = 0 967: lseek(4, 0, SEEK_CUR) = 0 967: close(4) = 0 967: lseek(5, 0, SEEK_CUR) = 0 967: close(5) = 0 967: close(6) = 0 967: _exit(0)
I put the patched libzfs.so.1
into the AWS guest that was previously hanging, and one can see that it now does the modctl()
and gives up:
... 626: ioctl(7, DKIOCGETEFI, 0xFFFFFC7FFFDEFBB0) = 0 626: ioctl(7, DKIOCGETEFI, 0xFFFFFC7FFFDEFBB0) = 0 626: fstat(7, 0xFFFFFC7FFFDEFD40) = 0 626: modctl(MODGETNAME, 0xFFFFFC7FFFDEFDC0, 0x00000020, 0xFFFFFC7FFFDEFD38, 0x00000000) = 0 626: close(7) = 0 626: ioctl(3, ZFS_IOC_VDEV_SET_STATE, 0xFFFFFC7FFFDF0290) = 0 626: ioctl(3, ZFS_IOC_LOG_HISTORY, 0xFFFFFC7FFFDF0730) = 0 626: close(3) = 0 626: lseek(4, 0, SEEK_CUR) = 0 626: close(4) = 0 626: lseek(5, 0, SEEK_CUR) = 0 626: close(5) = 0 626: close(6) = 0 626: _exit(0)
Updated by Joshua M. Clulow 5 months ago
- Related to Feature #14600: dkio(4I) needs an ioctl to re-read the disk label added
Updated by Electric Monk 5 months ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 7d6cab3f0d6a4cea17d062d3a93e223b15de705c
commit 7d6cab3f0d6a4cea17d062d3a93e223b15de705c Author: Joshua M. Clulow <josh@sysmgr.org> Date: 2022-03-28T23:44:42.000Z 14593 zpool online -e hangs forever after 14022 Reviewed by: Andy Fiddaman <andy@omnios.org> Reviewed by: Toomas Soome <tsoome@me.com> Approved by: Dan McDonald <danmcd@joyent.com>