Bug #10623

ZFS should be more aggressive in updating vdev devid

Added by Joshua M. Clulow over 1 year ago. Updated 11 months ago.

Status:
Closed
Priority:
Normal
Category:
zfs - Zettabyte File System
Start date:
2019-03-31
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

Under some conditions, ZFS will check whether a disk vdev has an updated devid when opening the vdev. The code currently maintains two copies of the devid: the string version from the vdev configuration, in vdev_devid; and a binary version in the disk-specific vdev data, split between vd_devid and vd_minor. It's not completely clear that these are correctly kept in sync, and the code entirely fails to handle the case where a device no longer has a devid.
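The update decision described above can be sketched as a small standalone function. This is an illustrative sketch only, with a hypothetical name and signature; the real logic lives in vdev_disk_open() and operates on both the string devid and the binary vd_devid/vd_minor pair:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch only (hypothetical helper, not the actual kernel
 * code): decide whether the devid stored in the vdev configuration
 * should be replaced by the devid the device reports now. A NULL
 * pointer stands in for "no devid" -- the case the existing code does
 * not handle.
 */
static bool
vdev_devid_needs_update(const char *stored, const char *current)
{
	if (stored == NULL && current == NULL)
		return (false);		/* neither side has a devid */
	if (stored == NULL || current == NULL)
		return (true);		/* devid appeared or disappeared */
	return (strcmp(stored, current) != 0);	/* devid changed */
}
```

The key point is the middle case: when a device that previously reported a devid stops doing so, the stale devid must still be dropped from the configuration.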

It's somewhat unusual for a devid to change, but there are at least two cases where it might:

  • a storage driver had a bug (e.g., #10012) where invalid devids were produced and stored in pools, but the bug is corrected by either no longer producing a devid or producing a correct but different one
  • a storage driver is completely replaced; e.g., when Joyent is able to upstream smrt, a replacement for cpqary3, some HP systems will be using a different driver for Smart Array controllers, and some change in devids may result

The case where a device no longer provides a devid is particularly important for #10622. If the devid is not removed from the configuration, we will not be able to correctly fall back to using the physical path to update the /dev path for a vdev.
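The fallback order described above can be sketched as follows. This is a hypothetical helper written for illustration, not the actual kernel code: a stale devid left in the configuration takes precedence and masks the phys_path fallback.

```c
#include <stddef.h>

/*
 * Illustrative sketch only (hypothetical helper): when a vdev cannot
 * be opened by its /dev path, ZFS tries to relocate it by devid, and
 * only falls back to the stored physical path when no devid is
 * present. If a stale devid is never cleared from the configuration,
 * the phys_path fallback is never reached.
 */
static const char *
vdev_locate_strategy(const char *devid, const char *phys_path)
{
	if (devid != NULL && devid[0] != '\0')
		return ("devid");	/* devid takes precedence */
	if (phys_path != NULL && phys_path[0] != '\0')
		return ("phys_path");	/* fall back to physical path */
	return ("none");
}
```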


Related issues

Related to illumos gate - Bug #10012: vioblk should not accept an all-zero serial number (Closed, 2018-11-28)

Related to illumos gate - Bug #10622: ZFS should still check paths for devices that have no devid (Closed, 2019-03-31)

History

#1

Updated by Joshua M. Clulow over 1 year ago

  • Description updated (diff)
#2

Updated by Joshua M. Clulow over 1 year ago

  • Related to Bug #10012: vioblk should not accept an all-zero serial number added
#3

Updated by Joshua M. Clulow over 1 year ago

  • Description updated (diff)
#4

Updated by Joshua M. Clulow over 1 year ago

  • Related to Bug #10622: ZFS should still check paths for devices that have no devid added
#5

Updated by Joshua M. Clulow over 1 year ago

  • Subject changed from ZFS should be more aggresive in updating vdev devid to ZFS should be more aggressive in updating vdev devid
#6

Updated by Joshua M. Clulow 11 months ago

Testing Notes

In an OpenIndiana VM (running under SmartOS KVM) with rpool backed by a vioblk virtual disk, I first checked to see the original (bogus) devid in the pool:

# zdb rpool | head -500 | egrep 'tion:|phys_path|devid'
Cached configuration:
                devid: 'id1,kdev@A~~~~~~~~~~~~~~~~~~~~/a'
                phys_path: '/pci@0,0/pci1af4,2@4/blkdev@0,0:a'
MOS Configuration:
                devid: 'id1,kdev@A~~~~~~~~~~~~~~~~~~~~/a'
                phys_path: '/pci@0,0/pci1af4,2@4/blkdev@0,0:a'

I then installed the updated bits and rebooted. On the console at startup was the message:

NOTICE: vdev_disk_open /dev/dsk/c2t0d0s0: update devid from 'id1,kdev@A~~~~~~~~~~~~~~~~~~~~/a' to '<none>'

As of the integration of #10012, this device now correctly has no devid:

# zdb rpool | head -500 | egrep 'tion:|phys_path|devid'
Cached configuration:
                phys_path: '/pci@0,0/pci1af4,2@4/blkdev@0,0:a'
MOS Configuration:
                phys_path: '/pci@0,0/pci1af4,2@4/blkdev@0,0:a'

In an OmniOS VM (running under SmartOS bhyve) with rpool backed by a vioblk virtual disk, I confirmed that the devid was unchanged before and after:

# zdb rpool | head -500 | egrep 'tion:|phys_path|devid'
Cached configuration:
                devid: 'id1,kdev@ABHYVE-5164-2C5D-3DDB/b'
                phys_path: '/pci@0,0/pci1af4,2@4/blkdev@0,0:b'
MOS Configuration:
                devid: 'id1,kdev@ABHYVE-5164-2C5D-3DDB/b'
                phys_path: '/pci@0,0/pci1af4,2@4/blkdev@0,0:b'

(Note: our bhyve does provide a serial number to Virtio block devices.)

I ran the ZFS test suite in the OmniOS VM both before and after the change. There were a handful of failures (~25), but they were the same failures before and after the change was applied, so I do not believe they are related.

#7

Updated by Electric Monk 11 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 6af23589e78469fc9f90db8558854d1a822aaa72

commit  6af23589e78469fc9f90db8558854d1a822aaa72
Author: Joshua M. Clulow <josh@sysmgr.org>
Date:   2019-08-20T17:05:19.000Z

    10623 ZFS should be more aggressive in updating vdev devid
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Gordon Ross <gwr@nexenta.com>
    Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
