Feature #11698
Want NVMe Hotplug Support
Description
This is to track upstreaming the following issue from illumos-joyent:
OS-5553 Want NVMe Hotplug Support
For details, rationale, and testing notes, please refer to the original SmartOS issue: https://smartos.org/bugview/OS-5553
Updated by Rob Johnston over 1 year ago
This change improves support for hotplugging of NVMe devices, including both coordinated hotplug (via cfgadm(1M)) and surprise removal (physical pull of the disk without notification to the OS).
Note: To use cfgadm, which is needed for coordinated removal and also for onlining a drive after surprise removal, one must enable the hotplug service:
# svcadm enable hotplug
Issues Addressed
There are several issues tackled in this change. In particular:
A device removed via surprise removal while I/O from dd(1M) was ongoing would hang waiting for I/O in the kernel. The fix is to implement DDI removal event callbacks in the nvme driver, plumb up event generation support in its nexus driver npe, and post the removal events when a hotplug interrupt for device removal is received (a sketch of the callback registration follows this list of issues). Now the dd process exits cleanly with EIO.
A device in a zpool removed via surprise removal would result in a subsequent zpool status on the zpool to hang waiting on I/O for the disk. The normal mechanism for ZFS to hear about removal of devices is through LDI notifications. Once other issues found while testing hotplug scenarios were fixed, this issue now works. (No fix for this particular issue was needed.)
A device that was removed through surprise removal, plugged back in, configured, and pulled a second time would panic the system. The panic came from the kernel code that scans for PCI fabric errors, a process kicked off by the parent PCIe bridge when it received a PCIe hotplug interrupt indicating that its slot status had changed from having a device present to being empty. The fabric scanning code would find a PCIe Completion Timeout Error (an Uncorrectable Error in PCIe Advanced Error Reporting terms) for the missing device and panic. An absent device is expected not to respond to a PCIe command, so panicking is not the correct response here.
For the fix, when the PCI register state is gathered in this code, we also gather its parent slot registers, which indicate whether a device is present. If a CTO error is found, we check the parent slot registers to see if there is a device in the slot. If the device is gone, we don't panic, but an ereport is still generated as before. This ereport also includes the parent slot register data if gathered.
A device removed through surprise removal, but later plugged back in to the same slot, would be marked retired. This makes it impossible to attach to the device without first resolving the FM case through fmadm repair. The reason the device was retired was that the ereport generated by the fabric scanning code would run through the "fabric-xlate" fm plugin, which would generate a new ereport for the device. The "eft" module would fault the device based on the ereport from "fabric-xlate", and the "io-retire" module would retire the device based on the fault. The fix is to not generate a new ereport from the "fabric-xlate" plugin if we know the device saw a CTO error and its parent slot reports the slot is empty. This information is available in the ereport generated by the PCI fabric scanning code, so we can know it is safe to not propagate more ereports if this is true.
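For reference, here is a minimal sketch of the DDI remove-event plumbing described in the first issue above: a leaf driver asks its nexus for the remove-event cookie and registers a callback that fails outstanding I/O when the device disappears. This is illustrative only, not the actual nvme change; the drv_* names are hypothetical, while the DDI interfaces and event name are the standard framework ones.

#include <sys/ddi.h>
#include <sys/sunddi.h>

/* Hypothetical per-driver state; a real driver would keep this in its soft state. */
static ddi_eventcookie_t drv_rm_cookie;
static ddi_callback_id_t drv_rm_cbid;

/* Invoked by the DDI event framework when the nexus posts a removal event. */
static void
drv_dev_removed(dev_info_t *dip, ddi_eventcookie_t cookie, void *arg,
    void *bus_impl)
{
	/*
	 * Mark the device as gone and complete outstanding commands with
	 * EIO so blocked callers (such as dd) return instead of hanging.
	 */
}

/* Called from attach(9E): subscribe to the bus-posted remove event. */
static int
drv_subscribe_remove_event(dev_info_t *dip)
{
	if (ddi_get_eventcookie(dip, DDI_DEVI_REMOVE_EVENT,
	    &drv_rm_cookie) != DDI_SUCCESS)
		return (DDI_FAILURE);

	return (ddi_add_event_handler(dip, drv_rm_cookie,
	    drv_dev_removed, NULL, &drv_rm_cbid));
}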
Updated by Rob Johnston over 1 year ago
The following testing notes were excerpted verbatim from https://smartos.org/bugview/OS-5553
Testing Notes
Most testing was done on a debug build. I also built a non-debug build and did a surprise removal test on my test system as a sanity check. All testing here was done on the "nvme" system in the SF lab, which has 4 nvme drives.
Testing scenarios below are organized into the following categories:
- Demonstrating coordinated removal using cfgadm(1M): with and without ongoing I/O, in a zpool or not in a zpool
- Demonstrating surprise removal: with and without ongoing I/O, in a zpool or not in a zpool
- Demonstrating ereports and sysevents generated for hotplug: in coordinated removal and surprise removal
The Occupant/Receptacle state combinations reported by cfgadm(1M) and referenced throughout are:
- Occupant configured (Receptacle connected): This means the device is onlined and ready to use. The Receptacle state is redundant here and is sometimes omitted from the notes; for a device to be onlined, its receptacle must be powered on ("connected").
- Occupant unconfigured, Receptacle connected: This means the slot is powered on, but the device is not onlined.
- (Occupant unconfigured) Receptacle disconnected: This means the slot is powered off. This implies the device must be offline, as it cannot be online without the slot being powered on. Sometimes the Occupant state for a disconnected Receptacle is omitted in the testing notes.
- (Occupant unconfigured) Receptacle empty: This means the device is physically gone from the slot. The slot is necessarily disconnected, and the device is necessarily offline, since there is no device in the slot. Sometimes the Occupant state for an empty Receptacle is omitted in the testing notes.
To verify the state of the slot, we use the Receptacle and Occupant states as reported by cfgadm(1M) as a sanity check. We also look at the number of nvme instances attached in the system, using the mdb dcmd ::prtconf to see if a driver was attached or detached.
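The number of attached instances can also be checked from userland with libdevinfo rather than mdb; the following is a minimal sketch, not part of the original test plan (build with -ldevinfo).

#include <stdio.h>
#include <libdevinfo.h>

int
main(void)
{
	di_node_t root, node;
	int attached = 0;

	/* Snapshot the device tree, including driver/instance state. */
	if ((root = di_init("/", DINFOCPYALL)) == DI_NODE_NIL) {
		perror("di_init");
		return (1);
	}

	/* Walk every node bound to the nvme driver and count attached ones. */
	for (node = di_drv_first_node("nvme", root); node != DI_NODE_NIL;
	    node = di_drv_next_node(node)) {
		if ((di_state(node) & DI_DRIVER_DETACHED) == 0)
			attached++;
	}

	(void) printf("attached nvme instances: %d\n", attached);
	di_fini(root);
	return (0);
}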
Coordinated Removal using cfgadm(1M)
These tests demonstrate coordinated removal for a device with or without ongoing I/O, in or not in a zpool, from several possible Occupant/Receptacle states.
1. Coordinated Removal (no ongoing I/O, not in a zpool)
(a) Occupant configured -> Receptacle connected
This is done via cfgadm -c unconfigure <slot>.
# verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe96308841a8 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # unconfigure slot [root@nvme ~]# cfgadm -c unconfigure Slot12 # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
(b) Occupant configured -> Receptacle disconnected
This is done via cfgadm -c disconnect <slot>.
# verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe96308841a8 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # disconnect slot [root@nvme ~]# cfgadm -c disconnect Slot12 # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
(c) Occupant configured -> Receptacle empty (impossible)
This scenario is impossible, as we cannot make the slot empty without physical removal of the drive.
# verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # attempt to "remove" occupant [root@nvme ~]# cfgadm -c remove Slot12 cfgadm: Configuration operation invalid: invalid transition Usage: cfgadm [-f] [-y|-n] [-v] [-o hardware_opts ] -c function ap_id [ap_id...] cfgadm [-f] [-y|-n] [-v] [-o hardware_opts ] -x function ap_id [ap_id...] cfgadm [-v] [-s listing_options ] [-o hardware_opts ] [-a] [-l [ap_id|ap_type...]] cfgadm [-v] [-o hardware_opts ] -t ap_id [ap_id...] cfgadm [-v] [-o hardware_opts ] -h [ap_id|ap_type...]
(d) Occupant unconfigured, Receptacle connected -> Occupant configured
This is done via cfgadm -c configure <slot> of an unconfigured slot.
# verify slot status: occupant unconfigured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown connected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # configure slot [root@nvme ~]# cfgadm -c configure Slot12 # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe9630884468 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
(e) Receptacle disconnected -> Occupant unconfigured, Receptacle connected
This is done via cfgadm -c connect <slot> of a disconnected slot.
# verify slot status: occupant unconfigured, receptacle disconnected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # connect slot [root@nvme ~]# cfgadm -c connect Slot12 # check Receptacle status is connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown connected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
(f) Receptacle disconnected -> Occupant configured
This is done via cfgadm -c configure <slot> of a disconnected slot.
# verify slot status: occupant unconfigured, receptacle disconnected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # configure slot [root@nvme ~]# cfgadm -c configure Slot12 # verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe9630883ee8 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
2. Coordinated Removal with ongoing I/O (not in a zpool)
In these tests, we show that coordinated removal fails when there is an ongoing dd to the disk being removed.
(a) Occupant configured -> Occupant unconfigured, with ongoing dd
# check slot status [root@nvme /var/tmp/jordan]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # start dd in another shell [root@nvme ~]# dd if=/dev/urandom of=/dev/rdsk/c3t1d0p0 bs=1M count=10240 67+0 records in 67+0 records out 70254592 bytes transferred in 1.891755 secs (35.4MB/sec) ... # attempt to unconfigure [root@nvme ~]# cfgadm -c unconfigure Slot12 cfgadm: Component system is busy, try again: unconfigure failed # ctrl+C dd process # attempt unconfigure again [root@nvme ~]# cfgadm -c unconfigure Slot12 # unconfigure succeeds; only see 3 nvme instances attached now > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
(b) Occupant configured -> Receptacle disconnected, with ongoing dd
# check slot status
[root@nvme /var/tmp/jordan]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # start dd in another shell [root@nvme ~]# dd if=/dev/urandom of=/dev/rdsk/c3t1d0p0 bs=1M count=10240 59+0 records in 59+0 records out 61865984 bytes transferred in 1.781276 secs (33.1MB/sec) ... # attempt to change to disconnected [root@nvme /var/tmp/jordan]# cfgadm -c disconnect Slot12 cfgadm: Component system is busy, try again: disconnect failed
3. Coordinated Removal with drive in a zpool (no ongoing I/O)
Mirrored zpool creation:
[root@nvme ~]# zpool create test mirror c4t1d0 c3t1d0 invalid vdev specification use '-f' to override the following errors: mirror contains devices of different sizes [root@nvme ~]# zpool create -f test mirror c4t1d0 c3t1d0 [root@nvme ~]# zpool status test pool: test state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors
(a) Mirrored zpool: Occupant configured -> Occupant unconfigured
# verify slot status: occupant configured, receptacle connected
[root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # check zpool health before unconfigure [root@nvme ~]# zpool status test pool: test state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # do some I/O to zpool before unconfigure [root@nvme /test]# echo bar > bar [root@nvme /test]# cat bar bar unconfigure slot [root@nvme ~]# cfgadm -c unconfigure Slot12 # check zpool health (expect to see c3t1d0 removed) [root@nvme ~]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: none requested config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors # verify we can do I/O to the degraded mirror zpool [root@nvme /test]# cat bar bar [root@nvme ~]# cd /test [root@nvme /test]# echo foo > foo [root@nvme /test]# cat foo foo # verify slot status has occupant unconfigured, receptacle connected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown connected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # reconfigure slot [root@nvme /test]# cfgadm -c configure Slot12 # check slot status is occupant configured, receptacle connected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe96308836a8 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # check zpool health [root@nvme /test]# zpool status test pool: test state: ONLINE scan: resilvered 236K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:40:17 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # verify we can still do I/O to zpool [root@nvme /test]# echo baz > baz [root@nvme /test]# cat foo foo [root@nvme /test]# cat bar bar [root@nvme /test]# cat baz baz
(b) Mirrored zpool: Occupant configured -> Receptacle disconnected
# verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # check zpool health before unconfigure [root@nvme ~]# zpool status test pool: test state: ONLINE scan: resilvered 108K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:45:50 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # do some I/O to zpool before unconfigure [root@nvme ~]# cd /test [root@nvme /test]# echo foo > foo [root@nvme /test]# cat foo foo # disconnect slot [root@nvme /test]# cfgadm -c disconnect Slot12 # check zpool health (expect to see c3t1d0 removed) [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: resilvered 108K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:45:50 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors # verify we can do I/O to the degraded mirror zpool [root@nvme /test]# cat foo foo [root@nvme /test]# echo bar > bar [root@nvme /test]# cat bar bar # verify slot status has occupant unconfigured, receptacle disconnected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # reconfigure slot [root@nvme /test]# cfgadm -c configure Slot12 # check slot status is occupant configured, receptacle connected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe96308833e8 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # check zpool health [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. 
scan: resilvered 108K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:45:50 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors [root@nvme /test]# zpool online test c3t1d0 [root@nvme /test]# zpool status test pool: test state: ONLINE scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:50:12 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # verify we can still do I/O to zpool [root@nvme /test]# cat foo foo [root@nvme /test]# cat bar bar [root@nvme /test]# cat baz baz
4. Coordinated Removal with ongoing I/O, drive in a zpool
In this test, I remove a disk that is in a mirrored zpool while there is ongoing I/O to the pool. I verify that I can check the status of the pool and recover the disk.
# create mirrored pool [root@nvme ~]# zpool create -f test mirror c4t1d0 c3t1d0 # check zpool health [root@nvme ~]# zpool status test pool: test state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # start write to zpool [root@nvme /test]# while true; echo foo >> foo; done # unconfigure slot [root@nvme ~]# cfgadm -c unconfigure Slot12 # check zpool status [root@nvme ~]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: none requested config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c3t1d0 REMOVED 0 0 0 c4t1d0 ONLINE 0 0 0 errors: No known data errors # configure slot [root@nvme ~]# cfgadm -c configure Slot12 # check zpool health [root@nvme ~]# zpool status test pool: test state: ONLINE scan: resilvered 0 in 0 days 00:00:00 with 0 errors on Mon Aug 26 22:41:33 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 errors: No known data errors # read file [root@nvme /test]# tail foo foo foo foo foo foo foo foo foo foo foo
Surprise Removal
These tests demonstrate surprise removal for a device with or without ongoing I/O, in or not in a zpool, from several possible Occupant/Receptacle states.
1. Surprise Removal (no ongoing I/O, not in a zpool)
(a) Occupant configured, Receptacle connected
# verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe5922a17d50 pciexclass,010802, instance #3 (driver name: nvme) fffffe5922a17aa8 pciexclass,010802, instance #0 (driver name: nvme) fffffe5935347aa8 pciexclass,010802, instance #2 (driver name: nvme) fffffe5922a17558 pciexclass,010802, instance #1 (driver name: nvme) # pull disk; observe notice of slot state change on console 2019-08-27T18:28:07.326676+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # verify slot status: occupant unconfigured, receptacle empty [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown empty unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe5922a17d50 pciexclass,010802, instance #3 (driver name: nvme) fffffe5922a17aa8 pciexclass,010802, instance #0 (driver name: nvme) fffffe5922a17558 pciexclass,010802, instance #1 (driver name: nvme) # put disk back; observe notice of slot state change on console 2019-08-27T18:28:48.823518+00:00 nvme pcie: [ID 661617 kern.notice] NOTICE: pciehpc (pcieb4): card is inserted in the slot Slot12 # configure slot [root@nvme ~]# cfgadm -c configure Slot12 # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe5922a17d50 pciexclass,010802, instance #3 (driver name: nvme) fffffe5922a17aa8 pciexclass,010802, instance #0 (driver name: nvme) fffffe5935347558 pciexclass,010802, instance #2 (driver name: nvme) fffffe5922a17558 pciexclass,010802, instance #1 (driver name: nvme) # verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok
(b) Occupant unconfigured, Receptacle connected
# unconfigure configured slot [root@nvme ~]# cfgadm -c unconfigure Slot12 # verify slot status: occupant unconfigured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown connected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # pull disk; observe notice of slot state change on console 2019-08-26T21:33:51.296580+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # verify slot status: occupant unconfigured, receptacle empty [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown empty unconfigured unknown # put disk back; observe notice of slot state change on console 2019-08-26T21:34:13.301708+00:00 nvme pcie: [ID 661617 kern.notice] NOTICE: pciehpc (pcieb4): card is inserted in the slot Slot12 # configure disk [root@nvme ~]# cfgadm -c configure Slot12 # verify slot status: occupant unconfigured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe96308833e8 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
(c) Occupant unconfigured, Receptacle disconnected
# disconnect configured slot [root@nvme ~]# cfgadm -c disconnect Slot12 # verify slot status: occupant unconfigured, receptacle disconnected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # pull disk; observe notice of slot state change on console 2019-08-26T21:36:07.755689+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # verify slot status: occupant unconfigured, receptacle empty [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown empty unconfigured unknown # put disk back; observe notice of slot state change on console 2019-08-26T21:36:27.898064+00:00 nvme pcie: [ID 661617 kern.notice] NOTICE: pciehpc (pcieb4): card is inserted in the slot Slot12 # verify slot status: occupant unconfigured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # configure slot [root@nvme ~]# cfgadm -c configure Slot12 # verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffeae3c841470 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme)
2. Surprise Removal with drive in a zpool (no ongoing I/O)
In this set of tests we ensure that a drive in a zpool, with no ongoing I/O to the device, will gracefully show up as "removed" in the pool status, and that it will successfully come back online when the disk is inserted back into the slot and configured. We try this test for both a mirrored zpool and a non-mirrored zpool, with the occupant and receptacle statuses in each possible state.
(a) Mirrored pool: Occupant configured
# check zpool health [root@nvme ~]# zpool status test pool: test state: ONLINE scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:50:12 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe96308833e8 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # perform I/O to pull prior to pull [root@nvme /test]# ls bar baz foo [root@nvme /test]# cat bar baz foo bar baz foo [root@nvme /test]# echo blah > blah [root@nvme /test]# cat blah blah # pull disk; observe notice of slot state change on console [root@nvme ~]# 2019-08-26T21:11:44.102510+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # check zpool health (expect c3t1d0 to be "removed") [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:50:12 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors # verify slot status: occupant unconfigured, receptacle empty [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown empty unconfigured unknown # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # verify we can still perform I/O to zpool [root@nvme /test]# ls bar baz blah foo [root@nvme /test]# cat bar baz blah foo bar baz blah foo [root@nvme /test]# echo blah1 > lbah1 [root@nvme /test]# cat lbah1 blah1 # put disk back; observe notice of slot state change on console 2019-08-26T21:14:01.686033+00:00 nvme pcie: [ID 661617 kern.notice] NOTICE: pciehpc (pcieb4): card is inserted in the slot Slot12 # verify new slot status: occupant unconfigured, receptacle disconnected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # configure slot [root@nvme /test]# cfgadm -c configure Slot12 # check zpool health [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. 
scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 20:50:12 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors [root@nvme /test]# zpool online test c3t1d0 [root@nvme /test]# zpool status test pool: test state: ONLINE scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:15:10 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 # verify we can still perform I/O to zpool [root@nvme /test]# ls bar baz blah foo lbah1 [root@nvme /test]# cat bar baz blah foo lbah1 bar baz blah foo blah1 [root@nvme /test]# echo blah2 > blah2 [root@nvme /test]# cat blah2 blah2
(b) Mirrored pool: Occupant unconfigured, Receptacle connected
# unconfigure slot [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok [root@nvme ~]# cfgadm -c unconfigure Slot12 # verify slot status: occupant unconfigured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown connected unconfigured unknown # check zpool health (expect c1t3d1 to already be "removed" since it was unconfigured) [root@nvme ~]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:15:10 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors # perform I/O to pull prior to pull [root@nvme /test]# ls bar baz blah blah2 foo lbah1 [root@nvme /test]# cat bar baz bar baz [root@nvme /test]# echo blah3 > blah3 [root@nvme /test]# cat blah3 blah3 # pull disk; observe notice of slot state change on console 2019-08-26T21:19:11.646796+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # verify slot status: occupant unconfigured, receptacle empty [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown empty unconfigured unknown # check zpool health (expect c3t1d0 to be "removed") [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:15:10 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # verify we can still perform I/O to zpool [root@nvme /test]# cat blah3 blah2 blah3 blah2 [root@nvme /test]# echo blah4 > blah4 [root@nvme /test]# cat blah4 blah4 # put disk back; observe notice of slot state change on console 2019-08-26T21:20:35.633498+00:00 nvme pcie: [ID 661617 kern.notice] NOTICE: pciehpc (pcieb4): card is inserted in the slot Slot12 # verify new slot status: occupant unconfigured, receptacle disconnected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # configure slot [root@nvme /test]# cfgadm -c configure Slot12 # check zpool health [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. 
scan: resilvered 240K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:15:10 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors [root@nvme /test]# zpool online test c3t1d0 [root@nvme /test]# zpool status test pool: test state: ONLINE scan: resilvered 296K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:21:30 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # verify we can still perform I/O to zpool [root@nvme /test]# cat blah2 blah3 blah4 blah2 blah3 blah4 [root@nvme /test]# echo blah5 > blah5 [root@nvme /test]# cat blah5 blah5
(c) Mirrored pool: Occupant unconfigured, Receptacle disconnected
# disconnect slot [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok [root@nvme /test]# cfgadm -c disconnect Slot12 # verify slot status: occupant unconfigured, receptacle disconnected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # check zpool health (expect c1t3d1 to already be "removed" since it was disconnected) [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: resilvered 296K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:21:30 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors # perform I/O to pull prior to pull [root@nvme /test]# cat blah2 blah5 blah2 blah5 [root@nvme /test]# echo blah6 > blah6 [root@nvme /test]# cat blah6 blah6 # pull disk; observe notice of slot state change on console 2019-08-26T21:24:23.771910+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # verify slot status: occupant unconfigured, receptacle empty [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown empty unconfigured unknown # check zpool health (expect c3t1d0 to be "removed") [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: resilvered 296K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:21:30 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors # verify only 3 nvme instances exist (3/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # verify we can still perform I/O to zpool [root@nvme /test]# ls bar baz blah blah2 blah3 blah4 blah5 blah6 foo lbah1 [root@nvme /test]# cat blah5 blah6 blah5 blah6 [root@nvme /test]# echo blah7 > blah7 [root@nvme /test]# cat blah7 blah7 # put disk back; observe notice of slot state change on console 2019-08-26T21:25:35.761218+00:00 nvme pcie: [ID 661617 kern.notice] NOTICE: pciehpc (pcieb4): card is inserted in the slot Slot12 # verify new slot status: occupant unconfigured, receptacle disconnected [root@nvme /test]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # configure slot [root@nvme /test]# cfgadm -c configure Slot12 # check zpool health [root@nvme /test]# zpool status test pool: test state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. 
scan: resilvered 296K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:21:30 2019 config: NAME STATE READ WRITE CKSUM test DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 REMOVED 0 0 0 errors: No known data errors [root@nvme /test]# zpool online test c3t1d0 [root@nvme /test]# zpool status test pool: test state: ONLINE scan: resilvered 348K in 0 days 00:00:00 with 0 errors on Mon Aug 26 21:26:24 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 errors: No known data errors # verify we can still perform I/O to zpool [root@nvme /test]# ls bar baz blah blah2 blah3 blah4 blah5 blah6 blah7 foo lbah1 [root@nvme /test]# cat blah6 blah7 blah6 blah7 [root@nvme /test]# echo blah8 > blah8 [root@nvme /test]# cat blah8 blah8
(d) Non-mirrored pool: Occupant configured
# create zpool [root@nvme ~]# zpool create test c3t1d0 c4t1d0 # check zpool health [root@nvme ~]# zpool status test pool: test state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 errors: No known data errors # perform I/O to pull prior to pull [root@nvme ~]# cd /test [root@nvme /test]# ls [root@nvme /test]# echo foo > foo [root@nvme /test]# cat foo foo # pull disk; observe notice of slot state change on console 2019-08-26T22:14:56.952318+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # check zpool health # (I/O to zpool will hang, so don't try that) [root@nvme /test]# zpool status test pool: test state: UNAVAIL status: One or more devices are faulted in response to IO failures. action: Make sure the affected devices are connected, then run 'zpool clear'. see: http://illumos.org/msg/ZFS-8000-HC scan: none requested config: NAME STATE READ WRITE CKSUM test UNAVAIL 0 0 0 insufficient replicas c3t1d0 REMOVED 0 0 0 c4t1d0 ONLINE 0 0 0 errors: No known data errors # put disk back; observe notice of slot state change on console 2019-08-26T22:19:08.036759+00:00 nvme pcie: [ID 661617 kern.notice] NOTICE: pciehpc (pcieb4): card is inserted in the slot Slot12 # configure disk [root@nvme /test]# cfgadm -c configure Slot12 # clear pool and check health [root@nvme /test]# zpool clear test [root@nvme /test]# zpool status test pool: test state: ONLINE scan: scrub repaired 0 in 0 days 00:00:00 with 0 errors on Mon Aug 26 22:20:00 2019 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 errors: No known data errors
3. Surprise Removal with ongoing I/O (not in a zpool)
For this test, we run dd(1M) on a drive that we then pull with I/O ongoing. We expect dd to fail with EIO. There is an outstanding issue that results in the driver for the device not always detaching. See OS-7956 for details.
# verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # verify all 4 instances of nvme driver exist (4/4 disks on system) > ::prtconf -d nvme DEVINFO NAME fffffe59b8a94360 pciexclass,010802, instance #3 (driver name: nvme) fffffe59b8a940a0 pciexclass,010802, instance #0 (driver name: nvme) fffffeae3c841470 pciexclass,010802, instance #1 (driver name: nvme) fffffe59b8a93b20 pciexclass,010802, instance #2 (driver name: nvme) # in another shell, start dd to disk [root@nvme ~]# dd if=/dev/urandom of=/dev/rdsk/c3t1d0p0 bs=1M count=10240 21+0 records in 21+0 records out 22020096 bytes transferred in 0.631863 secs (33.2MB/sec) ... # pull disk; observe notice of slot state change on console 2019-08-26T21:54:26.219402+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # check that dd exited cleanly [root@nvme ~]# dd if=/dev/urandom of=/dev/rdsk/c3t1d0p0 bs=1M count=10240 21+0 records in 21+0 records out 22020096 bytes transferred in 0.631863 secs (33.2MB/sec) write: I/O error 412+0 records in 412+0 records out 430964736 bytes transferred in 11.598755 secs (35.4MB/sec)
4. Surprise Removal with drive in a zpool and ongoing I/O
This setup may panic the system due to an existing bug. See OS-2743 for more details.
ereports and sysevents Generated From Hotplug
These tests demonstrate the ereports and sysevents generated from coordinated removal and surprise removal of a device.
These tests are important because some of this work changes the way fabric-xlate generates ereports for surprise removal. Additionally, the DDI hotplug framework generates sysevents for all state changes; verifying these sysevents are generated serves as a regression test.
1. Coordinated Removal using cfgadm(1M)
In the case of coordinated removal, we expect to see sysevents generated from the hotplug framework, but not ereports.
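The sysevent(1M) observer used in the transcripts below can also be reproduced programmatically with libsysevent; the following is a minimal sketch, not part of the original test plan (link with -lsysevent; binding requires appropriate privileges).

#include <stdio.h>
#include <unistd.h>
#include <libsysevent.h>

/* Print the class/subclass of each delivered attachment-point sysevent. */
static void
dr_event(sysevent_t *ev)
{
	(void) printf("%s/%s\n", sysevent_get_class_name(ev),
	    sysevent_get_subclass_name(ev));
}

int
main(void)
{
	const char *subclasses[] = { "ESC_dr_ap_state_change" };
	sysevent_handle_t *sh;

	if ((sh = sysevent_bind_handle(dr_event)) == NULL) {
		perror("sysevent_bind_handle");
		return (1);
	}

	if (sysevent_subscribe_event(sh, "EC_dr", subclasses, 1) != 0) {
		perror("sysevent_subscribe_event");
		sysevent_unbind_handle(sh);
		return (1);
	}

	(void) pause();	/* dr_event() runs on a libsysevent delivery thread */
	return (0);
}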
(a) Occupant configured -> Occupant unconfigured
# check slot status; unconfigure configured Slot [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok [root@nvme ~]# cfgadm -c unconfigure Slot12 # in another shell, watch for state change related sysevents from DDI HP framework [root@nvme ~]# sysevent EC_dr ESC_dr_ap_state_change nvlist version: 0 date = 2019-08-26T18:56:38.312Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data)
(b) Occupant configured -> Receptacle disconnected
# check slot status; disconnect configured Slot [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok [root@nvme ~]# cfgadm -c disconnect Slot12 # in another shell, watch for state change related sysevents from DDI HP framework [root@nvme ~]# sysevent EC_dr ESC_dr_ap_state_change nvlist version: 0 date = 2019-08-26T18:59:05.059Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data)
(c) Occupant unconfigured -> Occupant configured
# check slot status; configure unconfigured Slot [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown connected unconfigured unknown [root@nvme ~]# cfgadm -c configure Slot12 # in another shell, watch for state change related sysevents from DDI HP framework [root@nvme ~]# sysevent EC_dr ESC_dr_ap_state_change nvlist version: 0 date = 2019-08-26T18:53:53.181Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data)
(d) Receptacle disconnected -> Occupant configured
# check slot status; configure disconnected Slot [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown [root@nvme ~]# cfgadm -c configure Slot12 # in another shell, watch for state change related sysevents from DDI HP framework [root@nvme ~]# sysevent EC_dr ESC_dr_ap_state_change nvlist version: 0 date = 2019-08-26T19:00:52.730Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data)
(e) Receptacle disconnected -> Receptacle connected
# check slot status; connect disconnected Slot [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown [root@nvme ~]# cfgadm -c connect Slot12 # in another shell, watch for state change related sysevents from DDI HP framework [root@nvme ~]# sysevent EC_dr ESC_dr_ap_state_change nvlist version: 0 date = 2019-08-26T19:02:54.696Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data)
2. Surprise Removal
For surprise removal, we expect to see both a sysevent from the DDI HP framework reflecting the connection state change and an ereport from the PCI fabric scan kernel code (but none from fabric-xlate or eft).
The tests below use a drive that is not in a zpool, with no ongoing I/O. Note that the ereports now include the parent slot registers (pcie_slot_*); a short sketch of how the presence-detect bit in those registers can be consulted follows the examples.
(a) Occupant configured
# verify slot status: occupant configured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 nvme/hp connected configured ok # pull disk; observe notice of slot state change on console 2019-08-26T22:07:19.087107+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # in another shell, watch for state change related sysevents from DDI HP framework [root@nvme ~]# sysevent EC_dr ESC_dr_ap_state_change nvlist version: 0 date = 2019-08-26T22:07:40.093Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data) # observe fabric ereport using fmdump -eV Aug 26 2019 22:07:19.087794255 ereport.io.pci.fabric nvlist version: 0 class = ereport.io.pci.fabric ena = 0x453b1c8d07b00001 detector = (embedded nvlist) nvlist version: 0 version = 0x0 scheme = dev device-path = /pci@0,0/pci8086,6f0a@3,2 (end detector) bdf = 0x1a device_id = 0x6f0a vendor_id = 0x8086 rev_id = 0x1 dev_type = 0x40 pcie_off = 0x90 pcix_off = 0x0 aer_off = 0x148 ecc_ver = 0x0 pci_status = 0x10 pci_command = 0x47 pci_bdg_sec_status = 0x0 pci_bdg_ctrl = 0x3 pcie_status = 0x0 pcie_command = 0x27 pcie_dev_cap = 0x8001 pcie_adv_ctl = 0xae pcie_ue_status = 0x0 pcie_ue_mask = 0x180000 pcie_ue_sev = 0x62030 pcie_ue_hdr0 = 0x0 pcie_ue_hdr1 = 0x0 pcie_ue_hdr2 = 0x0 pcie_ue_hdr3 = 0x0 pcie_ce_status = 0x0 pcie_ce_mask = 0x0 pcie_rp_status = 0x0 pcie_rp_control = 0x0 pcie_adv_rp_status = 0x0 pcie_adv_rp_command = 0x7 pcie_adv_rp_ce_src_id = 0x600 pcie_adv_rp_ue_src_id = 0x1a pcie_slot_cap = 0x48007b pcie_slot_control = 0x11fd pcie_slot_status = 0x0 remainder = 0x0 severity = 0x1 __ttl = 0x1 __tod = 0x5d645817 0x53ba24f
(b) Occupant unconfigured, Receptacle connected
# verify slot status: occupant unconfigured, receptacle connected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown connected unconfigured unknown # pull disk; observe notice of slot state change on console 2019-08-26T22:10:52.478810+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # in another shell, watch for state change related sysevents from DDI HP framework [root@nvme ~]# sysevent EC_dr ESC_dr_ap_state_change nvlist version: 0 date = 2019-08-26T22:10:52.479Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data) # observe fabric ereport using fmdump -eV Aug 26 2019 22:10:52.478818747 ereport.io.pci.fabric nvlist version: 0 class = ereport.io.pci.fabric ena = 0x485611c2dff02801 detector = (embedded nvlist) nvlist version: 0 version = 0x0 scheme = dev device-path = /pci@0,0/pci8086,6f0a@3,2 (end detector) bdf = 0x1a device_id = 0x6f0a vendor_id = 0x8086 rev_id = 0x1 dev_type = 0x40 pcie_off = 0x90 pcix_off = 0x0 aer_off = 0x148 ecc_ver = 0x0 pci_status = 0x10 pci_command = 0x47 pci_bdg_sec_status = 0x0 pci_bdg_ctrl = 0x3 pcie_status = 0x0 pcie_command = 0x20 pcie_dev_cap = 0x8001 pcie_adv_ctl = 0xae pcie_ue_status = 0x0 pcie_ue_mask = 0x1ff030 pcie_ue_sev = 0x62030 pcie_ue_hdr0 = 0x0 pcie_ue_hdr1 = 0x0 pcie_ue_hdr2 = 0x0 pcie_ue_hdr3 = 0x0 pcie_ce_status = 0x0 pcie_ce_mask = 0x11c1 pcie_rp_status = 0x0 pcie_rp_control = 0x0 pcie_adv_rp_status = 0x0 pcie_adv_rp_command = 0x7 pcie_adv_rp_ce_src_id = 0x600 pcie_adv_rp_ue_src_id = 0x1a pcie_slot_cap = 0x48007b pcie_slot_control = 0x11fd pcie_slot_status = 0x0 remainder = 0x0 severity = 0x1 __ttl = 0x1 __tod = 0x5d6458ec 0x1c8a31bb
(c) Receptacle disconnected
# verify slot status: occupant unconfigured, receptacle disconnected [root@nvme ~]# cfgadm Slot12 Ap_Id Type Receptacle Occupant Condition Slot12 unknown disconnected unconfigured unknown # pull disk; observe notice of slot state change on console 2019-08-26T22:12:32.876051+00:00 nvme pcie: [ID 126225 kern.notice] NOTICE: pciehpc (pcieb4): card is removed from the slot Slot12 # in another shell, watch for state change related sysevents from DDI HP framework nvlist version: 0 date = 2019-08-26T22:12:32.876Z type = event vendor = SUNW publisher = pcieb class = EC_dr subclass = ESC_dr_ap_state_change pid = 0 data = (embedded nvlist) nvlist version: 0 dr_hint = dr_ap_id = /devices/pci@0,0/pci8086,6f0a@3,2:Slot12 (end data) # observe fabric ereport using fmdump -eV Aug 26 2019 22:12:54.189650824 ereport.io.pci.fabric nvlist version: 0 class = ereport.io.pci.fabric ena = 0x4a1b7c2dd8102801 detector = (embedded nvlist) nvlist version: 0 version = 0x0 scheme = dev device-path = /pci@0,0/pci8086,6f0a@3,2 (end detector) bdf = 0x1a device_id = 0x6f0a vendor_id = 0x8086 rev_id = 0x1 dev_type = 0x40 pcie_off = 0x90 pcix_off = 0x0 aer_off = 0x148 ecc_ver = 0x0 pci_status = 0x10 pci_command = 0x47 pci_bdg_sec_status = 0x0 pci_bdg_ctrl = 0x3 pcie_status = 0x0 pcie_command = 0x20 pcie_dev_cap = 0x8001 pcie_adv_ctl = 0xae pcie_ue_status = 0x0 pcie_ue_mask = 0x1ff030 pcie_ue_sev = 0x62030 pcie_ue_hdr0 = 0x0 pcie_ue_hdr1 = 0x0 pcie_ue_hdr2 = 0x0 pcie_ue_hdr3 = 0x0 pcie_ce_status = 0x0 pcie_ce_mask = 0x11c1 pcie_rp_status = 0x0 pcie_rp_control = 0x0 pcie_adv_rp_status = 0x0 pcie_adv_rp_command = 0x7 pcie_adv_rp_ce_src_id = 0x600 pcie_adv_rp_ue_src_id = 0x1a pcie_slot_cap = 0x48007b pcie_slot_control = 0x17fd pcie_slot_status = 0x40 remainder = 0x0 severity = 0x1 __ttl = 0x1 __tod = 0x5d645966 0xb4dd788
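To make the check described in the fix concrete, here is a hedged sketch of the decision it enables: a Completion Timeout is treated as expected when the parent slot's Presence Detect State bit is clear. The helper is hypothetical (not the actual illumos code); the bit names are the standard <sys/pcie.h> definitions, and the raw values correspond to the pcie_ue_status and pcie_slot_status ereport members shown above.

#include <sys/types.h>
#include <sys/pcie.h>

/*
 * Hypothetical helper: was the Completion Timeout recorded in the AER
 * Uncorrectable Error Status register explained by the parent slot now
 * being empty (Presence Detect State clear)?
 */
static boolean_t
cto_is_expected_removal(uint32_t ue_status, uint16_t slot_status)
{
	if ((ue_status & PCIE_AER_UCE_TO) == 0)
		return (B_FALSE);	/* no completion timeout recorded */

	return ((slot_status & PCIE_SLOTSTS_PRESENCE_DETECTED) == 0);
}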
Updated by Youzhong Yang about 1 year ago
I think this still needs more work.
We have the following eval hardware from ThinkMate:
Supermicro 1029U-TRTP with AOC-SLG3-4X4P (PCI-E x16 buffer expansion card) - 2 nodes.
Supermicro SSG-136R-N32JBF JBOF system with 14 NVMe drives.
9 NVMe drives on SmartOS (batfs2099), 5 drives on Linux (batfs2098):
[root@batfs2099 ~]# cfgadm -l Ap_Id Type Receptacle Occupant Condition Slot20 pci-pci/hp connected configured ok c2 scsi-bus connected configured unknown pcieb3.pcie0 unknown empty unconfigured unknown pcieb4.pcie0 unknown empty unconfigured unknown pcieb5.pcie0 unknown empty unconfigured unknown pcieb6.pcie0 unknown empty unconfigured unknown pcieb8.pcie0 unknown empty unconfigured unknown pcieb9.pcie0 unknown empty unconfigured unknown sata1/0 sata-port empty unconfigured ok sata1/1 sata-port empty unconfigured ok sata1/2 sata-port empty unconfigured ok sata1/3 sata-port empty unconfigured ok sata1/4 sata-port empty unconfigured ok sata1/5 sata-port empty unconfigured ok sata2/0 sata-port empty unconfigured ok sata2/1 sata-port empty unconfigured ok sata2/2 sata-port empty unconfigured ok sata2/3 sata-port empty unconfigured ok sata2/4 sata-port empty unconfigured ok sata2/5 sata-port empty unconfigured ok sata2/6 sata-port empty unconfigured ok sata2/7 sata-port empty unconfigured ok usb1/1 unknown empty unconfigured ok usb1/2 unknown empty unconfigured ok usb1/3 unknown empty unconfigured ok usb1/4 unknown empty unconfigured ok usb1/5 unknown empty unconfigured ok usb1/6 unknown empty unconfigured ok usb1/7 usb-hub connected configured ok usb1/7.1 usb-device connected configured ok usb1/7.2 unknown empty unconfigured ok usb1/7.3 unknown empty unconfigured ok usb1/7.4 unknown empty unconfigured ok usb1/8 unknown empty unconfigured ok usb1/9 unknown empty unconfigured ok usb1/10 unknown empty unconfigured ok usb1/11 unknown empty unconfigured ok usb1/12 unknown empty unconfigured ok usb1/13 usb-storage connected configured ok usb1/14 unknown empty unconfigured ok usb1/15 unknown empty unconfigured ok usb1/16 unknown empty unconfigured ok usb1/17 unknown empty unconfigured ok usb1/18 unknown empty unconfigured ok usb1/19 unknown empty unconfigured ok usb1/20 unknown empty unconfigured ok usb1/21 unknown empty unconfigured ok usb1/22 unknown empty unconfigured ok usb1/23 unknown empty unconfigured ok usb1/24 unknown empty unconfigured ok usb1/25 unknown empty unconfigured ok usb1/26 unknown empty unconfigured ok
[root@batfs2099 ~]# diskinfo TYPE DISK VID PID SIZE RMV SSD SCSI c1t0d0 InnoDisk USB Drive 3.87 GiB yes no SCSI c2t0d1 ATA Micron_5200_MTFD 1788.50 GiB no yes SCSI c2t1d1 ATA Micron_5200_MTFD 1788.50 GiB no yes SCSI c2t2d1 ATA Micron_5200_MTFD 1788.50 GiB no yes NVME c3t5CD2E4E7B0080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c4t5CD2E47CA3080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c5t5CD2E487B0080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c6t5CD2E454B0080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c7t5CD2E402A4080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c8t5CD2E48AB0080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c9t5CD2E447B0080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c10t5CD2E4B2B0080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes NVME c11t5CD2E459B0080100d0 INTEL SSDPE2KE032T8 2980.82 GiB no yes
[root@batfs2099 ~]# uname -a SunOS batfs2099 5.11 joyent_20191107T010753Z i86pc i386 i86pc
root@batfs2098:~# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 89.1M 1 loop /snap/core/7917 loop1 7:1 0 54.7M 1 loop /snap/lxd/12211 loop2 7:2 0 4.7M 1 loop /snap/network-manager/379 sda 8:0 1 3.9G 0 disk ├─sda1 8:1 1 256M 0 part ├─sda2 8:2 1 1M 0 part ├─sda3 8:3 1 1.6G 0 part └─sda9 8:9 1 8M 0 part sdb 8:16 0 1.8T 0 disk ├─sdb1 8:17 0 1.8T 0 part └─sdb9 8:25 0 8M 0 part sdc 8:32 0 1.8T 0 disk ├─sdc1 8:33 0 1.8T 0 part └─sdc9 8:41 0 8M 0 part sdd 8:48 0 1.8T 0 disk ├─sdd1 8:49 0 1M 0 part └─sdd2 8:50 0 1.8T 0 part / sr0 11:0 1 843M 0 rom nvme0n1 259:0 0 2.9T 0 disk nvme1n1 259:1 0 2.9T 0 disk nvme2n1 259:2 0 2.9T 0 disk nvme3n1 259:3 0 2.9T 0 disk nvme4n1 259:4 0 2.9T 0 disk
Assign a drive to Linux (batfs2098); it should appear on the Linux box and disappear from the SmartOS box.
root@batfs2098:~# LC_ALL=C lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 89.1M 1 loop /snap/core/7917 loop1 7:1 0 54.7M 1 loop /snap/lxd/12211 loop2 7:2 0 4.7M 1 loop /snap/network-manager/379 sda 8:0 1 3.9G 0 disk |-sda1 8:1 1 256M 0 part |-sda2 8:2 1 1M 0 part |-sda3 8:3 1 1.6G 0 part `-sda9 8:9 1 8M 0 part sdb 8:16 0 1.8T 0 disk |-sdb1 8:17 0 1.8T 0 part `-sdb9 8:25 0 8M 0 part sdc 8:32 0 1.8T 0 disk |-sdc1 8:33 0 1.8T 0 part `-sdc9 8:41 0 8M 0 part sdd 8:48 0 1.8T 0 disk |-sdd1 8:49 0 1M 0 part `-sdd2 8:50 0 1.8T 0 part / sr0 11:0 1 843M 0 rom nvme0n1 259:0 0 2.9T 0 disk nvme1n1 259:1 0 2.9T 0 disk nvme2n1 259:2 0 2.9T 0 disk nvme3n1 259:3 0 2.9T 0 disk nvme4n1 259:4 0 2.9T 0 disk nvme5n1 259:5 0 2.9T 0 disk
diskinfo on SmartOS hung:
[root@batfs2099 ~]# diskinfo TYPE DISK VID PID SIZE RMV SSD [root@batfs2099 ~]# mdb -ke '0t8582::pid2proc | ::walk thread | ::findstack -v' stack pointer for thread fffffd666c126400 (diskinfo/1): fffff94600ccb650 [ fffff94600ccb650 _resume_from_idle+0x12b() ] fffff94600ccb680 swtch+0x133() fffff94600ccb6c0 sema_p+0x1cf(fffffd665c22d700) fffff94600ccb700 biowait+0xa4(fffffd665c22d640) fffff94600ccb790 bd_tg_rdwr+0xfd(fffffd665fe2dab0, 0, fffffd6663b78000, 0, 200, 0) fffff94600ccb850 cmlb_use_efi+0xc9(fffffd704becb000, 1749a42b0, 0, 0) fffff94600ccb8d0 cmlb_validate_geometry+0xb4(fffffd704becb000, 1, 0, 0) fffff94600ccb930 cmlb_validate+0x68(fffffd704becb000, 0, 0) fffff94600ccb9d0 bd_open+0xcb(fffff94600ccba68, 5, 2, fffffd666c095710) fffff94600ccba00 dev_open+0x26(fffff94600ccba68, 5, 2, fffffd666c095710) fffff94600ccbad0 spec_open+0x1ab(fffff94600ccbbc0, 5, fffffd666c095710, 0) fffff94600ccbb40 fop_open+0x96(fffff94600ccbbc0, 5, fffffd666c095710, 0) fffff94600ccbce0 vn_openat+0x1d3(8172f78, 0, 5, 4, fffff94600ccbd40, 0, fffff94600000012, 0, fffff94600000003) fffff94600ccbe50 copen+0x449(ffd19553, 8172f78, 5, 4) fffff94600ccbe80 openat32+0x1a(ffd19553, 8172f78, 4, 4) fffff94600ccbeb0 open32+0x1c(8172f78, 4, 4) fffff94600ccbf10 _sys_sysenter_post_swapgs+0x159() stack pointer for thread fffffd665bc25780 (diskinfo/2): fffff94601d95d40 [ fffff94601d95d40 _resume_from_idle+0x12b() ] fffff94601d95d70 swtch+0x133() fffff94601d95da0 shuttle_swtch+0x21e(ffffffffc0008fb0) fffff94601d95e40 door_return+0x278(0, 0, 0, 0, fecdee00, f5f00) fffff94601d95eb0 doorfs32+0x184(0, 0, 0, fecdee00, f5f00, a) fffff94601d95f10 _sys_sysenter_post_swapgs+0x159() stack pointer for thread fffffd666395b080 (diskinfo/3): fffff9460115cc70 [ fffff9460115cc70 _resume_from_idle+0x12b() ] fffff9460115cca0 swtch+0x133() fffff9460115cd20 cv_wait_sig_swap_core+0x18d(fffffd666395b26e, fffffd666395b270, 0) fffff9460115cd40 cv_wait_sig_swap+0xd(fffffd666395b26e, fffffd666395b270) fffff9460115cdc0 cv_waituntil_sig+0xd5(fffffd666395b26e, fffffd666395b270, 0, 0) fffff9460115ce60 lwp_park+0x136(0, 0) fffff9460115ceb0 syslwp_park+0x3f(0, 0, 0) fffff9460115cf10 _sys_sysenter_post_swapgs+0x159()
Also, after assigning a drive from the Linux box to the SmartOS box, the drive disappeared from Linux but never appeared on SmartOS.