Bug #2608
open
powerfail on e-sata device blocks zfs
0%
Description
This is the second time we have had a power failure, so I guess it's time to report this.
Everything is on a UPS except an e-SATA disk (Icy Box adapter) on a workstation running OpenIndiana build oi_151a2 64-bit (illumos fc320b2833d3).
When the power failure hits, only the e-SATA disk loses power, then comes back up.
The zpool list and zpool status commands hang, and I cannot seem to bring the device back online without rebooting. Perhaps I'm missing the magic incantation, but there should be a better way to get the device back online, shouldn't there?
Excerpt from /var/adm/messages:
Apr 10 09:25:43 x3200 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1025,157@9/disk@3,0 (sd6):
Apr 10 09:25:43 x3200 Command failed to complete...Device is gone
Apr 10 09:43:34 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:34 x3200 SATA device detached at port 3
Apr 10 09:43:34 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:34 x3200 SATA device detached at port 3
Apr 10 09:43:44 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:44 x3200 SATA device detected at port 3
Apr 10 09:43:44 x3200 sata: [ID 663010 kern.info] /pci@0,0/pci1025,157@9 :
Apr 10 09:43:44 x3200 sata: [ID 761595 kern.info] SATA disk device at port 3
Apr 10 09:43:44 x3200 sata: [ID 846691 kern.info] model WDC WD3200AAJS-22B4A0
Apr 10 09:43:44 x3200 sata: [ID 693010 kern.info] firmware 01.03A01
Apr 10 09:43:44 x3200 sata: [ID 163988 kern.info] serial number WD-WCAT13982913
Apr 10 09:43:44 x3200 sata: [ID 594940 kern.info] supported features:
Apr 10 09:43:44 x3200 sata: [ID 981177 kern.info] 48-bit LBA, DMA, Native Command Queueing, SMART, SMART self-test
Apr 10 09:43:44 x3200 sata: [ID 643337 kern.info] SATA Gen2 signaling speed (3.0Gbps)
Apr 10 09:43:44 x3200 sata: [ID 349649 kern.info] Supported queue depth 32
Apr 10 09:43:44 x3200 sata: [ID 349649 kern.info] capacity = 625142448 sectors
Apr 10 09:43:44 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:44 x3200 Application(s) accessing previously attached SATA device have to release it before newly inserted device can be made accessible.
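To pull only the hot-plug events out of a log like the one above, a plain grep is probably enough. This is a minimal sketch: the function name sata_port_events is hypothetical, and the pattern only matches the two message texts quoted in this report ("SATA device detached at port N" and "SATA device detected at port N"); other drivers or releases may word these messages differently.

```shell
#!/bin/sh
# Hypothetical helper: print the SATA detach/detect lines from a
# messages-style log file (for example /var/adm/messages).
sata_port_events() {
    logfile=$1
    # "det[a-z]*ed" covers both "detached" and "detected" using only
    # basic regular expressions, so no grep extensions are needed.
    grep 'SATA device det[a-z]*ed at port [0-9]' "$logfile"
}
```

Usage would be something like: sata_port_events /var/adm/messages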
Updated by Richard PALO about 11 years ago
Now I'm a bit up a creek...
Thought I'd try to reproduce by simply turning off then back on my device.
Now I get
$ sudo zpool status -vx apool
  pool: apool
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
  scan: scrub repaired 0 in 2h26m with 0 errors on Sat Oct 8 12:41:16 2011
config:

        NAME        STATE     READ WRITE CKSUM
        apool       UNAVAIL      0     0     0  insufficient replicas
          c5t3d0s0  UNAVAIL      0     0     0  cannot open

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x4a>
        apool:<0x3>
$ zfs list apool
cannot open 'apool': pool I/O is currently suspended
$ sudo zpool clear apool
cannot clear errors for apool: I/O error
$ sudo zpool clear apool c5t3d0s0
cannot clear errors for c5t3d0s0: I/O error
gulp, help?
Updated by Richard PALO about 11 years ago
Tried simply rebooting, and it was back in order. Same problem then, just different symptoms.
One thing I didn't mention in the reproduction above: I noticed that after being powered off and back on, the device comes back up unconfigured and needs to be configured manually with cfgadm, as it does any time I boot without the device powered on (# cfgadm -c configure sata0/3).
Scrubbing now to make sure, but I presume nothing is wrong, as was the case after the real power failure.
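For reference, the manual steps mentioned in this thread (configure the attachment point with cfgadm, then retry 'zpool clear' as the status output suggests) could be strung together roughly as follows. This is only a sketch under assumptions: sata0/3 and apool are the values from this report, the names run and recover_esata_pool and the DRY_RUN variable are hypothetical, and nothing in this report confirms that 'zpool clear' actually succeeds once the device is reconfigured (here only a reboot did). It prints the commands by default; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: reconfigure the SATA attachment point, then ask ZFS
# to retry I/O on the suspended pool and report its health.
recover_esata_pool() {
    ap_id=$1    # cfgadm attachment point, e.g. sata0/3 (from this report)
    pool=$2     # pool name, e.g. apool (from this report)

    run cfgadm -c configure "$ap_id" || return 1
    run zpool clear "$pool" || return 1
    run zpool status -x "$pool"
}

recover_esata_pool sata0/3 apool
```

Stopping at the first failing step keeps 'zpool clear' from being retried against a device that is still unconfigured, which is the state in which it returned an I/O error above.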