Bug #2608

open

powerfail on e-sata device blocks zfs

Added by Richard PALO about 11 years ago. Updated about 11 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2012-04-10
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:

Description

This is the second time we have had a power failure, so I guess it's time to report this.

Everything is on a UPS except an eSATA disk (ICY BOX adapter) attached to a workstation running OpenIndiana build oi_151a2 64-bit (illumos fc320b2833d3).
When a power failure hits, only the eSATA disk loses power, then comes back up.

After that, the zpool list and zpool status commands hang, and I cannot seem to bring the device back online without rebooting. Perhaps I'm missing the magic incantation, but shouldn't there be a better way to get the device back online?

Excerpt from /var/adm/messages:

Apr 10 09:25:43 x3200 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1025,157@9/disk@3,0 (sd6):
Apr 10 09:25:43 x3200     Command failed to complete...Device is gone
Apr 10 09:43:34 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:34 x3200  SATA device detached at port 3
Apr 10 09:43:34 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:34 x3200  SATA device detached at port 3
Apr 10 09:43:44 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:44 x3200  SATA device detected at port 3
Apr 10 09:43:44 x3200 sata: [ID 663010 kern.info] /pci@0,0/pci1025,157@9 :
Apr 10 09:43:44 x3200 sata: [ID 761595 kern.info]     SATA disk device at port 3
Apr 10 09:43:44 x3200 sata: [ID 846691 kern.info]     model WDC WD3200AAJS-22B4A0                   
Apr 10 09:43:44 x3200 sata: [ID 693010 kern.info]     firmware 01.03A01
Apr 10 09:43:44 x3200 sata: [ID 163988 kern.info]     serial number      WD-WCAT13982913
Apr 10 09:43:44 x3200 sata: [ID 594940 kern.info]     supported features:
Apr 10 09:43:44 x3200 sata: [ID 981177 kern.info]      48-bit LBA, DMA, Native Command Queueing, SMART, SMART self-test
Apr 10 09:43:44 x3200 sata: [ID 643337 kern.info]     SATA Gen2 signaling speed (3.0Gbps)
Apr 10 09:43:44 x3200 sata: [ID 349649 kern.info]     Supported queue depth 32
Apr 10 09:43:44 x3200 sata: [ID 349649 kern.info]     capacity = 625142448 sectors
Apr 10 09:43:44 x3200 sata: [ID 801593 kern.warning] WARNING: /pci@0,0/pci1025,157@9:
Apr 10 09:43:44 x3200  Application(s) accessing previously attached SATA device have to release it before newly inserted device can be made accessible.

Actions #1

Updated by Richard PALO about 11 years ago

Now I'm a bit stuck...
I thought I'd try to reproduce the problem by simply turning my device off and then back on.

Now I get:

$ sudo zpool status -vx apool
  pool: apool
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
  scan: scrub repaired 0 in 2h26m with 0 errors on Sat Oct  8 12:41:16 2011
config:

    NAME        STATE     READ WRITE CKSUM
    apool       UNAVAIL      0     0     0  insufficient replicas
      c5t3d0s0  UNAVAIL      0     0     0  cannot open

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x4a>
        apool:<0x3>
$ zfs list apool
cannot open 'apool': pool I/O is currently suspended
$ sudo zpool clear apool
cannot clear errors for apool: I/O error
$ sudo zpool clear apool c5t3d0s0
cannot clear errors for c5t3d0s0: I/O error

gulp, help?

Actions #2

Updated by Richard PALO about 11 years ago

Tried simply rebooting, and everything was back in order. So it's the same problem, just with different symptoms.
One thing I didn't mention in the reproduction above: I noticed that after being powered off and back on, the device comes back up unconfigured and needs to be configured manually with cfgadm (as it does any time I boot without the device powered on: # cfgadm -c configure sata0/3)
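
For anyone wanting to try recovery without a reboot, here is a rough, untested sketch of the sequence suggested by the observations above: check whether the attachment point is unconfigured, configure it, then retry zpool clear. The attachment point sata0/3 and the pool name apool come from this report; the helper names ap_unconfigured and recover are made up for illustration, and the check assumes the usual cfgadm column layout (Ap_Id, Type, Receptacle, Occupant, Condition). Given the kernel warning that applications must release the old device first, this may well still fail with an I/O error.

```shell
#!/bin/sh
# Sketch only. AP and POOL are taken from this report; adjust as needed.
AP="${AP:-sata0/3}"
POOL="${POOL:-apool}"

# Reads a cfgadm listing on stdin. Succeeds (exit 0) if the attachment
# point named in $1 shows "unconfigured" in the Occupant column.
# Assumes the column order: Ap_Id Type Receptacle Occupant Condition.
ap_unconfigured() {
    while read -r id type receptacle occupant condition; do
        case "$id" in
            # Match "sata0/3" as well as "sata0/3::dsk/c5t3d0" (cfgadm -a form).
            "$1"|"$1"::*)
                [ "$occupant" = "unconfigured" ] && return 0
                ;;
        esac
    done
    return 1
}

# Hypothetical recovery sequence; run as root. Not verified to work
# while the pool's I/O is suspended.
recover() {
    if cfgadm "$AP" | ap_unconfigured "$AP"; then
        cfgadm -c configure "$AP" || return 1
    fi
    zpool clear "$POOL" && zpool status -x "$POOL"
}
```

Usage would be to source the file as root and call recover; nothing runs until that function is invoked.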

I'm scrubbing now to make sure, but I presume nothing is wrong, as was the case after the real power failure.
