Bug #162

zfs: scrub fails to complete, no disk activity

Added by Boyd Waters over 9 years ago. Updated over 9 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Start date: 2010-09-06
Due date:
% Done: 0%
Estimated time:
Difficulty:
Tags:

Description

SUMMARY:
The ZFS array seems to work OK initially, but when I scrub it, all disk activity stops after about 2% of the array has been examined, and any reads or writes to the array block. The system is still responsive (it is booted from a different zpool), but the disk array is not available. This condition occurs in b145 as well as in illumos-gate (built from code pulled 06-september-2010).

This bug is reproducible always.

STEPS TO REPRODUCE:
1. boot into illumos-gate or os-devel_145
2. import the zpool (named vla for "very large array")
3. pfexec zpool scrub vla

The scrub starts up fine, and seems to proceed ok. But after a while all disk activity stops.

SOME SYSTEM DETAILS
AMD system with 790GX chipset, with two additional PCIe Sil3132 SATA controllers.
8GB RAM
6 Seagate Barracuda LP ST32000542AS 2TB 5900 RPM SATA 3.0Gb/s
1 OCZ Technology 60 GB Vertex 2 SSD

# zpool status vla
  pool: vla
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scan: scrub in progress since Mon Sep  6 13:12:24 2010
    40.1G scanned out of 1.87T at 3.92M/s, 136h15m to go
    2.30M repaired, 2.09% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0  (repairing)
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0  (repairing)
        cache
          c7t0d0    ONLINE       0     0     0

errors: No known data errors

deduplication is ON for this zpool.

I can re-boot into onnv-142, complete the scrub, everything seems fine.

I would like to dig into this problem, but don't yet know how to proceed.
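
As a rough cross-check of the "to go" figure in the status output above, here is a minimal sketch of the arithmetic. It assumes that the sizes use binary units (1T = 1024G) and that the displayed rate is simply bytes scanned divided by elapsed time; both are assumptions about how zpool status derives its numbers, not something confirmed in this report. The helper names are made up for illustration.

```python
# Binary unit multipliers; assumption: zpool prints sizes in powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(text):
    """Convert a zpool-style size such as '40.1G' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]

def scrub_estimate(scanned, total, rate):
    """Return (seconds_elapsed, seconds_remaining), assuming a constant average rate."""
    scanned_b, total_b, rate_b = to_bytes(scanned), to_bytes(total), to_bytes(rate)
    return scanned_b / rate_b, (total_b - scanned_b) / rate_b

# Figures copied from the scan lines of the status output above.
elapsed, remaining = scrub_estimate("40.1G", "1.87T", "3.92M")
hours, minutes = divmod(round(remaining / 60), 60)
print(f"elapsed ~{elapsed / 3600:.1f}h, remaining ~{hours}h{minutes:02d}m")
```

Under those assumptions the remaining time comes out at roughly 136 hours, close to the reported 136h15m; the small gap is consistent with the rounding of the displayed figures. Because the rate is an average since the scrub started, it keeps falling while the scrub is stuck, so the estimate grows even though nothing is being read.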

History

#1

Updated by Boyd Waters over 9 years ago

(not sure it matters... but the cache SSD is on its own Sil3132 controller. Both of these 3132 SATA controllers show up as "c7....")

#2

Updated by Boyd Waters over 9 years ago

Hmm.. just re-booted into b142, and the array shows up like this before it is imported:

$ pfexec zpool import
  pool: vla
    id: 2855413885199491550
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        vla         ONLINE
          mirror-0  ONLINE
            c7t3d0  ONLINE
            c6t3d0  ONLINE
          mirror-1  ONLINE
            c6t4d0  ONLINE
            c6t5d0  ONLINE
          mirror-2  ONLINE
            c6t2d0  ONLINE
            c7t2d0  ONLINE
        cache
          c8t0d0

Note the "cache" is on a "c8" controller... and once I import it, the cache goes onto the "c7" controller...

$ pfexec zpool import vla
$ zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c6t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: vla
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scan: scrub in progress since Mon Sep  6 13:12:24 2010
    40.3G scanned out of 1.87T at 9.01M/s, 59h17m to go
    2.30M repaired, 2.10% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     1
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
        cache
          c7t0d0    ONLINE       0     0     0

#3

Updated by Boyd Waters over 9 years ago

$ zpool get all vla
NAME  PROPERTY       VALUE       SOURCE
vla   size           5.44T       -
vla   capacity       34%         -
vla   altroot        -           default
vla   health         ONLINE      -
vla   guid           2855413885199491550  default
vla   version        26          default
vla   bootfs         -           default
vla   delegation     on          default
vla   autoreplace    off         default
vla   cachefile      -           default
vla   failmode       wait        default
vla   listsnapshots  off         default
vla   autoexpand     off         default
vla   dedupditto     0           default
vla   dedupratio     2.85x       -
vla   free           3.56T       -
vla   allocated      1.87T       -

#4

Updated by Boyd Waters over 9 years ago

I suspect a crappy SATA controller here... my original Sil3132 card. This controller has two hard disks on it. When I first got the SSD, I added it to this controller, but the controller couldn't handle the load and I had lots of errors on the hard disks. I added a second 3132, and (under b142) my problems seemed to go away. But maybe there are still problems with that first add-on SATA controller.

I re-booted the computer into b142, and imported the pool. It started the scrub, and here's what is happening:

  pool: vla
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scan: scrub in progress since Mon Sep  6 13:12:24 2010
    185G scanned out of 1.87T at 22.8M/s, 21h37m to go
    2.30M repaired, 9.63% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0    14
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     2
        cache
          c7t0d0    ONLINE       0     0     0

errors: No known data errors

Note the disks showing problems here: they are on that first SATA controller.

Maybe illumos-gate runs into some unrecoverable data corruption on that controller, and doesn't time out a disk transaction?
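
To make the "which controller" observation repeatable, here is a minimal sketch that groups devices with non-zero error counters by the cN prefix of their names. The function name and the regular expression are my own, and it assumes plain integer counters as in the output above. Note that the cN number is only a rough proxy for the physical controller (the cache SSD also shows up under c7 here even though it sits on its own card).

```python
import re
from collections import defaultdict

# Config section copied from the `zpool status` output above.
STATUS = """\
        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0    14
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     2
        cache
          c7t0d0    ONLINE       0     0     0
"""

# Leaf devices look like c<controller>t<target>d<disk>, optionally with a slice,
# followed by a state and the READ, WRITE and CKSUM counters.
DEVICE = re.compile(r"^\s*(c(\d+)t\d+d\d+(?:s\d+)?)\s+\S+\s+(\d+)\s+(\d+)\s+(\d+)")

def errors_by_controller(text):
    """Return {controller: {device: (read, write, cksum)}} for devices with errors."""
    result = defaultdict(dict)
    for line in text.splitlines():
        m = DEVICE.match(line)
        if not m:
            continue  # header, pool, mirror and "cache" lines are skipped
        device, controller = m.group(1), "c" + m.group(2)
        counts = tuple(int(m.group(i)) for i in (3, 4, 5))
        if any(counts):
            result[controller][device] = counts
    return dict(result)

faulty = errors_by_controller(STATUS)
for controller, devices in sorted(faulty.items()):
    print(controller, devices)
```

For the output above, only c7t3d0 and c7t2d0 are reported, both under c7.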

#5

Updated by Boyd Waters over 9 years ago

Rebooted back into illumos-nightly, and started a scrub. This time it ran for 12 hours, before hitting the stuck behavior.

$ iostat -En
c6t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: ST9100824AS      Revision: 7.01 Serial No:  
Size: 100.03GB <100030242816 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 4 Predictive Failure Analysis: 0 
c6t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:  
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 10 Predictive Failure Analysis: 0 
c6t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:  
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 10 Predictive Failure Analysis: 0 
c6t4d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:  
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 10 Predictive Failure Analysis: 0 
c6t5d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:  
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 10 Predictive Failure Analysis: 0 
c7t2d0           Soft Errors: 6 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: ST32000542AS     Revision: CC34 Serial No:  
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 10 Predictive Failure Analysis: 0 
c7t3d0           Soft Errors: 24 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: ST320005XXXX     Revision: CC34 Serial No:  
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 10 Predictive Failure Analysis: 0 
c8t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA      Product: OCZ-VERTEX2      Revision: 1.11 Serial No:  
Size: 60.02GB <60022480896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 2 Predictive Failure Analysis: 0 

Those "illegal request" errors aren't very relevant...

$ zpool iostat -v vla
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
vla         1.88T  3.56T      0      0      0      0
  mirror     640G  1.19T      0      0      0      0
    c7t3d0      -      -      0      0      0      0
    c6t3d0      -      -      0      0      0      0
  mirror     640G  1.19T      0      0      0      0
    c6t4d0      -      -      0      0      0      0
    c6t5d0      -      -      0      0      0      0
  mirror     640G  1.19T      0      0      0      0
    c6t2d0      -      -      0      0      0      0
    c7t2d0      -      -      0      0      0      0
cache           -      -      -      -      -      -
  c7t0d0    1.40G  54.5G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

No disk I/O. But we knew that already...

$ fmdump -e
TIME                 CLASS
Sep 06 13:12:34.9595 ereport.io.scsi.cmd.disk.tran   
Sep 06 13:12:34.9595 ereport.io.scsi.cmd.disk.tran   
.
.
.
[this error repeats about 300 times]

$ fmdump -eV -c ereport.io.scsi.cmd.disk.tran
Sep 08 2010 03:35:25.539717971 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.tran
        ena = 0x303476d2ebb00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci1002,4384@14,4/pci1095,7124@7/disk@3,0
        (end detector)

        devid = id1,sd@SATA_____ST320005XXXX________________5XW04QQC
        driver-assessment = retry
        op-code = 0x28
        cdb = 0x28 0x0 0x40 0x8c 0x13 0x62 0x0 0x0 0xc 0x0
        pkt-reason = 0x1
        pkt-state = 0x37
        pkt-stats = 0x0
        __ttl = 0x1
        __tod = 0x4c873cbd 0x202b7153
.
.
.
Sep 08 2010 03:35:25.539692043 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.tran
        ena = 0x303476d2ebb00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci1002,4384@14,4/pci1095,7124@7/disk@3,0
        (end detector)

        devid = id1,sd@SATA_____ST320005XXXX________________5XW04QQC
        driver-assessment = fail
        op-code = 0x28
        cdb = 0x28 0x0 0x40 0x8c 0x13 0x62 0x0 0x0 0xc 0x0
        pkt-reason = 0x1
        pkt-state = 0x37
        pkt-stats = 0x0
        __ttl = 0x1
        __tod = 0x4c873cbd 0x202b0c0b
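
For anyone reading these ereports, here is a small sketch that decodes the cdb and __tod fields of the first one. The CDB layout is the standard SCSI READ(10) format (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). Treating __tod as a seconds/nanoseconds pair and assuming 512-byte logical sectors are my assumptions; the function names are made up for illustration.

```python
from datetime import datetime, timezone

# Values copied from the first ereport above.
CDB = "0x28 0x0 0x40 0x8c 0x13 0x62 0x0 0x0 0xc 0x0"
TOD = "0x4c873cbd 0x202b7153"

SECTOR_BYTES = 512  # assumption: the drive reports 512-byte logical sectors

def decode_read10(cdb_text):
    """Decode a 10-byte SCSI READ(10) CDB into (lba, block_count)."""
    cdb = bytes(int(b, 16) for b in cdb_text.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks

def decode_tod(tod_text):
    """Split __tod into (UTC datetime, nanoseconds), assuming [seconds, nanoseconds]."""
    seconds, nanos = (int(x, 16) for x in tod_text.split())
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos

lba, blocks = decode_read10(CDB)
when, nanos = decode_tod(TOD)
print(f"READ(10) lba={lba} blocks={blocks} "
      f"offset={lba * SECTOR_BYTES} bytes length={blocks * SECTOR_BYTES} bytes")
print(f"logged at {when:%Y-%m-%d %H:%M:%S}.{nanos:09d} UTC")
```

Both ereports carry the same ena and the same CDB, so they appear to describe one 12-block read that was first marked "retry" and then "fail". The decoded nanosecond field matches the fractional part of the printed timestamp (.539717971), which supports the seconds/nanoseconds reading of __tod.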

$ zpool status -v vla
  pool: vla
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scan: scrub in progress since Tue Sep  7 18:07:22 2010
    823G scanned out of 1.88T at 24.1M/s, 12h55m to go
    13.5K repaired, 42.86% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0  (repairing)
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0  (repairing)
        cache
          c7t0d0    ONLINE       0     0     0

Lots of retries on those two disks attached to the controller that I'm beginning to suspect. But why does the newer version of onnv/illumos (post-b142) take so long to give up? Why does it block for so long, or try so hard? What's different?

#6

Updated by Boyd Waters over 9 years ago

Oh. The machine stopped responding to SSH and local X logins shortly after I started poking at this bug. The computer locked up.

I've shut it down for now. The replacement SATA controller arrives tomorrow. I'll install it and see if my problem goes away.

#7

Updated by Boyd Waters over 9 years ago

The replacement SATA controller arrived today. I pulled the suspect controller, a Silicon Image 3124, and hooked up the two drives to the new Sil3132. Then rebooted into b142, and performed a scrub. There were more than 40 checksum errors on the drives that had been attached to the 3124.

  pool: vla
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scan: scrub repaired 210K in 55h19m with 0 errors on Fri Sep 10 01:27:18 2010
config:

        NAME        STATE     READ WRITE CKSUM
        vla         DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c7t3d0  DEGRADED     0     0     0  too many errors
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  DEGRADED     0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  DEGRADED     0     0     0  too many errors
        cache
          c7t0d0    ONLINE       0     0     0

errors: No known data errors

I have now rebooted into nightly, and when I did a zpool clear it started a scrub. I will find out if this replacement hardware has cleared up my crasher.

#8

Updated by Boyd Waters over 9 years ago

On nightly, the scrub completed with NO errors.

So the problem is in the handling of a crappy SATA controller. The problem controller was a Silicon Image 3124 4-port PCI-X SATA controller, plugged into one of the PCI slots of my "GIGABYTE GA-MA790GPT-UD3H AM3 AMD 790GX" motherboard. It couldn't keep up with the AMD controller on the southbridge and the additional Silicon Image 3132 PCIe controller that were driving the array.

Under b142, though, the transactions would time out and be retried. On nightly, the controller would feed the drives garbled data and then block. I don't know why yet.

So: I don't know why this error occurred, and it did not happen in earlier versions of onnv, but the problem was resolved by identifying the problem hardware and replacing it.

#9

Updated by Garrett D'Amore over 9 years ago

  • Status changed from New to Rejected
  • Priority changed from High to Normal

So, this was crappy, failing hardware, and the driver doesn't support FMA.

I don't see a specific bug here, really. I'm going to close it out. There possibly should be an RFE for better FMA support by the controller's driver.
