Bug #162
zfs: scrub fails to complete, no disk activity
Status: Closed
Description
SUMMARY:
The ZFS array seems to work OK initially, but if I scrub the array, after examining 2% of it there is no further disk activity, and any reads/writes to that array block. The system is still responsive (it is booted from a different zpool), but the disk array is not available. This condition occurs in b145, as well as in illumos-gate (built from code pulled 6 September 2010).
This bug is always reproducible.
STEPS TO REPRODUCE:
1. boot into illumos-gate or os-devel_145
2. import the zpool (named vla for "very large array")
3. pfexec zpool scrub vla
The scrub starts up fine, and seems to proceed ok. But after a while all disk activity stops.
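(For reference, this is roughly how I watch for the hang; a minimal sketch, not the exact loop I ran. When the hang hits, the scanned/repaired figures stop advancing and the pool's iostat line goes to zero.)

# poll scrub progress and pool I/O once a minute
while true; do
        date
        zpool status vla | grep -E 'scanned|repaired'
        zpool iostat vla | tail -1
        sleep 60
done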
SOME SYSTEM DETAILS:
AMD system with 790GX chipset, with two additional PCIe Sil3132 SATA controllers.
8GB RAM
6 Seagate Barracuda LP ST32000542AS 2TB 5900 RPM SATA 3.0Gb/s
1 OCZ Technology 60 GB Vertex 2 SSD
# zpool status vla
  pool: vla
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: scrub in progress since Mon Sep  6 13:12:24 2010
        40.1G scanned out of 1.87T at 3.92M/s, 136h15m to go
        2.30M repaired, 2.09% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0  (repairing)
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0  (repairing)
        cache
          c7t0d0    ONLINE       0     0     0

errors: No known data errors
deduplication is ON for this zpool.
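(For the record, the dedup setting and ratio can be double-checked with something like this; vla here is both the pool and its top-level dataset:)

$ zfs get dedup vla          # dedup is a dataset property
$ zpool get dedupratio vla   # overall dedup ratio for the pool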
I can reboot into onnv-142 and complete the scrub; everything seems fine.
I would like to dig into this problem, but don't yet know how to proceed.
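(A rough sketch of what seems worth checking first, and what I end up doing in the comments below: FMA ereports, per-device error counters, and whether any I/O is still moving at all.)

$ pfexec fmdump -e           # driver/transport error reports
$ iostat -En                 # per-device error counters
$ zpool iostat -v vla 5      # is the pool doing any I/O?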
Updated by Boyd Waters about 13 years ago
(not sure it matters... but the cache SSD is on its own Sil3132 controller. Both of these 3132 SATA controllers show up as "c7....")
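(One way to check which physical card a cXtYdZ name maps to is to follow the /dev/dsk symlink back to the /devices path; a sketch, assuming slice s0 exists. The "1095" component in the path is the Silicon Image PCI vendor ID.)

$ ls -l /dev/dsk/c7t0d0s0
$ ls -l /dev/dsk/c7t2d0s0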
Updated by Boyd Waters about 13 years ago
Hmm... I just rebooted into b142, and the array shows up like this before it is imported:
$ pfexec zpool import
  pool: vla
    id: 2855413885199491550
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        vla         ONLINE
          mirror-0  ONLINE
            c7t3d0  ONLINE
            c6t3d0  ONLINE
          mirror-1  ONLINE
            c6t4d0  ONLINE
            c6t5d0  ONLINE
          mirror-2  ONLINE
            c6t2d0  ONLINE
            c7t2d0  ONLINE
        cache
          c8t0d0
Note the "cache" is on a "c8" controller... and once I import it, the cache goes onto the "c7" controller...
$ pfexec zpool import vla
$ zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c6t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: vla
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub in progress since Mon Sep  6 13:12:24 2010
        40.3G scanned out of 1.87T at 9.01M/s, 59h17m to go
        2.30M repaired, 2.10% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     1
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
        cache
          c7t0d0    ONLINE       0     0     0
Updated by Boyd Waters about 13 years ago
$ zpool get all vla
NAME  PROPERTY       VALUE                SOURCE
vla   size           5.44T                -
vla   capacity       34%                  -
vla   altroot        -                    default
vla   health         ONLINE               -
vla   guid           2855413885199491550  default
vla   version        26                   default
vla   bootfs         -                    default
vla   delegation     on                   default
vla   autoreplace    off                  default
vla   cachefile      -                    default
vla   failmode       wait                 default
vla   listsnapshots  off                  default
vla   autoexpand     off                  default
vla   dedupditto     0                    default
vla   dedupratio     2.85x                -
vla   free           3.56T                -
vla   allocated      1.87T                -
Updated by Boyd Waters about 13 years ago
I suspect a crappy SATA controller here... my original Sil3132 card. This controller has two hard disks on it. When I first got the SSD, I added it to this controller, and it couldn't handle the load; I had lots of errors on the hard disks. I added a second 3132, and (under b142) my problems seemed to go away. But maybe there are still problems with that first add-on SATA controller.
I rebooted the computer into b142 and imported the pool. It started the scrub, and here's what is happening:
  pool: vla
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub in progress since Mon Sep  6 13:12:24 2010
        185G scanned out of 1.87T at 22.8M/s, 21h37m to go
        2.30M repaired, 9.63% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0    14
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     2
        cache
          c7t0d0    ONLINE       0     0     0

errors: No known data errors
Note the disks showing problems here: they are on that first SATA controller.
Maybe illumos-gate runs into some unrecoverable data corruption on that controller, and doesn't time out the disk transaction?
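(If it is a timeout issue, the sd target driver's per-command timeout is tunable; a sketch only, assuming the disks are driven by sd and that the default 60-second sd_io_time applies here. It requires a reboot, and I have not tried it.)

* /etc/system -- example value only: drop the sd command timeout from 60s to 30s
set sd:sd_io_time=0x1e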
Updated by Boyd Waters about 13 years ago
Rebooted back into illumos-nightly, and started a scrub. This time it ran for 12 hours before hitting the stuck behavior.
$ iostat -En
c6t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST9100824AS      Revision: 7.01 Serial No:
Size: 100.03GB <100030242816 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c6t2d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 10 Predictive Failure Analysis: 0
c6t3d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 10 Predictive Failure Analysis: 0
c6t4d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 10 Predictive Failure Analysis: 0
c6t5d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST32000542AS     Revision: CC32 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 10 Predictive Failure Analysis: 0
c7t2d0   Soft Errors: 6 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST32000542AS     Revision: CC34 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 10 Predictive Failure Analysis: 0
c7t3d0   Soft Errors: 24 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST320005XXXX     Revision: CC34 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 10 Predictive Failure Analysis: 0
c8t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: OCZ-VERTEX2      Revision: 1.11 Serial No:
Size: 60.02GB <60022480896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
Those "illegal request" errors aren't very relevant...
$ zpool iostat -v vla
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
vla         1.88T  3.56T      0      0      0      0
  mirror     640G  1.19T      0      0      0      0
    c7t3d0      -      -      0      0      0      0
    c6t3d0      -      -      0      0      0      0
  mirror     640G  1.19T      0      0      0      0
    c6t4d0      -      -      0      0      0      0
    c6t5d0      -      -      0      0      0      0
  mirror     640G  1.19T      0      0      0      0
    c6t2d0      -      -      0      0      0      0
    c7t2d0      -      -      0      0      0      0
cache           -      -      -      -      -      -
  c7t0d0    1.40G  54.5G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
No disk I/O. But we knew that already...
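(To see where the I/O is actually stuck, the kernel thread stacks can be summarized with mdb; a sketch:)

$ echo '::stacks -m zfs' | pfexec mdb -k      # unique kernel stacks in the zfs module
$ echo '::threadlist -v' | pfexec mdb -k      # or everything, and look for threads blocked in sata/si3124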
$ fmdump -e
TIME                 CLASS
Sep 06 13:12:34.9595 ereport.io.scsi.cmd.disk.tran
Sep 06 13:12:34.9595 ereport.io.scsi.cmd.disk.tran
. . . [this error repeats about 300 times]

$ fmdump -eV -c ereport.io.scsi.cmd.disk.tran
Sep 08 2010 03:35:25.539717971 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.tran
        ena = 0x303476d2ebb00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci1002,4384@14,4/pci1095,7124@7/disk@3,0
        (end detector)

        devid = id1,sd@SATA_____ST320005XXXX________________5XW04QQC
        driver-assessment = retry
        op-code = 0x28
        cdb = 0x28 0x0 0x40 0x8c 0x13 0x62 0x0 0x0 0xc 0x0
        pkt-reason = 0x1
        pkt-state = 0x37
        pkt-stats = 0x0
        __ttl = 0x1
        __tod = 0x4c873cbd 0x202b7153

. . .

Sep 08 2010 03:35:25.539692043 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.tran
        ena = 0x303476d2ebb00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci1002,4384@14,4/pci1095,7124@7/disk@3,0
        (end detector)

        devid = id1,sd@SATA_____ST320005XXXX________________5XW04QQC
        driver-assessment = fail
        op-code = 0x28
        cdb = 0x28 0x0 0x40 0x8c 0x13 0x62 0x0 0x0 0xc 0x0
        pkt-reason = 0x1
        pkt-state = 0x37
        pkt-stats = 0x0
        __ttl = 0x1
        __tod = 0x4c873cbd 0x202b0c0b
$ zpool status -v vla
  pool: vla
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: scrub in progress since Tue Sep  7 18:07:22 2010
        823G scanned out of 1.88T at 24.1M/s, 12h55m to go
        13.5K repaired, 42.86% done
config:

        NAME        STATE     READ WRITE CKSUM
        vla         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0  (repairing)
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0  (repairing)
        cache
          c7t0d0    ONLINE       0     0     0
Lots of retries on those two disks attached to the controller that I'm beginning to suspect. But why does the newer version of onnv/illumos (post-b142) take so long to give up? Why does it block for so long, or try so hard? What's different?
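(Worth checking whether FMA ever turned those ereports into a diagnosis, or whether they just piled up below the threshold; a sketch:)

$ fmadm faulty                                # any diagnosed faults?
$ pfexec fmdump -e | grep -c disk.tran        # how many transport ereports so far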
Updated by Boyd Waters about 13 years ago
Oh. The machine stopped responding to SSH and local X logins shortly after I started poking at this bug. The computer locked up.
I've shut it down for now. The replacement SATA controller arrives tomorrow. I'll install it and see if my problem goes away.
Updated by Boyd Waters about 13 years ago
The replacement SATA controller arrived today. I pulled the suspect controller, a Silicon Image 3124, and hooked up the two drives to the new Sil3132. Then rebooted into b142, and performed a scrub. There were more than 40 checksum errors on the drives that had been attached to the 3124.
  pool: vla
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 210K in 55h19m with 0 errors on Fri Sep 10 01:27:18 2010
config:

        NAME        STATE     READ WRITE CKSUM
        vla         DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c7t3d0  DEGRADED     0     0     0  too many errors
            c6t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          mirror-2  DEGRADED     0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  DEGRADED     0     0     0  too many errors
        cache
          c7t0d0    ONLINE       0     0     0

errors: No known data errors
I have now rebooted into nightly, and when I did a zpool clear, it started a scrub. I will find out if this replacement hardware has cleared up my crasher.
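(For completeness, the sequence was roughly this; a sketch:)

$ pfexec zpool clear vla      # clear the old error counters -- this is what kicked off the scrub
$ zpool status -x             # quick health summary while it runs
$ zpool status vla            # watch the scanned/repaired line for progress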
Updated by Boyd Waters about 13 years ago
On nightly, the scrub completed with NO errors.
So the problem is in how a flaky SATA controller gets handled. The problem controller was a Silicon Image 3124 4-port PCI-X SATA controller, plugged into one of the PCI slots of my "GIGABYTE GA-MA790GPT-UD3H AM3 AMD 790GX" motherboard. It couldn't keep up with the AMD controller on the southbridge and the additional Silicon Image 3132 PCIe controller that were driving the array.
But under b142, the transactions would time out and be retried. On nightly, the controller would feed the drives garbled data - and then block. I don't know why yet.
So: I don't know why this error occurred; it did not happen in earlier versions of onnv, but the problem was resolved by identifying the problem hardware and replacing it.
Updated by Garrett D'Amore almost 13 years ago
- Status changed from New to Rejected
- Priority changed from High to Normal
So, this was crappy, failing hardware, and the driver doesn't support FMA.
I don't see a specific bug here, really. I'm going to close it out. There possibly should be an RFE for better FMA support by the controller's driver.