Bug #2778
closed
zpool replace fails - 2TB advanced format drives
Description
I have an OI 151a system with 30 2TB Seagate ST2000DL003-9VT1 drives. The pool is configured as a stripe of 14 mirrors with 2 hot spares. Initially everything worked fine. Then one of the drives failed and was automatically replaced by a hot spare. I then removed the bad drive and replaced it with a new one. When trying to replace the failed drive I get:
"cannot replace c6t5000C5004526DADEd0 with c6t5000C50044DC2060d0: devices have different sector alignment"
I can add the drive as a hot spare, but I'm not sure I won't hit the same issue if it is ever needed.
Please see below for some of the more pertinent information. This is a pre-production system, so I can perform tests on it. I noticed some minor differences in the revision of the new drive and in the prtconf output below.
@
root@ist01:/etc# zpool status
pool: iscsipool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: resilvered 2.19G in 0h12m with 0 errors on Mon May 14 17:37:58 2012
config:
NAME STATE READ WRITE CKSUM
iscsipool DEGRADED 0 0 0
mirror-0 ONLINE 0 0 0
c6t5000C50045229535d0 ONLINE 0 0 0
c6t5000C5004521605Dd0 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
...
...
SNIP
...
...
mirror-13 DEGRADED 0 0 0
c6t5000C5004526BFADd0 ONLINE 0 0 0
spare-1 DEGRADED 0 0 13
c6t5000C5004526DADEd0 FAULTED 0 26 26 too many errors
c6t5000C5004526E338d0 ONLINE 0 0 35
spares
c6t5000C5004526E338d0 INUSE currently in use
c6t5000C50044CCD8F0d0 AVAIL
errors: No known data errors
---
At this point I physically replaced the failed drive with an identical, new drive. Obviously I could not run prtvtoc on the failed drive, but I ran it on its mirror and on the drive I intend to use as the replacement. As you can see, they are identical.
---
* /dev/rdsk/c6t5000C5004526BFADd0 partition map
*
* Dimensions:
*     512 bytes/sector
* 3907029168 sectors
* 3907029101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*          34       222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00        256 3907012495 3907012750
       8     11    00 3907012751      16384 3907029134
* /dev/rdsk/c6t5000C50044DC2060d0 partition map
*
* Dimensions:
*     512 bytes/sector
* 3907029168 sectors
* 3907029101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*          34       222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00        256 3907012495 3907012750
       8     11    00 3907012751      16384 3907029134
---
prtconf -v gives me:
---
root@ist01:/etc# prtconf -v
...
...
...
Device Minor Nodes:
dev=(87,704)
dev_path=/scsi_vhci/disk@g5000c5004526bfad:a
spectype=blk type=minor
dev_link=/dev/dsk/c6t5000C5004526BFADd0s0
dev_path=/scsi_vhci/disk@g5000c5004526bfad:a,raw
spectype=chr type=minor
dev_link=/dev/rdsk/c6t5000C5004526BFADd0s0
Device Minor Layered Under:
mod=zfs accesstype=blk
dev_path=/pseudo/zfs@0
Minor properties:
name='Nblocks' type=int64 items=1 dev=(87,704)
value=00000000e8e0478f
name='Size' type=int64 items=1 dev=(87,704)
value=000001d1c08f1e00
...
...
...
Device Minor Nodes:
dev=(87,2432)
dev_path=/scsi_vhci/disk@g5000c50044dc2060:a
spectype=blk type=minor
dev_link=/dev/dsk/c6t5000C50044DC2060d0s0
dev_path=/scsi_vhci/disk@g5000c50044dc2060:a,raw
spectype=chr type=minor
dev_link=/dev/rdsk/c6t5000C50044DC2060d0s0
Minor properties:
name='lba-access-ok' type=boolean dev=(87,2432)
name='Nblocks' type=int64 items=1 dev=(87,2432)
value=00000000e8e0478f
name='Size' type=int64 items=1 dev=(87,2432)
value=000001d1c08f1e00
---
The replacement drive shows the 'lba-access-ok' minor property, but I don't know what that means.
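The Nblocks and Size values in the prtconf output are hexadecimal, which makes them hard to compare with the prtvtoc figures by eye. A minimal Python sketch for decoding them follows; the helper name `decode` is made up for illustration, and it assumes Size is expressed in bytes.

```python
# Values copied from the prtconf -v output above (identical for both drives).
NBLOCKS_HEX = "00000000e8e0478f"
SIZE_HEX = "000001d1c08f1e00"


def decode(hex_value: str) -> int:
    """Convert a prtconf hexadecimal property value to an integer."""
    return int(hex_value, 16)


nblocks = decode(NBLOCKS_HEX)
size_bytes = decode(SIZE_HEX)

# Assumption: Size is in bytes, so this ratio is the block size
# that Nblocks is counted in.
bytes_per_block = size_bytes // nblocks

print(nblocks, size_bytes, bytes_per_block)
```

Nblocks decodes to 3907012495, which matches the sector count of slice 0 in the prtvtoc output, and the ratio comes out at 512 bytes per block. Both properties therefore describe the logical size only, which would explain why they are identical on the two drives even though the drives behave differently.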
Now trying iostat -En:
---
root@ist01:/etc# iostat -En
...
...
... <THE DRIVE THAT FAILED>
c6t5000C5004526DADEd0 Soft Errors: 0 Hard Errors: 687 Transport Errors: 1259
Vendor: ATA Product: ST2000DL003-9VT1 Revision: CC32 Serial No: 5YD77RDB
Size: 2000.40GB <2000398934016 bytes>
Media Error: 364 Device Not Ready: 0 No Device: 323 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
...
...
... <THE MIRROR OF THE FAILED DRIVE THAT'S STILL GOOD>
c6t5000C5004526BFADd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST2000DL003-9VT1 Revision: CC32 Serial No: 5YD77PYT
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
...
...
... <THE NEW DRIVE I WANT TO REPLACE THE FAILED DRIVE WITH>
c6t5000C50044DC2060d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST2000DL003-9VT1 Revision: CC98 Serial No: 5YD73P0R
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
---
The new drive has a different revision number, but beyond that everything seems OK.
Now if I try to replace the failed drive with the new drive, this happens:
---
root@ist01:/etc# zpool replace iscsipool c6t5000C5004526DADEd0 c6t5000C50044DC2060d0
cannot replace c6t5000C5004526DADEd0 with c6t5000C50044DC2060d0: devices have different sector alignment
@
Updated by Rich Ercolani over 11 years ago
Unfortunately, the system is correct: your drives do differ in alignment.
The ST2000DL003 has several part numbers that differ slightly. The 9VT166-301 reports 512-byte sectors, both physical and logical, to the OS; all newer versions report 512-byte logical and 4K physical sectors.
Seagate swears you can't firmware flash between the revisions.
A relatively recent smartctl will let you confirm this for yourself, e.g. (not the relevant drive model, just an example):
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-220.7.1.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F3
Device Model: SAMSUNG HD103SJ
Serial Number: S246J90B512297
LU WWN Device Id: 5 0024e9 205301cb7
Firmware Version: 1AJ10001
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Fri May 25 03:26:27 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
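When checking many drives, the "Sector Size" line can be pulled out of saved smartctl output with a few lines of Python. This is only a sketch: the function name `parse_sector_sizes` is hypothetical, the first pattern follows the sample output above, and the second pattern ("Sector Sizes: 512 bytes logical, 4096 bytes physical") is an assumption about how smartctl words the line when the two sizes differ.

```python
import re


def parse_sector_sizes(smartctl_output: str):
    """Return (logical, physical) sector sizes in bytes, or None if not found."""
    for line in smartctl_output.splitlines():
        # Same size for both, as in the sample output above.
        m = re.match(r"\s*Sector Size:\s+(\d+) bytes logical/physical", line)
        if m:
            size = int(m.group(1))
            return size, size
        # Assumed wording when logical and physical sizes differ.
        m = re.match(
            r"\s*Sector Sizes:\s+(\d+) bytes logical,\s*(\d+) bytes physical",
            line,
        )
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


sample = "Sector Size: 512 bytes logical/physical"
print(parse_sector_sizes(sample))
```

Running this over the output for each disk in the pool should make it clear which drives report 4K physical sectors before a replacement is attempted.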
Updated by Willem van den Berge over 11 years ago
Just a final follow-up on this issue. Apparently there are at least 3 firmware revisions out in the world: CC3C and CC98, which report 512 logical/4096 physical, and CC32, which reports 512 logical/512 physical. The firmware on these drives cannot be upgraded or downgraded in the field. After replacing four of these drives that failed, I ended up with 26 of the 512/512 variety and 4 of the 512/4096 variety.
After spending probably 8 hours on the phone with various Seagate departments I finally arrived at their advanced replacement department. The gentleman there took my problem to heart and promptly processed a free advanced replacement of the 26 512/512 drives for 26 512/4096 drives. This meant recreating the pool but the system is happy again.
He also mentioned that the 512/512 drives were originally meant to be put inside Seagate USB external enclosures and were never supposed to be sold as internal drives. Based on the number of issues with the CC32 firmware documented on the net, I find that hard to believe, but that's what I was told.
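The firmware revisions reported in this thread can be summarised as a small lookup table for screening replacement drives. The sketch below is illustrative only: the names `REPORTED_SECTOR_SIZES`, `ashift` and `same_alignment` are made up, it assumes that the alignment ZFS compares is the base-2 logarithm of the reported physical sector size, and it uses a strict equality test that may be stricter than the check ZFS actually performs.

```python
# (logical, physical) sector sizes in bytes, as reported in this thread
# for ST2000DL003 firmware revisions.
REPORTED_SECTOR_SIZES = {
    "CC32": (512, 512),
    "CC3C": (512, 4096),
    "CC98": (512, 4096),
}


def ashift(physical_sector_size: int) -> int:
    """Return log2 of the physical sector size (a power of two is assumed)."""
    return physical_sector_size.bit_length() - 1


def same_alignment(existing_rev: str, replacement_rev: str) -> bool:
    """Simplified check: do two firmware revisions imply the same alignment?"""
    _, old_physical = REPORTED_SECTOR_SIZES[existing_rev]
    _, new_physical = REPORTED_SECTOR_SIZES[replacement_rev]
    return ashift(old_physical) == ashift(new_physical)


# The failed drive in this report was CC32; the replacement was CC98.
print(same_alignment("CC32", "CC98"))
```

With the table above, the CC32/CC98 pair from this report comes out as mismatched, which is consistent with the "different sector alignment" error.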