Bug #1109

zpool detach on spare fails

Added by Roy Sigurd Karlsbakk over 8 years ago. Updated over 8 years ago.

Status: New
Priority: High
Assignee: -
Category: -
Start date: 2011-06-13
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Hi all

I have a ~100TB pool used for Bacula backup storage. After some disk issues, we ended up with some bad drives plus a bad SAS expander, which led to some data corruption. When this happened, a spare took over for a failed drive in one VDEV. While that was going on, the whole VDEV was marked as failed because of the SAS expander. A cold reboot reset the SAS expander (to be replaced later), and apart from the two corrupted files, things work well.

zpool status now shows all VDEVs running OK, but one of them, raidz2-5, lists all seven drives plus the spare (http://paste.ubuntu.com/625800/). Trying to detach the spare gives me "cannot detach c4t44d0: no valid replicas".

According to Mark Musante, this is a known issue:

Yeah, this is a known problem. The DTL on the toplevel shows an outage, and is preventing the removal of the spare even though removing the spare won't make the outage worse.

Unfortunately, for opensolaris anyway, there is no workaround.

You could try doing a full scrub, replacing any disks that show errors, and waiting for the resilver to complete. That may clean up the DTL enough to detach the spare.
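For reference, the sequence Mark suggests could be sketched roughly as below. This is only an illustration and untested against this pool: the pool name "backup" is a placeholder (the real name is not given in this report), and the function `workaround_commands` is a hypothetical helper that only builds the command lines, it does not run them. The `zpool` subcommands themselves (scrub, status -v, replace, detach) are the standard ones.

```python
def workaround_commands(pool, spare, replacements=()):
    """Build (but do not run) the zpool commands for the suggested sequence.

    pool         -- pool name (placeholder; not stated in the report)
    spare        -- the spare that refuses to detach, e.g. c4t44d0
    replacements -- (old_disk, new_disk) pairs for disks that showed errors
    """
    cmds = [
        ["zpool", "scrub", pool],         # start a full scrub
        ["zpool", "status", "-v", pool],  # check scrub progress and per-device errors
    ]
    for old, new in replacements:
        # replacing a disk triggers a resilver onto the new device
        cmds.append(["zpool", "replace", pool, old, new])
    # re-check status until the scrub/resilver is reported as complete
    cmds.append(["zpool", "status", "-v", pool])
    # finally, retry detaching the spare
    cmds.append(["zpool", "detach", pool, spare])
    return cmds


if __name__ == "__main__":
    # "backup" is an assumed pool name; c4t44d0 is the spare from this report
    for cmd in workaround_commands("backup", "c4t44d0"):
        print(" ".join(cmd))
```

Whether the final detach succeeds depends on the scrub and resilver actually clearing the DTL, which is exactly the part Mark is unsure about.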

I haven't tried all of this, due to availability problems (server is placed off-site), but there is clearly a bug here.

roy

History

#1

Updated by Albert Lee over 8 years ago

This doesn't seem to be the same situation Mark is describing in the quote, since none of the pools are degraded (any more). Looks like the spare is marked INUSE when it really isn't?

#2

Updated by Roy Sigurd Karlsbakk over 8 years ago

The resilver to the drive started and finished, so yes, I think it's supposed to be in use. Even so, I can't detach either of the two (c4t5d0 or c4t44d0); both give me the error message "no valid replicas".

roy

#3

Updated by Roy Sigurd Karlsbakk over 8 years ago

Roy Sigurd Karlsbakk wrote:

The resilver to the drive started and finished, so yes, I think it's supposed to be in use. Even so, I can't detach either of the two (c4t5d0 or c4t44d0); both give me the error message "no valid replicas".

roy

The problem persists. Has anything been changed in regard to this in 151?

roy
