Bug #16403

open

zfs perhaps too eager to remove temporarily offline disk

Added by Bill Sommerfeld about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

About twice a year, a disk on one of my systems goes offline temporarily, and zfs reacts by setting its state to REMOVED. The disk rejoins the pool after a reboot or power cycle (a less drastic intervention might also bring it back, but I have been power cycling after these events). Because vdev_disk.c requests a SPA_ASYNC_REMOVE in response to these events rather than merely marking the disk OFFLINE, zfs insists on a full resilver to recover, instead of a fast resilver driven off the DTL that would have finished in a few seconds.
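
To illustrate the difference in cost, here is a toy model in Python (not ZFS code). It assumes the DTL can be treated as a single inclusive range of transaction groups (txgs) that the disk missed. The `Block` class, the `blocks_to_resilver` function, and the block counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    birth_txg: int  # transaction group in which the block was written


def blocks_to_resilver(blocks, missed_txgs=None):
    """Return the blocks that must be copied to the returning disk.

    missed_txgs is an inclusive (first, last) txg range standing in for
    the DTL. None models the case where no usable DTL is consulted, so
    every block has to be examined (a full resilver).
    """
    if missed_txgs is None:
        return list(blocks)
    first, last = missed_txgs
    return [b for b in blocks if first <= b.birth_txg <= last]


# Hypothetical pool: 1000 blocks written before the outage,
# 3 blocks written while the disk was away.
pool = [Block(f"old{i}", birth_txg=i) for i in range(1, 1001)]
pool += [Block(f"new{i}", birth_txg=1000 + i) for i in range(1, 4)]

full = blocks_to_resilver(pool)                             # full resilver
fast = blocks_to_resilver(pool, missed_txgs=(1001, 1003))   # DTL-driven

print(f"full resilver touches {len(full)} blocks")
print(f"DTL-driven resilver touches {len(fast)} blocks")
```

In this simplified picture the work of a DTL-driven resilver scales with what was written during the outage, while a full resilver scales with everything in the pool.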

This most recently happened today: the first logged event was at 02:09:37, an "IOC reset" followed at 02:09:42, the drive was marked offline at 02:09:59, and it was marked online again at 02:10:32.
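
For reference, the intervals between those events can be worked out with a short Python snippet. It uses only the four timestamps quoted above and assumes they all fall on the same day; the event labels are paraphrased.

```python
from datetime import datetime

# Timestamps as reported in the description (same day, local time).
events = [
    ("first logged event", "02:09:37"),
    ("IOC reset", "02:09:42"),
    ("drive marked offline", "02:09:59"),
    ("drive marked online", "02:10:32"),
]


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end, both given as HH:MM:SS."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Gap between each pair of consecutive events.
gaps = [
    (a_name, b_name, seconds_between(a_time, b_time))
    for (a_name, a_time), (b_name, b_time) in zip(events, events[1:])
]
for a_name, b_name, secs in gaps:
    print(f"{a_name} -> {b_name}: {secs}s")

total = seconds_between(events[0][1], events[-1][1])
print(f"whole episode: {total}s")
```

By this arithmetic the drive was marked offline for 33 seconds, and the whole episode from first logged event to online lasted 55 seconds.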

A resilver is currently in progress and has reported no errors so far.

I've attached fmdump -e -V output and the contents of /var/adm/messages around the time of the offline.

The controller is described by prtconf -d as: pci1000,3020 (pciex1000,72) [Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]], instance #0
The disk is a Seagate IronWolf Pro 6TB (ST6000NE000).

I suspect I'd be better served if there were a way to disable the auto-remove that was added by #4128. Marking the disk OFFLINE would, I think, be more easily recoverable.


Files

messages (6.41 KB): messages around offline event. Bill Sommerfeld, 2024-03-15 05:37 PM
fmdump-e-V (18.4 KB). Bill Sommerfeld, 2024-03-15 05:40 PM
