Bug #3749

closed

zfs event processing should work on R/O root filesystems

Added by Will Andrews over 10 years ago. Updated over 10 years ago.

Status:
Closed
Priority:
Normal
Category:
zfs - Zettabyte File System
Start date:
2013-04-24
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:

Description

From the original change log:

This restriction appears to have been put in place to avoid errors with
updating the configuration cache file. However:
  • The majority of asynchronous event handling does not involve configuration
    cache file updates.
  • The configuration cache file need not be on the root file system, so the
    check was not complete.
  • Other classes of errors (e.g. file system full) can also prevent a
    successful update yet do not prevent asynchronous event processing.
  • Some system configurations (e.g. NanoBSD) are designed to never have a R/W
    root, so ZFS event processing is permanently disabled in these systems.
  • Failure to handle asynchronous events promptly can extend the window of
    time that a pool is in a critical state.
At worst, a missed configuration cache update will force the operator to
perform a manual "zpool import" (note -f is not required) to inform the system
about a newly created pool. To minimize the likelihood of this rare
occurrence, configuration cache write failures now emit FMA events so the
operator can take corrective action, and the write is retried every 5 minutes.
The retry interval is implemented as a tunable, spa.c:zfs_ccw_retry_interval.
uts/common/fs/zfs/spa.c:
  • Modify spa_async_dispatch() to honor configuration cache write limiting. If
    other events are pending, a configuration cache write will be attempted at
    the same time, so the rate limiting only applies when the asynchronous
    dispatch system is otherwise idle. Async events should be rare (e.g.
    device arrival/departure) and configuration cache write failures rarer
    still, so a more complicated system to strictly honor the retry limit
    seems unwarranted.
  • Remove check in spa_async_dispatch() for the root file system being R/W.
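
The rate-limiting rule described above can be modeled in a few lines of
userland C. This is only a sketch under stated assumptions, not the code from
the commit: pool_state_t, ASYNC_CONFIG_UPDATE, ASYNC_OTHER, ccw_fail_time,
ccw_retry_interval_sec and async_tasks_pending() are hypothetical stand-ins for
the kernel structures, and the 300-second default simply mirrors the "every 5
minutes" figure in the change log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task flags; the real kernel flag names may differ. */
#define ASYNC_CONFIG_UPDATE	0x01u	/* configuration cache write requested */
#define ASYNC_OTHER		0x02u	/* any other asynchronous event */

#define NANOSEC			1000000000LL

/* Hypothetical stand-in for the per-pool state consulted by the dispatcher. */
typedef struct pool_state {
	uint32_t async_tasks;	/* bitmask of pending async tasks */
	int64_t  ccw_fail_time;	/* ns timestamp of last failed write; 0 = none */
} pool_state_t;

/* Assumed tunable: seconds to wait before retrying a failed write. */
static int64_t ccw_retry_interval_sec = 300;

/*
 * Decide whether the dispatcher has work to do at time now_ns.
 * A pending configuration cache write is held back only while the retry
 * interval since the last failure has not yet elapsed. Any other pending
 * task triggers a dispatch regardless, and the write then rides along.
 */
static bool
async_tasks_pending(const pool_state_t *ps, int64_t now_ns)
{
	uint32_t non_config = ps->async_tasks & ~ASYNC_CONFIG_UPDATE;
	uint32_t config = ps->async_tasks & ASYNC_CONFIG_UPDATE;
	bool suspended = false;

	assert(ccw_retry_interval_sec >= 0);

	if (ps->ccw_fail_time != 0) {
		suspended = (now_ns - ps->ccw_fail_time) <
		    ccw_retry_interval_sec * NANOSEC;
	}

	return (non_config != 0 || (config != 0 && !suspended));
}
```

With only ASYNC_CONFIG_UPDATE pending and a failure recorded 100 seconds ago,
the function returns false. Once 300 seconds have passed, or as soon as
ASYNC_OTHER is also pending, it returns true. This matches the trade-off the
change log accepts: the limit is honored only when the dispatcher is otherwise
idle.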
uts/common/fs/zfs/spa_config.c:
  • Instead of silently ignoring configuration cache write failures, report
    them via a new FMA event. The current zfs_ereport_post() doesn't allow
    arbitrary name=value pairs to be appended to the report, so the
    configuration cache file name is not available to it. This limitation
    should be addressed in a future update.
uts/common/fs/zfs/sys/spa_impl.h:
  • Add a field to the spa data structure to track the time of the last
    configuration cache write failure. This is referenced in
    spa_async_dispatch() to effect the rate limiting.
uts/common/sys/fm/fs/zfs.h:
  • Add FM_EREPORT_ZFS_CONFIG_CACHE_WRITE as an ereport class.
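
The failure bookkeeping described for spa_config.c and spa_impl.h might look
roughly like the following userland sketch. Everything here is an assumption
made for illustration: pool_state_t, ereports_posted (a counter standing in for
posting the FMA event) and config_write_done() are hypothetical names, and the
sketch posts one event per failed attempt because the change log does not say
whether repeated failures are deduplicated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for a pending configuration cache write. */
#define ASYNC_CONFIG_UPDATE	0x01u

/* Hypothetical stand-in for the per-pool state. */
typedef struct pool_state {
	uint32_t async_tasks;		/* bitmask of pending async tasks */
	int64_t  ccw_fail_time;		/* ns timestamp of last failure; 0 = none */
	unsigned ereports_posted;	/* stand-in for posted FMA events */
} pool_state_t;

/*
 * Record the outcome of one configuration cache write attempt.
 * On failure: report it, remember when it happened, and request a retry.
 * On success: clear the failure timestamp so later writes are not delayed.
 */
static void
config_write_done(pool_state_t *ps, bool write_failed, int64_t now_ns)
{
	/* 0 is reserved to mean "no failure recorded". */
	assert(now_ns > 0);

	if (write_failed) {
		ps->ereports_posted++;		/* models the new ereport */
		ps->ccw_fail_time = now_ns;
		ps->async_tasks |= ASYNC_CONFIG_UPDATE;
	} else {
		ps->ccw_fail_time = 0;
	}
}
```

The dispatcher can then compare the stored timestamp against the retry
interval before attempting the write again.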
Actions #1

Updated by Christopher Siden over 10 years ago

  • Category set to zfs - Zettabyte File System
  • Status changed from New to Closed
  • Assignee set to Christopher Siden
  • % Done changed from 0 to 100
commit 3cb69f7
Author: Will Andrews <will@firepipe.net>
Date:   Tue Jun 11 10:13:55 2013

    3749 zfs event processing should work on R/O root filesystems
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Eric Schrock <eric.schrock@delphix.com>
    Approved by: Christopher Siden <christopher.siden@delphix.com>