Bug #5213

closed

panic in metaslab_init due to space_map_open returning ENXIO

Added by Prakash Surya over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
zfs - Zettabyte File System
Start date:
2014-10-06
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

I managed to hit the following assertion:

panic message:          
assertion failed: space_map_open(&msp->ms_sm, mos, object, msp->ms_start, msp->ms_size, vd->vdev_ashift, &msp->ms_lock) == 0 (0x6 == 0x0), file: ../../common/fs/zfs/metaslab.c, line: 1408

With the following stack trace:

vpanic()
strlog()
metaslab_init+0x1bb(ffffff04fbbb3980, 2, 43, 0)
vdev_metaslab_init+0x15d(ffffff0510a280c0, 0)
vdev_load+0xd2(ffffff0510a280c0)
vdev_load+0x3b(ffffff0511971900)
spa_load_impl+0xa2f(ffffff051135d000, 7fad683ba845ba11, ffffff04f319f018, 3, 0, 1)
spa_load+0x14e(ffffff051135d000, 3, 0, 1)
spa_tryimport+0xaa(ffffff04fef6d648)
zfs_ioc_pool_tryimport+0x51(ffffff05f44bb000)
zfsdev_ioctl+0x4a7(1900000000, 5a06, 80423ec, 100003, ffffff04f04b4188, ffffff001e91fe68)
cdev_ioctl+0x39(1900000000, 5a06, 80423ec, 100003, ffffff04f04b4188, ffffff001e91fe68)
spec_ioctl+0x60(ffffff04ecbdd140, 5a06, 80423ec, 100003, ffffff04f04b4188, ffffff001e91fe68)
fop_ioctl+0x55(ffffff04ecbdd140, 5a06, 80423ec, 100003, ffffff04f04b4188, ffffff001e91fe68)
ioctl+0x9b(3, 5a06, 80423ec)
_sys_sysenter_post_swapgs+0x149()

I will post a fix to better handle this error case soon.

Actions #1

Updated by Prakash Surya over 8 years ago

I've posted a patch for review here: https://reviews.csiden.org/r/110/

Actions #2

Updated by Bayard Bell over 8 years ago

  • Tags deleted (needs-triage)
Actions #3

Updated by George Wilson over 8 years ago

This problem can occur if an invalid or stale pool configuration is passed to the kernel as part of an import attempt. The code today incorrectly assumes that the configuration has been deemed valid by the time vdev_load() is invoked. A configuration passed to the kernel could contain extra vdevs, which may have come from an old pool that existed but has since been recreated. The example from the crash dump:

::spa -v

ADDR STATE NAME
ffffff051135d000 ACTIVE $import

ADDR             STATE     AUX          DESCRIPTION                        
ffffff0511971900 CANT_OPEN CORRUPT_DATA root
ffffff04ec444380 CANT_OPEN CORRUPT_DATA missing
ffffff04f7fcd680 CANT_OPEN CORRUPT_DATA missing
ffffff04f7fce940 CANT_OPEN CORRUPT_DATA missing
ffffff04f1a34640 CANT_OPEN CORRUPT_DATA missing
ffffff0510a28d40 CANT_OPEN CORRUPT_DATA missing
ffffff0510a280c0 HEALTHY - /dev/dsk/c2t5d0s0
ffffff05ee585940 HEALTHY - missing
ffffff05ee585300 HEALTHY - /dev/dsk/c2t8d0s0

Clearly, this pool configuration is a mess, but the import was able to get far enough to call vdev_load(), where it starts to load the metaslab associated with c2t5d0s0. This ends up trying to load the space_map for object 0x43, but that object doesn't exist, resulting in a panic. A similar panic would occur if the object did exist but had a checksum error (again, because it's stale).
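To make the failure mode concrete, here is a minimal user-space sketch of how a "must return zero" check turns a failed open into the kind of message seen in the panic. Both `sketch_space_map_open` and `sketch_format_verify_failure` are hypothetical stand-ins written for this illustration; they are not the kernel's function or assertion macro, and the message format is only an approximation. The one fact taken from this report is that opening object 0x43 yielded ENXIO (0x6) where 0 was required.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the real open routine: pretend that
 * object 0x43 does not exist, as in the crash dump above.
 */
static int
sketch_space_map_open(uint64_t object)
{
	return (object == 0x43 ? ENXIO : 0);
}

/*
 * Hypothetical helper: instead of panicking, write the text that a
 * "result must equal zero" check would report into buf.
 * Returns 1 if the check failed, 0 if it passed (buf is untouched).
 */
static int
sketch_format_verify_failure(const char *expr, int ret, char *buf,
    size_t len)
{
	assert(buf != NULL && len > 0);

	if (ret == 0)
		return (0);

	(void) snprintf(buf, len,
	    "assertion failed: %s == 0 (0x%x == 0x0)", expr, (unsigned)ret);
	return (1);
}
```

Feeding the result of `sketch_space_map_open(0x43)` into the helper produces a failure message carrying 0x6, whereas any other object id passes silently. In the kernel there is no such soft path at this call site: the failed check brings the system down.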

In most cases we would expect vdev_load() to just succeed, as seen in the example below:

    # zpool import
    pool: data
    id: 11689234226058225859
    state: ONLINE
    action: The pool can be imported using its name or numeric identifier.
    config:

    data         ONLINE
      c1t1d0     ONLINE
      c1t3d0     ONLINE

    pool: data
    id: 7335367931613495462
    state: UNAVAIL
    status: One or more devices are missing from the system.
    action: The pool cannot be imported. Attach the missing
    devices and try again.
    see: http://illumos.org/msg/ZFS-8000-6X
    config:

    data         UNAVAIL  missing device
      c1t2d0     ONLINE
    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

Nonetheless, there is no guarantee that we will be able to load the space_map, or even the metaslab array, so we must handle these errors and bubble them up to vdev_load() so that it can set the pool state accordingly.
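The error-propagation idea can be sketched in user-space C as follows. This is a simplified illustration under stated assumptions, not the committed patch: every type and function prefixed with `sketch_` is hypothetical, the fixed-size arrays stand in for the real metaslab array, and the space map open is faked so that object 0x43 fails with ENXIO as in the crash dump.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical, simplified stand-ins. */
typedef struct sketch_metaslab {
	uint64_t ms_object;
} sketch_metaslab_t;

typedef enum {
	SKETCH_VDEV_HEALTHY,
	SKETCH_VDEV_CANT_OPEN
} sketch_vdev_state_t;

typedef struct sketch_vdev {
	sketch_vdev_state_t vd_state;
	uint64_t vd_ms_objects[4];	/* space map object per metaslab */
	size_t vd_ms_count;
	sketch_metaslab_t *vd_ms[4];
} sketch_vdev_t;

/* Fake open: object 0x43 is stale or missing. */
static int
sketch_space_map_open(uint64_t object)
{
	return (object == 0x43 ? ENXIO : 0);
}

/* Return an errno instead of asserting that the open succeeded. */
static int
sketch_metaslab_init(uint64_t object, sketch_metaslab_t **msp)
{
	assert(msp != NULL);
	*msp = NULL;

	sketch_metaslab_t *ms = calloc(1, sizeof (*ms));
	if (ms == NULL)
		return (ENOMEM);

	if (object != 0) {
		int error = sketch_space_map_open(object);
		if (error != 0) {
			free(ms);
			return (error);
		}
	}

	ms->ms_object = object;
	*msp = ms;
	return (0);
}

/* Stop at the first failure, undo earlier work, pass the error up. */
static int
sketch_vdev_metaslab_init(sketch_vdev_t *vd)
{
	for (size_t m = 0; m < vd->vd_ms_count; m++) {
		int error = sketch_metaslab_init(vd->vd_ms_objects[m],
		    &vd->vd_ms[m]);
		if (error != 0) {
			while (m-- > 0) {
				free(vd->vd_ms[m]);
				vd->vd_ms[m] = NULL;
			}
			return (error);
		}
	}
	return (0);
}

/* The top-level caller records the failure in the vdev state. */
static int
sketch_vdev_load(sketch_vdev_t *vd)
{
	int error = sketch_vdev_metaslab_init(vd);
	if (error != 0)
		vd->vd_state = SKETCH_VDEV_CANT_OPEN;
	return (error);
}
```

With this shape, a vdev whose object list contains 0x43 makes `sketch_vdev_load()` return ENXIO and leaves the vdev marked `SKETCH_VDEV_CANT_OPEN`, so the import can fail cleanly rather than panic. A vdev with valid objects loads as before.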

Actions #4

Updated by Electric Monk over 8 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 1e9bd7ec42f2d3bf854c2da35310901194833267

commit  1e9bd7ec42f2d3bf854c2da35310901194833267
Author: Prakash Surya <prakash.surya@delphix.com>
Date:   2014-11-23T03:09:00.000Z

    5213 panic in metaslab_init due to space_map_open returning ENXIO
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Bayard Bell <Bayard.Bell@nexenta.com>
    Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
    Approved by: Dan McDonald <danmcd@omniti.com>
