Bug #9690

metaslab of vdev with no space maps was flushed during removal

Added by Brad Lewis about 1 year ago. Updated 12 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2018-07-31
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

While running zloop, this assertion was hit:

assertion failed for thread 0xfffffd7ffe64a240, thread-id 334: dn->dn_phys->dn_type || dn->dn_allocated_txg, file ../../../uts/common/fs/zfs/dnode.c, line 1327

libc.so.1`_lwp_kill+0xa()
libc.so.1`_assfail+0x182(fffffd7ff9d4b970, fffffd7ff9d4ad20, 52f)
libc.so.1`assfail+0x19(fffffd7ff9d4b970, fffffd7ff9d4ad20, 52f)
libzpool.so.1`dnode_setdirty+0x1ce(f36a5d0, 157b5d00)
libzpool.so.1`dbuf_dirty+0x63a(1065e200, 157b5d00)
libzpool.so.1`dmu_buf_will_dirty+0xfc(1065e200, 157b5d00)
libzpool.so.1`space_map_write+0x75(a578880, 170eba00, 0, 1000000, 157b5d00)
libzpool.so.1`metaslab_flush+0x173(2215940, 157b5d00)
libzpool.so.1`spa_flush_metaslabs+0x1a5(1d3c000, 157b5d00)
libzpool.so.1`spa_sync_iterate_to_convergence+0x170(1d3c000, 157b5d00)
libzpool.so.1`spa_sync+0x26a(1d3c000, 11a)
libzpool.so.1`txg_sync_thread+0x2ab(2096740)
libc.so.1`_thrp_setup+0x8a(fffffd7ffe64a240)
libc.so.1`_lwp_start()

The immediate problem is that this space map's dnode doesn't appear to be allocated.

> ::status
debugging core file of ztest (64-bit) from install-os-2074-bc7947b1-0.dcenter
initial argv: /usr/bin/amd64/ztest
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=106523 uid=1025 code=-1
panic message:
assertion failed for thread 0xfffffd7ffe64a240, thread-id 334: dn->dn_phys->dn_type || dn->dn_allocated_txg, file ../../../uts/common/fs/zfs/dnode.c, line 1327

> $C
fffffd7fd7e9d730 libc.so.1`_lwp_kill+0xa()
fffffd7fd7e9dac0 libc.so.1`_assfail+0x182(fffffd7ff9d4b970, fffffd7ff9d4ad20, 52f)
fffffd7fd7e9daf0 libc.so.1`assfail+0x19(fffffd7ff9d4b970, fffffd7ff9d4ad20, 52f)
fffffd7fd7e9db60 libzpool.so.1`dnode_setdirty+0x1ce(f36a5d0, 157b5d00)
fffffd7fd7e9dc20 libzpool.so.1`dbuf_dirty+0x63a(1065e200, 157b5d00)
fffffd7fd7e9dc60 libzpool.so.1`dmu_buf_will_dirty+0xfc(1065e200, 157b5d00)
fffffd7fd7e9dce0 libzpool.so.1`space_map_write+0x75(a578880, 170eba00, 0, 1000000, 157b5d00)
fffffd7fd7e9dd70 libzpool.so.1`metaslab_flush+0x173(2215940, 157b5d00)
fffffd7fd7e9ddf0 libzpool.so.1`spa_flush_metaslabs+0x1a5(1d3c000, 157b5d00)
fffffd7fd7e9de70 libzpool.so.1`spa_sync_iterate_to_convergence+0x170(1d3c000, 157b5d00)
fffffd7fd7e9def0 libzpool.so.1`spa_sync+0x26a(1d3c000, 11a)
fffffd7fd7e9dfb0 libzpool.so.1`txg_sync_thread+0x2ab(2096740)
fffffd7fd7e9dfe0 libc.so.1`_thrp_setup+0x8a(fffffd7ffe64a240)
fffffd7fd7e9dff0 libc.so.1`_lwp_start()

> f36a5d0::print dnode_t dn_phys->dn_type dn_allocated_txg dn_free_txg
dn_phys->dn_type = 0
dn_allocated_txg = 0
dn_free_txg = 0

This suggests that the space map's dnode has been freed, probably through dnode_sync_free(), since that is the function
that actually frees the dnode and zeroes out all of the above fields.
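
To make the failed check concrete, here is a minimal sketch in C of the asserted expression evaluated against the values printed above. The mock_* types and functions are hypothetical, trimmed-down stand-ins and not the real dnode_t or dnode_phys_t definitions; only the field names and the expression itself come from the assertion message and the ::print output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk part of a dnode. */
typedef struct mock_dnode_phys {
    uint8_t dn_type;            /* 0 means no object type is set */
} mock_dnode_phys_t;

/* Hypothetical stand-in for the in-memory dnode. */
typedef struct mock_dnode {
    mock_dnode_phys_t *dn_phys;
    uint64_t dn_allocated_txg;
    uint64_t dn_free_txg;
} mock_dnode_t;

/*
 * Mirrors the asserted expression:
 *     dn->dn_phys->dn_type || dn->dn_allocated_txg
 * Returns true when the dnode still looks allocated.
 */
static bool
mock_dnode_looks_allocated(const mock_dnode_t *dn)
{
    return (dn->dn_phys->dn_type != 0 || dn->dn_allocated_txg != 0);
}

/* Usage example built from the values seen in the core file. */
static void
mock_dnode_example(void)
{
    mock_dnode_phys_t phys = { .dn_type = 0 };
    mock_dnode_t freed = {
        .dn_phys = &phys,
        .dn_allocated_txg = 0,
        .dn_free_txg = 0
    };

    /* All fields are zero, so the check fails, as it did in ztest. */
    assert(!mock_dnode_looks_allocated(&freed));

    /* Either a nonzero type or a nonzero allocation txg would pass. */
    phys.dn_type = 1;
    assert(mock_dnode_looks_allocated(&freed));
}
```

With every field zeroed, neither side of the || can be true, which is consistent with a dnode that has already been freed.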

The failure occurred while we were trying to flush a metaslab in spa_sync(), and these code paths have assertions everywhere
that we only flush metaslabs that have a space map (i.e. that went through metaslab_sync() at least once since the creation of
the pool). There is a contradiction, then: the metaslab's space map information in memory (ms_sm data) exists, but the
actual space map on disk has been freed.

To fix this, we first want to try calling vdev_remove_empty() only for log devices in vdev_sync().
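
The shape of that guard can be sketched as follows. This is an illustration under stated assumptions, not the code from the actual commit: mock_vdev_t, its fields, and mock_should_remove_empty() are hypothetical, and the extra "being removed" and "nothing allocated" conditions are assumed here only to make the predicate meaningful. The only part taken from the description above is that the empty-vdev cleanup is restricted to log devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a top-level vdev. */
typedef struct mock_vdev {
    bool mv_islog;          /* device is a log device */
    bool mv_removing;       /* device is being removed (assumed) */
    uint64_t mv_alloc;      /* bytes currently allocated (assumed) */
} mock_vdev_t;

/*
 * Decide whether the sync path should run the empty-vdev cleanup.
 * Non-log devices never qualify here, so their space maps are not
 * freed behind the back of a later metaslab flush.
 */
static bool
mock_should_remove_empty(const mock_vdev_t *vd)
{
    return (vd->mv_islog && vd->mv_removing && vd->mv_alloc == 0);
}

/* Usage example: same state, differing only in the log flag. */
static void
mock_vdev_example(void)
{
    mock_vdev_t log_dev = { .mv_islog = true, .mv_removing = true, .mv_alloc = 0 };
    mock_vdev_t data_dev = { .mv_islog = false, .mv_removing = true, .mv_alloc = 0 };

    assert(mock_should_remove_empty(&log_dev));
    assert(!mock_should_remove_empty(&data_dev));
}
```

Keeping the decision in one predicate makes it easy to see that an empty non-log vdev undergoing removal is left alone by this path.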

History

#1

Updated by Electric Monk 12 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit 4e75ba682600b2bf19d158577d528c3db65050e8

commit  4e75ba682600b2bf19d158577d528c3db65050e8
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Date:   2018-10-04T14:31:20.000Z

    9690 metaslab of vdev with no space maps was flushed during removal
    Reviewed by: Matt Ahrens <matt@delphix.com>
    Reviewed by: Brad Lewis <brad.lewis@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
