Bug #3603

panic from bpobj_enqueue_subobj()

Added by Christopher Siden over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-03-01
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

From Matt Ahrens's original bug report and analysis:

George, Adam, and I took a look at the dump file.  We found the problem in this
code in bpobj_enqueue_subobj():

    /*
     * If subobj has only one block of subobjs, then move subobj's
     * subobjs to bpo's subobj list directly.  This reduces
     * recursion in bpobj_iterate due to nested subobjs.
     */
    subsubobjs = subbpo.bpo_phys->bpo_subobjs;
    if (subsubobjs != 0) {
        dmu_object_info_t doi;

        VERIFY3U(0, ==, dmu_object_info(bpo->bpo_os, subsubobjs, &doi));
        if (doi.doi_max_offset == doi.doi_data_block_size) {
            dmu_buf_t *subdb;
            uint64_t numsubsub = subbpo.bpo_phys->bpo_num_subobjs;

            VERIFY3U(0, ==, dmu_buf_hold(bpo->bpo_os, subsubobjs,
                0, FTAG, &subdb, 0));
            dmu_write(bpo->bpo_os, bpo->bpo_phys->bpo_subobjs,
                bpo->bpo_phys->bpo_num_subobjs * sizeof (subobj),
                numsubsub * sizeof (subobj), subdb->db_data, tx);

We are writing numsubsub * 8 bytes from the subsubobj's dbuf into the target
bpobj's subobj.  The problem is that dmu_object_info() computes doi_max_offset
from the dnode_phys_t, so if we wrote to the subsubobj in this txg, it may
actually be more than one block long. I.e. numsubsub * 8 > subdb->db_size. 
Thus the dmu_write() will attempt to read past the end of subdb->db_data.
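
To make the overrun concrete with illustrative numbers (assumed for this
example, not taken from the dump): with 1 KB data blocks and 8-byte subobj
entries, one block holds 128 entries.  If entries 129 and 130 were appended
during the open txg, the on-disk dnode still describes a single block, so
doi_max_offset equals doi_data_block_size and the one-block fast path above is
taken; dmu_write() is then asked for 130 * 8 = 1040 bytes from a 1024-byte
dbuf.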

Luckily, the address after subdb->db_data is not mapped, so we panicked rather
than simply corrupting the subobj on disk.

The fix should be as simple as having dmu_object_info_from_dnode() compute
doi_max_offset from dn->dn_maxblkid rather than dn->dn_phys->dn_maxblkid.
(Plus some assertions in bpobj_enqueue_subobj() to ensure that we never try to
write more data than there is in the dbuf.)
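
A minimal sketch of that fix, using the names from the analysis above (the
actual commit may differ in detail):

    /*
     * In dmu_object_info_from_dnode(): derive doi_max_offset from the
     * in-memory dnode, which reflects writes from the currently open
     * txg, rather than from the on-disk dnode_phys_t, which can be a
     * txg or more stale.
     */
    doi->doi_max_offset = (dn->dn_maxblkid + 1) * dn->dn_datablksz;

    /*
     * In bpobj_enqueue_subobj(), after the dmu_buf_hold() shown above:
     * assert that we never ask dmu_write() to copy more bytes than the
     * held dbuf contains.
     */
    VERIFY3U(numsubsub * sizeof (subobj), <=, subdb->db_size);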

The following two conditions must be met to hit the bug:

1. There must be a couple hundred subobjs.  This can only happen if we have
created at least that many snapshots of this filesystem/zvol, and then
destroyed them.

2. We must be deleting several snapshots of the same filesystem/zvol at once.
This will commonly happen after an upgrade, because many snapshots will have
fallen out of the retention policy while the stack was down during the upgrade.
In this case we are deleting 1192 snapshots from at least 11 zvols.  If it was
only those 11 zvols, and one snapshot falls out of retention every 15 minutes,
that implies the stack was down for at least 24 hours.  But there may have been
more zvols involved: we have already deleted 95% of the 1192 snapshots queued
for deletion, and we don't know which zvols the already-deleted snapshots came
from.

The bug can be reproduced with the following script:

#!/bin/bash

zpool create test c1t1d0
zfs destroy -r test/fs
zfs create -o recordsize=1k test/fs

n=300

for (( i=0; i<n; i=i+1 )); do
        echo creating $i
        for (( j=0; j<i; j=j+1 )); do
                dd if=/dev/urandom of=/test/fs/file conv=notrunc bs=1k \
                    oseek=$((j * n + i)) count=1 >& /dev/null
        done
        dd if=/dev/urandom of=/test/fs/file conv=notrunc bs=1k \
            oseek=$((i * n)) count=$n >& /dev/null
        zfs snapshot test/fs@$i
done

zfs destroy test/fs@$((n/2))%
zfs destroy test/fs@1%$((n/2-2)),0
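
For reference, the "%" in the last two commands is the zfs destroy
snapshot-range syntax: with n=300, the first command destroys @150 through the
newest snapshot, and the second destroys @1 through @148 plus @0 from the
comma-separated list.  Each command therefore destroys many snapshots of one
filesystem at once, which is exactly condition 2 above.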
The incorrect code has existed in ZFS since the dawn of time (2005).  However,
it was made much easier to hit by the fix for 12024, which is the performance
enhancement I made to destroy multiple snapshots of a single filesystem in one
txg.

History

#1

Updated by Christopher Siden over 6 years ago

  • Status changed from In Progress to Closed

commit d047563
Author: Matthew Ahrens <mahrens@delphix.com>
Date:   Mon Mar 4 12:27:52 2013

    3603 panic from bpobj_enqueue_subobj()
    3604 zdb should print bpobjs more verbosely
    Reviewed by: Henrik Mattson <henrik.mattson@delphix.com>
    Reviewed by: Adam Leventhal <ahl@delphix.com>
    Reviewed by: Christopher Siden <christopher.siden@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@nexenta.com>
