Bug #9439

ZFS double-free due to failure to dirty indirect block

Added by Brad Lewis over 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Start date: 2018-04-05
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

History

#1

Updated by Brad Lewis over 1 year ago

We crashed while in the middle of updating the block associated with this dbuf:

ffffff0b3e1d6888::dbuf

addr             object lvl blkid holds os
ffffff0b3e1d6888     19   0     0     1 domain0/group-35/ase_db_container-103/ase_timeflow-104/datafile

Here's the block pointer we're trying to free:

ffffff077ba0f960::blkptr

DVA0=<0:ec4d1000:800>
[L0 PLAIN_FILE_CONTENTS] EDONR LZ4 LE contiguous unique single
size=1800L/800P birth=2408L/2408P fill=1
cksum=81d5d35c700ea714:1fd2f35685a1a24a:b02d7eedc66d0092:3550feaaf998ddbd

This range has already been added to the deferred tree list, which means it was freed already (2 txgs ago):

0xffffff0579bb5a80::print range_seg_t

{
rs_node = {
avl_child = [ 0, 0 ]
avl_pcb = 0xffffff091bde39b1
}
rs_pp_node = {
avl_child = [ 0xbaddcafebaddcafe, 0xbaddcafebaddcafe ]
avl_pcb = 0xbaddcafebaddcafe
}
rs_start = 0xec4d1000
rs_end = 0xec4d1800
}

0xffffff0579bb5a80::whatis

ffffff0579bb5a80 is allocated from range_seg_cache:
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffff056aa537c0 ffffff0579bb5a80 604f93161fe ffffff00141aec40
ffffff04b3f7bb08 ffffff049c5685c0 ffffff04a73bbba0
kmem_cache_alloc_debug+0x2e0
kmem_cache_alloc+0xdd
range_tree_add+0x286
metaslab_free_concrete+0x160
metaslab_free_impl+0x102
metaslab_free_dva+0x91
metaslab_free+0xe9
zio_dva_free+0x1f
zio_execute+0xfa
zio_wait+0x95
zio_free+0xe8
dsl_free+0x1c
dsl_dataset_block_kill+0x15e
free_blocks+0x1a5
free_children+0x270

The odd (non-power-of-2) LSIZE of blkid=1 could be caused by free_blocks() setting the LSIZE of the hole to dn_datablksz. If the object was truncated (from 2 blocks to 1), its dn_datablksz could be changed to an odd size.

This situation is not a problem per se, but it happens to set the necessary conditions to hit this bug. Once we have an indirect block and the block size is not a power of 2, if we truncate the file to zero length, dnode_free_range() fails to dirty the L1 indirect in this case:

if (ISP2(blksz)) {

} else {
ASSERT(dn->dn_maxblkid == 0);
if (off == 0 && len >= blksz) {
/*
 * Freeing the whole block; fast-track this request.
 * Note that we won't dirty any indirect blocks,
 * which is fine because we will be freeing the entire
 * file and thus all indirect blocks will be freed
 * by free_children().
 */
blkid = 0;
nblks = 1;
goto done;

This is caused by the fix for 9438. That commit didn't change the above code, but made its assumption untrue: we no longer free all the indirects in free_children() when doing a free_range() of the entire file, only when we are actually freeing the dnode.

#2

Updated by Electric Monk about 1 year ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 99a19144e82244f3426f055cc73af8a937c0135c

commit  99a19144e82244f3426f055cc73af8a937c0135c
Author: Matthew Ahrens <mahrens@delphix.com>
Date:   2018-06-19T19:28:29.000Z

    9439 ZFS double-free due to failure to dirty indirect block
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
