Project

General

Profile

Bug #5531

Bug #5056: ZFS deadlock on db_mtx and dn_holds

NULL pointer dereference in dsl_prop_get_ds()

Added by Matthew Ahrens almost 5 years ago. Updated almost 4 years ago.

Status:
Closed
Priority:
High
Category:
zfs - Zettabyte File System
Start date:
2015-01-13
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

As reported in: https://gist.github.com/pfmooney/8ca6e4b05c078de42524

I obtained the crash dump from pmooney on IRC; contact him for a copy.

The problem is that the dataset has a NULL ds_dbuf. We are in the middle of executing this dataset's dbu_evict_func. It seems like the evict func should not attempt to reference the dbuf, which has already been evicted.

I believe this bug was introduced by the fix for: 5056 ZFS deadlock on db_mtx and dn_holds

::stack

dsl_prop_get_ds+0x71(ffffff00cf8f09c0, ffffff00c84cab10, 8, 1, ffffff00023b1808, 0)
dsl_prop_notify_all_cb+0x88(ffffff00c7cb1940, ffffff00d0a40cc0, 0)
dmu_objset_find_dp+0x108(ffffff00c7cb1940, 168, fffffffff7dacd20, 0, 2)
dsl_prop_notify_all+0x28(ffffff00d5882a40)
dsl_dir_rename_sync+0x2d1(ffffff00033edb40, ffffff00d4162c40)
dsl_sync_task_sync+0x10a(ffffff00033eda60, ffffff00d4162c40)
dsl_pool_sync+0x28b(ffffff00c7cb1940, 6b07b)
spa_sync+0x27e(ffffff00c8d29000, 6b07b)
txg_sync_thread+0x227(ffffff00c7cb1940)
thread_start+8()

ffffff00cf8f09c0::print dsl_dataset_t ds_dbuf

ds_dbuf = 0

dbu_evict_taskq::print | ::walk taskq_thread | ::print kthread_t t_pcb.val2|$C

ffffff0002254f80 xc_serv+0x109(0, 0)
ffffff0002254fd0 apix_dispatch_by_vector+0x8c(f0)
ffffff0002254ff0 apix_dispatch_hilevel+0x15(f0, 0)
ffffff0003643890 switch_sp_and_call+0x13()
ffffff00036438f0 apix_do_interrupt+0xfe(ffffff0003643900, 0)
ffffff0003643900 _interrupt+0xba()
ffffff0003643a20 mutex_delay_default+7()
ffffff0003643a90 mutex_vector_enter+0xc5(ffffff00d5882ac8)
ffffff0003643b00 dsl_prop_unregister+0x57(ffffff00cf8f09c0, fffffffff7e28c93, fffffffff7d8cfb0, ffffff00d5135000)
ffffff0003643b30 dmu_objset_evict+0x177(ffffff00d5135000)
ffffff0003643b60 dsl_dataset_evict+0x35(ffffff00cf8f09c0)
ffffff0003643c20 taskq_thread+0x2d0(ffffff00c72e3b18)
ffffff0003643c30 thread_start+8()

History

#1

Updated by Justin Gibbs almost 5 years ago

#2

Updated by Dan McDonald over 4 years ago

  • Parent task set to #5056
#3

Updated by Electric Monk over 4 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit e57a022b8f718889ffa92adbde47a8f08abcdb25

commit  e57a022b8f718889ffa92adbde47a8f08abcdb25
Author: Justin T. Gibbs <justing@spectralogic.com>
Date:   2015-01-29T08:08:32.000Z

    5531 NULL pointer dereference in dsl_prop_get_ds()
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Reviewed by: George Wilson <george@delphix.com>
    Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
    Approved by: Robert Mustacchi <rm@joyent.com>

#4

Updated by Justin Gibbs over 4 years ago

  • Status changed from Closed to Feedback
  • % Done changed from 100 to 90

commit e57a022b8f718889ffa92adbde47a8f08abcdb25 didn't fully close the race. It is still possible for the dsl_prop callback entry to be referencing a dsl_dataset_t object which is in the process of being evicted. In the core analyzed below, the dsl_dataset_t in the callback entry ("stale entry") was able to get a hold on the bonus buffer for that dataset object, but this is because a new dsl_dataset_t had been instantiated, referencing the same bonus buffer.

$ mdb vmcore.0
mdb: warning: dump is from SunOS 5.11 joyent_20150202T140745Z; dcmds and macros may not match kernel implementation
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs lofs mpt idm sd crypto random cpc logindmux ptm sppp nsmb smbsrv nfs ipc ]

The stack trace from the crash:

> $c
mutex_enter+0xb()
dmu_buf_rele+0x15(0, fffffffff7e00720)
dsl_dataset_rele+0x19(ffffff0340e250c0, fffffffff7e00720)
dsl_prop_notify_all_cb+0xbb(ffffff0254120900, ffffff02d13aa9c0, 0)
dmu_objset_find_dp+0x108(ffffff0254120900, 366, fffffffff7d742f0, 0, 2)
dsl_prop_notify_all+0x28(ffffff02654dbac0)
dsl_dir_rename_sync+0x2d1(ffffff00111a9b40, ffffff0260db3680)
dsl_sync_task_sync+0x10a(ffffff00111a9a60, ffffff0260db3680)
dsl_pool_sync+0x28b(ffffff0254120900, 1558)
spa_sync+0x27e(ffffff0254d3b000, 1558)
txg_sync_thread+0x227(ffffff0254120900)
thread_start+8()

dsl_dataset_rele() tried to release a NULL ds_dbuf. Verify:

> ffffff0340e250c0 ::print dsl_dataset_t dsl_dbuf
ds_dbuf = 0

Yup. Any chance the callback list is corrupted?

> ffffff02d13aa9c0 ::print dsl_dataset_t ds_dir | ::print dsl_dir_t dd_prop_cbs | ::walk list | ::print dsl_prop_cb_record_t cbr_ds
cbr_ds = 0xffffff0353ff4380
cbr_ds = 0xffffff0353ff4380
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0
cbr_ds = 0xffffff0340e250c0

This dsl_datset_t is still registered multiple times on this dsl_dir. Looks valid.

Was this datset evicted?

> ffffff0340e250c0 ::print dsl_dataset_t ds_dbu
ds_dbu = {
    dbu_tqent = {
        tqent_next = 0
        tqent_prev = 0
        tqent_func = dsl_dataset_evict
        tqent_arg = 0xffffff0340e250c0
        tqent_un = {
            tqent_bucket = 1
            tqent_flags = 0x1
        }
        tqent_thread = 0
        tqent_cv = {
            _opaque = 0
        }
    }
    dbu_evict_func = dsl_dataset_evict
}

The dsl_dataset_t's dbuf user was processed by the eviction taskq, so yes.

We had a hold on the bonus buffer, so the bonus buffer must still be around. Let's find it by object number.

> ffffff0340e250c0 ::print dsl_dataset_t ds_object
ds_object = 0x369

dsl_prop_notify_all_cb() has the dsl pool as its first argument. Find the dnode for this dataset via the meta objset for this pool.

> $c
mutex_enter+0xb()
dmu_buf_rele+0x15(0, fffffffff7e00720)
dsl_dataset_rele+0x19(ffffff0340e250c0, fffffffff7e00720)
dsl_prop_notify_all_cb+0xbb(ffffff0254120900, ffffff02d13aa9c0, 0)
dmu_objset_find_dp+0x108(ffffff0254120900, 366, fffffffff7d742f0, 0, 2)
dsl_prop_notify_all+0x28(ffffff02654dbac0)
dsl_dir_rename_sync+0x2d1(ffffff00111a9b40, ffffff0260db3680)
dsl_sync_task_sync+0x10a(ffffff00111a9a60, ffffff0260db3680)
dsl_pool_sync+0x28b(ffffff0254120900, 1558)
spa_sync+0x27e(ffffff0254d3b000, 1558)
txg_sync_thread+0x227(ffffff0254120900)
thread_start+8()
> ffffff0254120900 ::print dsl_pool_t dp_meta_objset | ::print objset_t os_dnodes | ::walk list | ::print -a dnode_t dn_object
ffffff0382298350 dn_object = 0x370
ffffff03822980d0 dn_object = 0x36f
ffffff0291a3cab0 dn_object = 0x36e
ffffff0354b68cd0 dn_object = 0x36d
ffffff0354b68a50 dn_object = 0x36c
ffffff0289735058 dn_object = 0x36b
ffffff02774e7b20 dn_object = 0x36a
ffffff02774450b0 dn_object = 0x369
...

dn_object is 32 bytes into the dnode. Adjust the address for object 0x369 and verify we did our math correctly:

> ffffff0277445090 ::print dnode_t dn_object
dn_object = 0x369
> ffffff0277445090 ::print dnode_t dn_type
dn_type = 0t16 (DMU_OT_DSL_DATASET)

Yup.

Is the bonus buffer properly held and ineligible for eviction?

> ffffff0277445090 ::print dnode_t dn_bonus | ::print dmu_buf_impl_t db_holds
db_holds = {
    db_holds.rc_count = 0x2
}
> ffffff0277445090 ::print dnode_t dn_bonus | ::print dmu_buf_impl_t db_dirtycnt
db_dirtycnt = 0

Holds > dirties. Yes.

This is why dsl_dataset_try_addref()->dmu_buf_try_addref() succeeded on this bonus buffer.

Is the user the same dsl_dataset_t as in the callback?

> ffffff0277445090 ::print dnode_t dn_bonus | ::print dmu_buf_impl_t db_user
db_user = 0xffffff02d13aa9c0

Nope.

Is it a valid dsl_dataset_t?

> 0xffffff02d13aa9c0 ::print dsl_dataset_t
{
    ds_dbu = {
        dbu_tqent = {
            tqent_next = 0
            tqent_prev = 0
            tqent_func = 0
            tqent_arg = 0
            tqent_un = {
                tqent_bucket = 0
                tqent_flags = 0
            }
            tqent_thread = 0
            tqent_cv = {
                _opaque = 0
            }
        }
        dbu_evict_func = dsl_dataset_evict
    }
    ds_dir = 0xffffff02654dbac0
    ds_dbuf = 0xffffff0272b00c90
    ds_object = 0x369
    ds_fsid_guid = 0xfcb0034fdd43f2
    ds_is_snapshot = 0 (0)
    ds_prev = 0xffffff0353ff4380
...

Yup.

So we evicted one dsl_dataset_t for this dataset and almost immediately re-referenced the same dataset. This created a new dsl_dataset_t holding the same bonus buffer as the old one. dsl_prop_notify_all_cb() caught the bonus buffer in the "stale" dsl_dataset_t before it was cleared by dsl_dataset_evict() and got a valid hold, but this isn't enough to ensure a dsl_dataset_t is valid. We have to hold the bonus buffer and verify that the dbuf user is the dsl_dataset_t we're looking for, otherwise it could be being evicted.

#5

Updated by Justin Gibbs over 4 years ago

#6

Updated by Justin Gibbs over 4 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100

Fix for Issue #5592, which closed another related race, has been committed.

Also available in: Atom PDF