Bug #6250

zvol_dump_init() can hold txg open

Added by Matthew Ahrens about 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2015-09-20
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Analysis by George Wilson:

While testing my new project bits, I hit a system hang during boot. The running threads show the following:
[6]> ffffff0d6394f8a0::findstack -v
stack pointer for thread ffffff0d6394f8a0: ffffff005ce52330
[ ffffff005ce52330 _resume_from_idle+0xf4() ]
ffffff005ce52360 swtch+0x141()
ffffff005ce52410 turnstile_block+0x262(0, 0, ffffffffc0000b78,
fffffffffbc07960, 0, 0)
ffffff005ce52480 mutex_vector_enter+0x3c5(ffffffffc0000b78)
ffffff005ce52520 zfs`spa_open_common+0x79(ffffff0d625cf6c0, ffffff005ce52568,
fffffffff7aca438, 0, 0)
ffffff005ce52550 zfs`spa_open+0x1e(ffffff0d625cf6c0, ffffff005ce52568,
fffffffff7aca438)
ffffff005ce525b0 zfs`dsl_pool_hold+0x32(ffffff0d625cf6c0, fffffffff7aca438,
ffffff005ce525d8)
ffffff005ce52620 zfs`dmu_objset_hold+0x32(ffffff0d625cf6c0, fffffffff7aca438,
ffffff005ce52648)
ffffff005ce526b0 zfs`dsl_prop_get+0x52(ffffff0d625cf6c0, fffffffff7aaa3aa, 8,
1, ffffff005ce52728, 0)
ffffff005ce526e0 zfs`dsl_prop_get_integer+0x2d(ffffff0d625cf6c0,
fffffffff7aaa3aa, ffffff005ce52728, 0)
ffffff005ce52780 zfs`zvol_dump_init+0x195(ffffff0d625cf6c0, 0)
ffffff005ce527e0 zfs`zvol_dumpify+0xa9(ffffff0d625cf6c0)
ffffff005ce528a0 zfs`zvol_ioctl+0x335(1c00000001, 41c, 0, 80000000,
ffffff0d2a48edb0, 0)
ffffff005ce52950 zfs`zfsdev_ioctl+0x478(1c00000001, 41c, 0, 80000000,
ffffff0d2a48edb0, 0)
ffffff005ce52990 cdev_ioctl+0x39(1c00000001, 41c, 0, 80000000,
ffffff0d2a48edb0, 0)
ffffff005ce529e0 specfs`spec_ioctl+0x60(ffffff0d63eb5e80, 41c, 0, 80000000,
ffffff0d2a48edb0, 0, 0)
ffffff005ce52a70 fop_ioctl+0x55(ffffff0d63eb5e80, 41c, 0, 80000000,
ffffff0d2a48edb0, 0, 0)
ffffff005ce52bd0 dumpinit+0x3e3(ffffff0d62480700, ffffff0d62f67e00, 0)
ffffff005ce52c80 dump`dump_ioctl+0x38a(5a00000000, dd14, 804753c, 100003,
ffffff0d6398ba20, ffffff005ce52e68)
ffffff005ce52cc0 cdev_ioctl+0x39(5a00000000, dd14, 804753c, 100003,
ffffff0d6398ba20, ffffff005ce52e68)
ffffff005ce52d10 specfs`spec_ioctl+0x60(ffffff0d625efb80, dd14, 804753c,
100003, ffffff0d6398ba20, ffffff005ce52e68, 0)
ffffff005ce52da0 fop_ioctl+0x55(ffffff0d625efb80, dd14, 804753c, 100003,
ffffff0d6398ba20, ffffff005ce52e68, 0)
ffffff005ce52ec0 ioctl+0x9b(3, dd14, 804753c)
ffffff005ce52f10 _sys_sysenter_post_swapgs+0x149()
[6]> ffffffffc0000b78::mutex
ADDR TYPE HELD MINSPL OLDSPL WAITERS
ffffffffc0000b78 adapt ffffff005d80cc40 - - yes
[6]> ffffff005d80cc40::findstack -v
stack pointer for thread ffffff005d80cc40: ffffff005d80ca30
[ ffffff005d80ca30 _resume_from_idle+0xf4() ]
ffffff005d80ca60 swtch+0x141()
ffffff005d80caa0 cv_wait+0x70(ffffff0d208918c2, ffffff0d20891880)
ffffff005d80cb00 zfs`txg_wait_synced+0xc3(ffffff0d20891700, a5fa)
ffffff005d80cb60 zfs`spa_config_update+0xf4(ffffff0d2cbda000, 1)
ffffff005d80cbc0 zfs`spa_config_update+0x112(ffffff0d2cbda000, 0)
ffffff005d80cc20 zfs`spa_async_thread+0x251(ffffff0d2cbda000)
ffffff005d80cc30 thread_start+8()
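This second thread holds the mutex (ffffffffc0000b78) that the first thread is blocked on, and it is itself stuck in txg_wait_synced() waiting for ZFS to sync out the next txg. That cannot happen, because the quiesce thread can't close the current txg: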
ffffff005c70cc40 SLEEP CV 1
swtch+0x141
cv_wait+0x70
zfs`txg_quiesce+0x15b
zfs`txg_quiesce_thread+0xa9
thread_start+8
The result is a deadlock: the quiesce thread is waiting for the transaction that zvol_dump_init() still holds open, so ZFS cannot make forward progress.
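The three stacks above form a circular wait. As a rough illustration only, the following user-space C sketch models that wait-for relationship as a small table and walks it to look for a cycle. The enum names, the two tables, and has_wait_cycle() are hypothetical names invented for this sketch; they are not ZFS or mdb interfaces.

```c
#include <assert.h>

/* Hypothetical labels for the three threads seen in the stacks. */
enum { T_DUMP_IOCTL, T_SPA_ASYNC, T_TXG_QUIESCE, T_COUNT };

/*
 * waits_for[i] is the thread that thread i is blocked on, or -1 if
 * thread i can run. Returns 1 if following the chain from 'start'
 * leads back to 'start', 0 otherwise.
 */
static int
has_wait_cycle(const int waits_for[], int n, int start)
{
	int cur = start;

	assert(start >= 0 && start < n);
	for (int steps = 0; steps < n; steps++) {
		cur = waits_for[cur];
		if (cur < 0)
			return (0);	/* chain ends at a runnable thread */
		if (cur == start)
			return (1);	/* back where we began: deadlock */
	}
	return (0);
}

/* The hang: ioctl -> lock holder -> txg sync -> open transaction. */
static const int hang_waits_for[T_COUNT] = {
	[T_DUMP_IOCTL] = T_SPA_ASYNC,	/* blocked on the held mutex */
	[T_SPA_ASYNC] = T_TXG_QUIESCE,	/* blocked in txg_wait_synced() */
	[T_TXG_QUIESCE] = T_DUMP_IOCTL,	/* blocked on the open transaction */
};

/* Assumed picture when no transaction is open while the lock is taken. */
static const int no_open_tx_waits_for[T_COUNT] = {
	[T_DUMP_IOCTL] = T_SPA_ASYNC,
	[T_SPA_ASYNC] = T_TXG_QUIESCE,
	[T_TXG_QUIESCE] = -1,		/* nothing holds the txg open */
};
```

Calling has_wait_cycle(hang_waits_for, T_COUNT, T_DUMP_IOCTL) reports a cycle, while the same call on no_open_tx_waits_for does not, which is the property the proposed reordering aims for.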

In zvol_dump_init() we do the following:
tx = dmu_tx_create(os);
dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
dmu_tx_hold_bonus(tx, ZVOL_OBJ);
error = dmu_tx_assign(tx, TXG_WAIT);
if (error) {
	dmu_tx_abort(tx);
	return (error);
}
And later we call dsl_prop_get_integer():
uint64_t checksum, compress, refresrv, vbs, dedup;
error = dsl_prop_get_integer(zv->zv_name,
zfs_prop_to_name(ZFS_PROP_COMPRESSION), &compress, NULL);
error = error ? error : dsl_prop_get_integer(zv->zv_name,
zfs_prop_to_name(ZFS_PROP_CHECKSUM), &checksum, NULL);
error = error ? error : dsl_prop_get_integer(zv->zv_name,
zfs_prop_to_name(ZFS_PROP_REFRESERVATION), &refresrv, NULL);
error = error ? error : dsl_prop_get_integer(zv->zv_name,
zfs_prop_to_name(ZFS_PROP_VOLBLOCKSIZE), &vbs, NULL);
if (version >= SPA_VERSION_DEDUP) {
	error = error ? error : dsl_prop_get_integer(zv->zv_name,
	    zfs_prop_to_name(ZFS_PROP_DEDUP), &dedup, NULL);
}
We should make the calls above before creating a new transaction so that we avoid trying to grab the spa_namespace_lock while we're in the middle of a transaction.
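The ordering change proposed above can be sketched in user space. This is not the committed fix and does not use the real DMU or DSL interfaces: every model_* function and both counters are hypothetical stand-ins, assumed here only to show that reading the properties first means the namespace lock is never taken while a transaction is open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the model; not kernel state. */
static int open_tx_count;	/* transactions currently held open */
static int lock_while_tx_open;	/* lock acquisitions made with a tx open */

/* Stand-in for a successful dmu_tx_assign(). */
static void
model_tx_assign(void)
{
	open_tx_count++;
}

/* Stand-in for dmu_tx_commit(). */
static void
model_tx_commit(void)
{
	assert(open_tx_count > 0);
	open_tx_count--;
}

/*
 * Stand-in for dsl_prop_get_integer(), which in the stack above ends up
 * taking a mutex in spa_open_common(). The model only records whether
 * that would have happened while a transaction was open.
 */
static int
model_prop_get_integer(const char *prop, uint64_t *valp)
{
	(void) prop;
	if (open_tx_count > 0)
		lock_while_tx_open++;
	*valp = 0;
	return (0);
}

/* Order described in the report: assign the tx, then read properties. */
static int
model_dump_init_tx_first(void)
{
	uint64_t compress, checksum;
	int error;

	model_tx_assign();
	error = model_prop_get_integer("compression", &compress);
	error = error ? error :
	    model_prop_get_integer("checksum", &checksum);
	model_tx_commit();
	return (error);
}

/* Proposed order: read properties first, then open the transaction. */
static int
model_dump_init_props_first(void)
{
	uint64_t compress, checksum;
	int error;

	error = model_prop_get_integer("compression", &compress);
	error = error ? error :
	    model_prop_get_integer("checksum", &checksum);
	if (error)
		return (error);

	model_tx_assign();
	/* The saved property values would be written out here. */
	model_tx_commit();
	return (0);
}
```

In this model, model_dump_init_tx_first() increments lock_while_tx_open once per property read, whereas model_dump_init_props_first() leaves it at zero. A side benefit of the second ordering is that a failed property read returns before any transaction exists, so there is nothing to abort on that path.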

History

#1

Updated by Electric Monk almost 4 years ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit b10bba72460aeaa53119c76ff5e647fd5585bece

commit  b10bba72460aeaa53119c76ff5e647fd5585bece
Author: George Wilson <george.wilson@delphix.com>
Date:   2015-09-23T22:02:34.000Z

    6250 zvol_dump_init() can hold txg open
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Albert Lee <trisk@omniti.com>
    Reviewed by: Xin Li <delphij@freebsd.org>
    Approved by: Garrett D'Amore <garrett@damore.org>
