Actions
Bug #4046
closeddsl_dataset_t ds_dir->dd_lock is highly contended
Start date:
2013-08-16
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:
Description
Analysis by Adam Leventhal and Matt Ahrens:
Exploring a customer system, I saw that this lock was heavily contended: Count indv cuml rcnt nsec Lock Caller 97689 7% 35% 0.00 109408 0xffffff0d7b906c50 dsl_dataset_block_born+0xd3 nsec ------ Time Distribution ------ count Stack 512 | 33 dbuf_write_done+0x86 1024 | 74 arc_write_done+0x261 2048 | 45 zio_done+0x48b 4096 | 1103 zio_execute+0x8d 8192 |@@@ 11801 zio_notify_parent+0xa6 16384 |@@@@@@@@@@@@@@@@@@@@@ 71561 zio_done+0x4ea 32768 |@@ 7791 zio_execute+0x8d 65536 | 1025 zio_notify_parent+0xa6 131072 | 419 zio_done+0x4ea 262144 | 333 zio_execute+0x8d 524288 | 434 taskq_thread+0x285 1048576 | 618 thread_start+0x8 2097152 | 814 4194304 | 1063 8388608 | 511 16777216 | 32 33554432 | 17 67108864 | 12 134217728 | 3 Note that we're spinning for 10s and 100s of MILLISECONDS. That corresponds to this code: 112 mutex_enter(&ds->ds_dir->dd_lock); 113 mutex_enter(&ds->ds_lock); 114 delta = parent_delta(ds, used); 115 ds->ds_phys->ds_referenced_bytes += used; 116 ds->ds_phys->ds_compressed_bytes += compressed; 117 ds->ds_phys->ds_uncompressed_bytes += uncompressed; 118 ds->ds_phys->ds_unique_bytes += used; 119 mutex_exit(&ds->ds_lock); 120 dsl_dir_diduse_space(ds->ds_dir, DD_USED_HEAD, delta, 121 compressed, uncompressed, tx); 122 dsl_dir_transfer_space(ds->ds_dir, used - delta, 123 DD_USED_REFRSRV, DD_USED_HEAD, tx); 124 mutex_exit(&ds->ds_dir->dd_lock); Fixing this could remove a measurable performance bottleneck.
The fix is to not hold the dd_lock for as long. I determined that it was not necessary to hold it while manipulating the parent's accounting, because there is no interface for userland to see a consistent picture of both parent and child anyway. Each dsl_dir's accounting will be self-consistent. (note that this change doesn't affect on-disk consistency, which is always consistent across the entire pool) I reproduced this by writing a at a high rate of (logical) blocks/second, with 8 vCPUs. lockstat shows this lock heavily contended via this stack trace. With the fix it doesn't show up.
Updated by Christopher Siden about 10 years ago
- Status changed from In Progress to Closed
commit b62969f868a827f0823a084bc0af9c7d8b76c659 Author: Matthew Ahrens <mahrens@delphix.com> Date: Thu Aug 22 10:51:47 2013 4046 dsl_dataset_t ds_dir->dd_lock is highly contended Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org>
Actions