Project

General

Profile

Actions

Bug #7004

closed

dmu_tx_hold_zap() does dnode_hold() 7x on same object

Added by Matthew Ahrens about 5 years ago. Updated almost 5 years ago.

Status:
Closed
Priority:
Normal
Category:
zfs - Zettabyte File System
Start date:
2016-05-27
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, and with a workaround for the object allocation issue described at: https://github.com/zfsonlinux/zfs/issues/4636, I observed poor performance (~30,000 file creations per second). I noticed that `dmu_tx_hold_zap()` was using about 30% of all CPU, and doing `dnode_hold()` 7 times on the same object (the ZAP object that is being held).

`dmu_tx_hold_zap()` keeps a hold on the `dnode_t` the entire time it is running, in `dmu_tx_hold_t:txh_dnode`, so it would be nice to use the `dnode_t` that we already have in hand, rather than repeatedly calling `dnode_hold()`. To do this, we need to pass the `dnode_t` down through all the intermediate calls that `dmu_tx_hold_zap()` makes, making these routines take the `dnode_t*` rather than an `objset_t*` and a `uint64_t object` number. In particular, the following routines will need to have analogous `*_by_dnode()` variants created:

  • `dmu_buf_hold_noread()`
  • `dmu_buf_hold()`
  • `zap_lookup()`
  • `zap_lookup_norm()`
  • `zap_count_write()`
  • `zap_lockdir()`
  • `zap_count_write()`

This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by `dmu_tx_hold_zap()` was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds.

Actions

Also available in: Atom PDF