Feature #14663


Add per-dataset kstats

Added by Jason King 29 days ago. Updated 25 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
zfs - Zettabyte File System
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

Mostly a port of OpenZFS #7705:

Introduce read/write kstats per dataset

The following patch introduces a few statistics on reads and writes
grouped by dataset. These statistics are implemented as kstats
(backed by aggregate sums for performance) and can be retrieved by
using the dataset objset ID number. The motivation for this change is
to provide some preliminary analytics on dataset usage/performance.

However, when OpenZFS ported the kstat interface so the kstat code could run on other platforms, it also expanded KSTAT_STRLEN to 255. Due to binary compatibility concerns, we cannot include that change. This means the per-dataset kstats will look slightly different: we cannot use the pool or dataset name in the name of the kstat or the module without risking truncation.

Instead, each pool will have its own 'module' of the form zpool-GUID, where 'GUID' is the hex GUID of the pool (e.g. zpool-52422b45fcdef452). Similarly, the name of the kstat for a dataset will be 'objset-OBJID' (e.g. 'objset-15'). To allow mapping a kstat back to its pool and dataset, in addition to the named kstats from the OpenZFS change (reads, nread, writes, nwrite), we also include two string kstats containing the name of the pool and the name of the dataset.
This does mean we won't match exactly what other OpenZFS ports do here, but there doesn't seem to be a better way without breaking binary compatibility with kstats.
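As a rough illustration of the naming scheme, here is a hedged userland sketch (not the actual kernel code): format_objset_kstat is a hypothetical helper, and KSTAT_STRLEN is hard-coded to the traditional 31-byte illumos limit rather than taken from <sys/kstat.h>.

```c
#include <stdio.h>
#include <inttypes.h>

/* Traditional illumos limit from <sys/kstat.h>; OpenZFS raised it to 255. */
#define	KSTAT_STRLEN	31

/*
 * Format the per-dataset kstat module and name as described above.
 * A hex pool GUID is at most 16 characters, so "zpool-" plus the GUID
 * (and likewise "objset-" plus a decimal objset ID) always fits within
 * KSTAT_STRLEN, unlike arbitrary pool/dataset names.
 */
static void
format_objset_kstat(char *module, size_t mlen, char *name, size_t nlen,
    uint64_t pool_guid, uint64_t objset_id)
{
	(void) snprintf(module, mlen, "zpool-%" PRIx64, pool_guid);
	(void) snprintf(name, nlen, "objset-%" PRIu64, objset_id);
}
```

Calling this with the example values from the description (GUID 0x52422b45fcdef452, objset ID 15) yields the module 'zpool-52422b45fcdef452' and kstat name 'objset-15'.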


Related issues

Related to illumos gate - Bug #6090: IOPS, bandwidth, and latency kstats for NFS server (Rejected, 2015-07-29)

Actions #1

Updated by Electric Monk 29 days ago

  • Gerrit CR set to 2131
Actions #2

Updated by Jason King 29 days ago

For testing, I used dd to create a file in an empty dataset and viewed the corresponding kstat before and after the write. The nwritten and writes kstats incremented by the expected amounts.

I then used dd to read the same file, observed the kstat again, and verified that the nread and reads kstats incremented by the expected amounts (and that the nwritten and writes kstats did not change).
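The procedure above could be scripted roughly as follows. The pool, dataset, GUID, and objset ID are hypothetical placeholders; the zfs/dd/kstat steps need a live illumos system, so they are shown as comments, with only the delta arithmetic as a runnable helper.

```shell
# Steps to run on a live system (illustrative names):
#
#   zfs create testpool/kstest
#   kstat -m zpool-<hexguid> -n objset-<objid>    # snapshot "before"
#   dd if=/dev/urandom of=/testpool/kstest/f bs=1048576 count=8
#   kstat -m zpool-<hexguid> -n objset-<objid>    # snapshot "after"

# nwritten should grow by exactly bs * count bytes between the snapshots.
kstat_delta() {
	# $1 = value before, $2 = value after
	echo $(( $2 - $1 ))
}

# With bs=1048576 and count=8, dd writes 8 * 1048576 = 8388608 bytes:
kstat_delta 4096 8392704    # prints 8388608
```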

Actions #3

Updated by Marcel Telka 29 days ago

Please be aware that with many ZFS datasets and per-dataset kstats you could hit performance issues in the kstat subsystem (kstat is slow with many kstats), similar to what I hit with the NFS server (#6090). OTOH, since you create only a single kstat per dataset (excluding snapshots), the threshold at which the performance impact becomes noticeable should be much higher than for #6090.

Actions #4

Updated by Marcel Telka 29 days ago

  • Related to Bug #6090: IOPS, bandwidth, and latency kstats for NFS server added
Actions #5

Updated by Toomas Soome 26 days ago

Marcel Telka wrote in #note-3:

Please be aware that with many ZFS datasets and per-dataset kstats you could hit performance issues in the kstat subsystem (kstat is slow with many kstats), similar to what I hit with the NFS server (#6090). OTOH, since you create only a single kstat per dataset (excluding snapshots), the threshold at which the performance impact becomes noticeable should be much higher than for #6090.

Has any more analysis been done on this bottleneck, and how could we fix it?

Actions #6

Updated by Marcel Telka 25 days ago

Toomas Soome wrote in #note-5:

Marcel Telka wrote in #note-3:

Please be aware that with many ZFS datasets and per-dataset kstats you could hit performance issues in the kstat subsystem (kstat is slow with many kstats), similar to what I hit with the NFS server (#6090). OTOH, since you create only a single kstat per dataset (excluding snapshots), the threshold at which the performance impact becomes noticeable should be much higher than for #6090.

Has any more analysis been done on this bottleneck, and how could we fix it?

Sorry, I do not remember more details, but I believe the problem is easily reproducible when there are a lot of kstats and you try to work with them using the kstat(1M) command.

