disk sync write perf regression when slog is used post oi_148
I first noticed this while testing VMware ESXi 4.1u2 against the latest illumos bits: write performance was almost halved.
My initial guess was that TCP/IP or NFS was the culprit, but various tests, including an iperf build for ESXi, showed that wire speed could be achieved. A second hardware platform (AMD instead of Xeon) showed the same regression. Among other similarities, both platforms used a dedicated DDR RAM based device connected via onboard SATA. As soon as I removed the slog from the zpool, sync performance was the same across oi148, oi151a and illumos nightly: a rather low 5MB/s, but that is to be expected from a two-disk pool. When the slog was added back to the pool, oi148 sync writes jumped to 50MB/s, whereas illumos only increased to 13MB/s and showed a strange pattern in zilstat. Async performance remained the same on all versions, with or without the slog.
- dd if=/dev/zero of=/mpool/4g.bin [oflag=sync] bs=32K count=128K
This more or less simulates the ESXi NFS writes, both sync and async. I tried various block sizes, but that didn't make much of a difference.
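The two variants of the test can be spelled out explicitly. The following is only a sketch: the helper name dd_cmd and the TARGET, BS and COUNT variables are made up for illustration, with defaults taken from the command above (32K x 128K blocks, i.e. a 4 GiB file). It assumes a dd that understands oflag=sync, which is a GNU dd operand.

```shell
# Defaults mirror the command from the report; override them as needed.
TARGET=${TARGET:-/mpool/4g.bin}
BS=${BS:-32K}
COUNT=${COUNT:-128K}

# dd_cmd MODE: print the dd command line for MODE (sync or async).
# Hypothetical helper; it only prints the command, it does not run it.
dd_cmd() {
    case "${1:-}" in
        sync)  echo "dd if=/dev/zero of=$TARGET oflag=sync bs=$BS count=$COUNT" ;;
        async) echo "dd if=/dev/zero of=$TARGET bs=$BS count=$COUNT" ;;
        *)     echo "usage: dd_cmd sync|async" >&2; return 1 ;;
    esac
}

# Show both command lines, async first.
for mode in async sync; do
    dd_cmd "$mode"
done
```

To actually run a variant, pass the printed line to the shell, for example `dd_cmd sync | sh`, and watch zilstat in a second terminal.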
For example, a 2-disk pool with and without slog:
oi148 with slog async dd=218MB/s
oi148 with slog sync dd=50MB/s zilstat iops/per txg=7500
oi148 NO slog async dd=216MB/s
oi148 NO slog sync dd=5MB/s zilstat iops/per txg=750
illumos with slog async dd=223MB/s
illumos with slog sync dd=13MB/s zilstat iops/per txg=4400, then 675, then 750 (this three-txg pattern repeats)
illumos NO slog async dd=221MB/s
illumos NO slog sync dd=5MB/s zilstat iops/per txg=770
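To put the numbers above side by side, here is a small sketch that computes the ratios. The helper names ratio and mean are made up for illustration; the inputs are the figures from the list above. Averaging the repeating zilstat pattern assumes the three txgs are of roughly equal duration, which the report does not state.

```shell
# ratio A B: print A/B with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# mean N1 N2 ...: print the arithmetic mean with two decimals.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f\n", s / NR }'
}

# Sync throughput gain from adding the slog, per release (MB/s figures).
echo "oi148   sync, slog vs no slog: $(ratio 50 5)x"
echo "illumos sync, slog vs no slog: $(ratio 13 5)x"

# Regression with slog: throughput ratio vs zilstat iops-per-txg ratio.
illumos_iops=$(mean 4400 675 750)
echo "sync dd with slog, oi148 vs illumos:   $(ratio 50 13)x"
echo "zilstat iops/txg, oi148 vs illumos avg: $(ratio 7500 "$illumos_iops")x"
```

Under that assumption the two ratios come out nearly identical, which suggests the throughput drop tracks the drop in ZIL operations per txg.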
Here's a bit of data regarding the slog device standalone
illumos spool async dd=148MB/s
illumos spool sync dd=53.7MB/s (the device is using itself for zil)
Not lightning fast, but it has low latency and no wear, and it is sufficient for gigabit Ethernet.
The regression is introduced after oi148 but before oi151a. At least that limits the set of commits to analyze.
More data can be provided on request.
Updated by George Wilson over 8 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 70
The issue is that ZIL blocks should always be contiguous blocks and thus the "fast gang" logic should not apply to those allocations. I have generated a patch and am currently running some tests.
A simple workaround is to do the following:
- echo "zfs_mg_alloc_failures/W 0t10000" | mdb -kw
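For anyone applying the workaround, it may help to read the tunable back before and after. In mdb, /W writes a 4-byte value, the 0t prefix marks the number as decimal, and /D prints a 4-byte value in decimal; -k attaches to the live kernel and -w enables writes. The sketch below only builds the expressions; mdb_expr is a hypothetical helper name, and a value set this way does not persist across a reboot.

```shell
# mdb_expr VAR [VALUE]: print the mdb expression for a 4-byte kernel variable.
# With VALUE it is a decimal write (/W 0tVALUE), without it a decimal read (/D).
# Hypothetical helper for illustration; it never touches the kernel itself.
mdb_expr() {
    if [ -n "${2:-}" ]; then
        printf '%s/W 0t%s\n' "$1" "$2"
    else
        printf '%s/D\n' "$1"
    fi
}

# On a live system, as root (commented out because it changes the running kernel):
# mdb_expr zfs_mg_alloc_failures       | mdb -k     # show the current value
# mdb_expr zfs_mg_alloc_failures 10000 | mdb -kw    # apply the workaround
# mdb_expr zfs_mg_alloc_failures       | mdb -k     # confirm the new value
```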
Updated by Eric Schrock about 8 years ago
- Status changed from In Progress to Resolved
user: George Wilson <email@example.com>
date: Mon Jan 23 18:47:28 2012 -0800
1909 disk sync write perf regression when slog is used post oi_148
Reviewed by: Matt Ahrens <firstname.lastname@example.org>
Reviewed by: Eric Schrock <email@example.com>
Reviewed by: Robert Mustacchi <firstname.lastname@example.org>
Reviewed by: Bill Pijewski <email@example.com>
Reviewed by: Richard Elling <firstname.lastname@example.org>
Reviewed by: Steve Gonczi <email@example.com>
Reviewed by: Garrett D'Amore <firstname.lastname@example.org>
Reviewed by: Dan McDonald <email@example.com>
Reviewed by: Albert Lee <firstname.lastname@example.org>
Approved by: Eric Schrock <email@example.com>