Bug #5339

zfs slog duplicate writes

Added by Andrew Kinney over 5 years ago. Updated over 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
zfs - Zettabyte File System
Start date:
2014-11-14
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

OS:
OmniOS v11 r151012 (omnios-10b9c79)

Hardware:
CPUs: dual Intel Xeon E5-2620v2 (hexa-core 2.1GHz)
RAM: 64GiB 1600MHz ECC Reg DDR3
SSDs: Samsung 845DC Pro 400GB connected to Intel C602J SATA3 (6Gbps)
HDDs: HGST HUS724040ALS640 (4TB SAS 7200 RPM) connected to LSI SAS 2308 via LSI SAS2 (6Gbps) expander backplane with 4x SAS2 controller links
network: not in the path for this testing, but Intel I350

The system is completely empty and unused besides this testing.

Steps to reproduce:

1. create the simplest test pool with one vdev and one slog (demonstrative purpose only):
zpool create -f testpool c1t5000CCA05C68D505d0 log c5t0d0

2. using your freshly created, empty, and inactive pool, create a randomized file for testing:
openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero | dd iflag=fullblock of=/testpool/randomfile.deleteme bs=1M count=4096

3. warm up the ARC for fast reads on your file:
dd if=/testpool/randomfile.deleteme of=/dev/null bs=1M

4. write 100MiB synchronously and observe slog "alloc" with 'zpool iostat -v':
dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=1M oflag=sync count=100 ; zpool iostat -v testpool 1 7

Before running any additional tests, run:
rm -f /testpool/newrandom*
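The steps above require racing to run `zpool iostat -v` right after the dd finishes. A sketch of a wrapper that samples the slog "alloc" once a second in the background while the sync write runs, so the peak is captured automatically. The pool and device names are the ones from this report, and `slog_alloc` is a hypothetical helper that just pulls the alloc column out of `zpool iostat -v` output; adjust to your own layout.

```shell
#!/bin/sh
# Sketch: capture the slog "alloc" over time during the sync write.
# Assumes the pool/devices from this report; requires root and GNU dd.
POOL=testpool
LOGDEV=c5t0d0

# slog_alloc DEV: read `zpool iostat -v` output on stdin and print
# the alloc column for the named log device (e.g. "126M").
slog_alloc() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Run only when invoked as `./script run`, so the helper can be
# sourced or tested without touching any pool.
if [ "${1:-}" = "run" ]; then
    # sample once a second in the background while dd runs
    ( i=0; while [ "$i" -lt 10 ]; do
          zpool iostat -v "$POOL" | slog_alloc "$LOGDEV"
          sleep 1; i=$((i + 1))
      done ) &
    dd if=/$POOL/randomfile.deleteme of=/$POOL/newrandom.deleteme \
       bs=1M oflag=sync count=100
    wait
fi
```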

Expected results:
slog 'alloc' peak at 100MiB

Actual results:
slog 'alloc' peak at 200MiB

I'm not sure whether this "double data written to the slog" bug lies in 'zpool iostat -v' (misreporting?), in the pool sync/ZIL write code, or with the user (perhaps this behaviour is intended for a reason I'm not aware of).

If the data is actually being written to the slog twice, this has obvious performance and durability implications. Our performance observations lead us to believe the data really is written twice.
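If the extra 100MiB were only an accounting artifact in `zpool iostat -v`, device-level counters should still show roughly 100MiB hitting the SSD. One way to cross-check (not something done in this report; the `sd5` instance name below is a placeholder) is to total the kw/s column of per-second `iostat -xn` samples for the log device while the dd runs:

```shell
#!/bin/sh
# sum_kw DEV: read per-second `iostat -xn 1` samples on stdin and
# total the kw/s column (KB written per second) for one device.
# With one sample per second, the total approximates KB written.
sum_kw() {
    # illumos iostat -xn layout: r/s w/s kr/s kw/s ... device
    awk -v dev="$1" '$NF == dev { total += $4 } END { printf "%.0f\n", total }'
}

# usage, run alongside the sync-write test:
#   iostat -xn 1 10 | sum_kw sd5
```

A total near 100MiB would point at iostat accounting; a total near 200MiB would confirm a real double write.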

Raw test data:

root@storage0:/root# dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=1M oflag=sync count=100 ; zpool iostat -v testpool 1 7
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.335958 s, 312 MB/s
                            capacity     operations    bandwidth
pool                     alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                 4.10G  3.62T      0     38     64  4.52M
  c1t5000CCA05C68D505d0  4.10G  3.62T      0     34     58  4.29M
logs                         -      -      -      -      -      -
  c5t0d0                  128K   372G      0      3      5   227K
-----------------------  -----  -----  -----  -----  -----  -----

                            capacity     operations    bandwidth
pool                     alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                 4.16G  3.62T      0    324      0  35.0M
  c1t5000CCA05C68D505d0  4.16G  3.62T      0    324      0  35.0M
logs                         -      -      -      -      -      -
  c5t0d0                  126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----

                            capacity     operations    bandwidth
pool                     alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                 4.16G  3.62T      0      0      0      0
  c1t5000CCA05C68D505d0  4.16G  3.62T      0      0      0      0
logs                         -      -      -      -      -      -
  c5t0d0                  126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----

                            capacity     operations    bandwidth
pool                     alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                 4.16G  3.62T      0      0      0      0
  c1t5000CCA05C68D505d0  4.16G  3.62T      0      0      0      0
logs                         -      -      -      -      -      -
  c5t0d0                  126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----

                            capacity     operations    bandwidth
pool                     alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                 4.16G  3.62T      0      0      0      0
  c1t5000CCA05C68D505d0  4.16G  3.62T      0      0      0      0
logs                         -      -      -      -      -      -
  c5t0d0                  126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----

                            capacity     operations    bandwidth
pool                     alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                 4.16G  3.62T      0    301      0  35.9M
  c1t5000CCA05C68D505d0  4.16G  3.62T      0    301      0  35.9M
logs                         -      -      -      -      -      -
  c5t0d0                  126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----

                            capacity     operations    bandwidth
pool                     alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                 4.20G  3.62T      0     22      0  75.8K
  c1t5000CCA05C68D505d0  4.20G  3.62T      0     22      0  75.8K
logs                         -      -      -      -      -      -
  c5t0d0                  200M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----

root@storage0:/root#

History

#1

Updated by Andrew Kinney over 5 years ago

This issue appears to be most prevalent under low-concurrency loads. Under high-concurrency loads, the allocations are closer to the actual amount of data written.
