Bug #5339
OpenZFS slog duplicate writes
Description
OS:
OmniOS v11 r151012 (omnios-10b9c79)
Hardware:
CPUs: dual Intel Xeon E5-2620v2 (hexa-core 2.1GHz)
RAM: 64GiB 1600MHz ECC Reg DDR3
SSDs: Samsung 845DC Pro 400GB connected to Intel C602J SATA3 (6Gbps)
HDDs: HGST HUS724040ALS640 (4TB SAS 7200 RPM) connected to LSI SAS 2308 via LSI SAS2 (6Gbps) expander backplane with 4x SAS2 controller links
Network: Intel I350 (not in the path for this testing)
The system is completely empty and unused besides this testing.
Steps to reproduce:
1. Create the simplest possible test pool, with one vdev and one slog (for demonstration purposes only):
zpool create -f testpool c1t5000CCA05C68D505d0 log c5t0d0
2. Using your freshly created, empty, and inactive pool, create a randomized file for testing:
openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero | dd iflag=fullblock of=/testpool/randomfile.deleteme bs=1M count=4096
3. Warm up the ARC for fast reads on your file:
dd if=/testpool/randomfile.deleteme of=/dev/null bs=1M
4. Write 100MiB synchronously and observe the slog "alloc" with 'zpool iostat -v' (a sketch for catching the peak automatically follows these steps):
dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=1M oflag=sync count=100 ; zpool iostat -v testpool 1 7
Before running any additional tests, clean up with:
rm -f /testpool/newrandom*
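To make the transient slog "alloc" peak in step 4 easier to catch, the per-second samples can be logged in the background while the sync write runs. A minimal sketch, assuming the same pool and device names as above (the log file path and the trailing sleep are arbitrary):

# Log 'zpool iostat -v' once per second in the background, run the sync write,
# then pull out the slog device's rows; the second column is its "alloc".
zpool iostat -v testpool 1 > /tmp/slog_alloc.log &
IOSTAT_PID=$!
dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=1M oflag=sync count=100
sleep 10    # allow a few more samples after the write finishes
kill $IOSTAT_PID
awk '$1 == "c5t0d0"' /tmp/slog_alloc.log
rm -f /testpool/newrandom*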
Expected results:
slog 'alloc' peak at 100MiB
Actual results:
slog 'alloc' peak at 200MiB
I'm not sure whether this "double data written to the slog" behaviour is a bug in 'zpool iostat -v' (misreporting), in the pool sync/ZIL write code, or in my own usage (perhaps it is intended for some reason I'm not aware of).
If the data really is being written to the slog twice, that has obvious performance and durability implications, and our performance observations lead us to believe that twice the data is in fact being written to the slog.
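One way to separate misreporting from genuine double writes would be to count the bytes physically issued to each device during the test. A rough sketch using the DTrace io provider, run as root in a second shell while step 4 executes (the device names it prints are kernel instance names such as sd3, which have to be mapped back to c5t0d0 with e.g. 'iostat -xn'; the 30-second window is arbitrary):

# Sum the size of every non-read (i.e. write) I/O per device for 30 seconds,
# then print the per-device byte totals and exit.
dtrace -n 'io:::start /!(args[0]->b_flags & B_READ)/ { @[args[1]->dev_statname] = sum(args[0]->b_bcount); } tick-30s { printa(@); exit(0); }'

If the slog device's total comes out near 200MiB for the 100MiB sync write, the data really is hitting the device twice; if it is near 100MiB, the doubling is only in the "alloc" accounting.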
Raw test data:
root@storage0:/root# dd if=/testpool/randomfile.deleteme of=/testpool/newrandom.deleteme bs=1M oflag=sync count=100 ; zpool iostat -v testpool 1 7
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.335958 s, 312 MB/s
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                  4.10G  3.62T      0     38     64  4.52M
  c1t5000CCA05C68D505d0   4.10G  3.62T      0     34     58  4.29M
logs                          -      -      -      -      -      -
  c5t0d0                   128K   372G      0      3      5   227K
-----------------------  -----  -----  -----  -----  -----  -----
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                  4.16G  3.62T      0    324      0  35.0M
  c1t5000CCA05C68D505d0   4.16G  3.62T      0    324      0  35.0M
logs                          -      -      -      -      -      -
  c5t0d0                   126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                  4.16G  3.62T      0      0      0      0
  c1t5000CCA05C68D505d0   4.16G  3.62T      0      0      0      0
logs                          -      -      -      -      -      -
  c5t0d0                   126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                  4.16G  3.62T      0      0      0      0
  c1t5000CCA05C68D505d0   4.16G  3.62T      0      0      0      0
logs                          -      -      -      -      -      -
  c5t0d0                   126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                  4.16G  3.62T      0      0      0      0
  c1t5000CCA05C68D505d0   4.16G  3.62T      0      0      0      0
logs                          -      -      -      -      -      -
  c5t0d0                   126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                  4.16G  3.62T      0    301      0  35.9M
  c1t5000CCA05C68D505d0   4.16G  3.62T      0    301      0  35.9M
logs                          -      -      -      -      -      -
  c5t0d0                   126M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
testpool                  4.20G  3.62T      0     22      0  75.8K
  c1t5000CCA05C68D505d0   4.20G  3.62T      0     22      0  75.8K
logs                          -      -      -      -      -      -
  c5t0d0                   200M   372G      0      0      0      0
-----------------------  -----  -----  -----  -----  -----  -----
root@storage0:/root#
Updated by Andrew Kinney almost 9 years ago
This issue appears to be most prevalent under low-concurrency loads. Under high-concurrency loads, the slog allocations are closer to the actual amount of data written.
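For comparison, a high-concurrency variant of step 4 can be run with several parallel sync writers. The writer count and file names below are illustrative, not taken from the original test:

# Launch 8 parallel 100MiB sync writers (800MiB total), wait for all of them,
# then watch the slog "alloc" as in step 4 and clean up.
for i in 1 2 3 4 5 6 7 8; do
  dd if=/testpool/randomfile.deleteme of=/testpool/concurrent.$i.deleteme bs=1M oflag=sync count=100 &
done
wait
zpool iostat -v testpool 1 7
rm -f /testpool/concurrent.*.deleteme

With the behaviour described in this comment, the slog "alloc" peak under this load should land closer to the 800MiB actually written than to 1600MiB.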