Bug #12794

closed

ZFS support for vectorized algorithms on x86 (HW support)

Added by Jerry Jelinek almost 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: zfs - Zettabyte File System
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

This is the second phase of the work to port the ZFS HW-accelerated raidz
calculation code from OpenZFS. This work depends on tickets 12668 and 12793.
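
For readers unfamiliar with the calculation being accelerated, the following is a minimal sketch, not the code from this change. It assumes only the well-known property that the single-parity (P) column of a raidz stripe is the byte-wise XOR of the data columns; the Q and R columns used by raidz2/3 additionally need Galois-field multiplication and are not shown. The names gen_p and rec_p are used here purely for illustration, and the big-integer XOR is only a loose analogy for processing many bytes per operation in a wide vector register.

```python
from functools import reduce


def gen_p(data_cols):
    """XOR equally sized data columns together to form a P parity column."""
    length = len(data_cols[0])
    if any(len(col) != length for col in data_cols):
        raise ValueError("all columns must have the same length")
    # Treat each column as one large integer so a single XOR covers many
    # bytes at once (illustrative stand-in for a wide SIMD register).
    acc = reduce(lambda a, b: a ^ b,
                 (int.from_bytes(col, "little") for col in data_cols))
    return acc.to_bytes(length, "little")


def rec_p(surviving_cols, parity):
    """Rebuild one missing data column from the survivors plus P."""
    # XOR is its own inverse, so reconstruction is the same operation.
    return gen_p(list(surviving_cols) + [parity])


# Four tiny hypothetical data columns of 8 bytes each.
cols = [bytes([value] * 8) for value in (1, 2, 4, 8)]
p = gen_p(cols)

# Pretend the third column was lost and rebuild it from the rest.
restored = rec_p(cols[:2] + cols[3:], p)
```

Because every byte position is independent, the same operation can be applied to 16 or 32 bytes at a time, which is why this kind of loop is a natural fit for SSE2, SSSE3 and AVX2 registers.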


Files

fpu_bench_new.txt (35.6 KB) Jerry Jelinek, 2020-07-09 02:34 PM

Related issues

Related to illumos gate - Bug #12668: ZFS support for vectorized algorithms on x86 (initial support) (Closed, Jerry Jelinek)
Related to illumos gate - Bug #12793: kernel FPU support (Closed, Jerry Jelinek)
Related to illumos gate - Bug #12968: curthread swtch-ing while the kernel is using the FPU (Closed, Jerry Jelinek)

Actions #1

Updated by Electric Monk over 2 years ago

  • Gerrit CR set to 716
Actions #2

Updated by Joshua M. Clulow over 2 years ago

  • Related to Bug #12668: ZFS support for vectorized algorithms on x86 (initial support) added
Actions #3

Updated by Joshua M. Clulow over 2 years ago

Actions #4

Updated by Jerry Jelinek over 2 years ago

I ran a variety of different tests.

1) zfs test suite - this includes the raidz tests added in 12668 which test all of the algorithms in the code. The new sse2, ssse3 and avx2 algorithms are now tested.

2) I ran the raidz-exercising subset of the zfs test suite in a continuous loop for 24 hours as a stress test.

3) I ran a heavy fio write load onto a raidz to force kernel FPU usage for parity generation while simultaneously running the raidz_test generation code, which generates a heavy user-land FPU load. I ran the two together for half an hour and verified that there were no failures in any of the user-level tests. The goal is to test that concurrent kernel/user-level FPU usage is safe.

4) I ran this on a system where I had a copy of a set of files in a raidz2. I first failed one disk (using zinject) and diff-verified the files against a golden copy in a separate file system. I then repeated this test (after clearing the ARC) with 2 failed disks. The files were correctly reconstructed from parity in both cases.

5) For performance testing, the "raidz_test" program includes a benchmark mode. I have attached a complete set of output. Since this code runs through all of the algorithms and all of the permutations, there are a lot of results. This is a repeat of the performance testing from 12668, but now includes the FPU algorithms.

I ran this on a machine which has integrated FPU support for all 3 of the new HW algorithms.

Here is a summary of some key comparisons at 4K, 128K and 1M recordsize. In general, all of the FPU algorithms are much better on generation, except at the small recordsize (4K) on raidz2/3. All of the FPU algorithms are always the same or much better on reconstruction.

In these results, the columns are as follows; the 'total_bw' column (second to last) provides the relevant comparison.

impl, math, dcols, iosize, disk_bw, total_bw, iter

raidz1 generation
  original,    gen_p, 8,       4096, 108.577368, 977.196313, 1048576
    scalar,    gen_p, 8,       4096, 105.701704, 951.315332, 1048576
      sse2,    gen_p, 8,       4096, 107.799976, 970.199786, 1048576
     ssse3,    gen_p, 8,       4096, 107.669024, 969.021218, 1048576
      avx2,    gen_p, 8,       4096, 108.621007, 977.589063, 1048576

  original,    gen_p, 8,     131072, 248.801670, 2239.215031, 32768
    scalar,    gen_p, 8,     131072, 354.875480, 3193.879323, 32768
      sse2,    gen_p, 8,     131072, 377.592469, 3398.332219, 32768
     ssse3,    gen_p, 8,     131072, 378.103709, 3402.933384, 32768
      avx2,    gen_p, 8,     131072, 404.689481, 3642.205326, 32768

  original,    gen_p, 8,    1048576,  475.440396,  4278.963560, 4096
    scalar,    gen_p, 8,    1048576, 1091.816247,  9826.346221, 4096
      sse2,    gen_p, 8,    1048576, 1353.059128, 12177.532150, 4096
     ssse3,    gen_p, 8,    1048576, 1366.004388, 12294.039488, 4096
      avx2,    gen_p, 8,    1048576, 1534.082664, 13806.743972, 4096

raidz2 generation
  original,   gen_pq, 8,       4096, 106.136765, 1061.367650, 1048576
    scalar,   gen_pq, 8,       4096,  53.154873,  531.548732, 1048576
      sse2,   gen_pq, 8,       4096,  53.959130,  539.591303, 1048576
     ssse3,   gen_pq, 8,       4096,  54.129877,  541.298770, 1048576
      avx2,   gen_pq, 8,       4096,  54.395867,  543.958667, 1048576

  original,   gen_pq, 8,     131072, 195.754027, 1957.540271, 32768
    scalar,   gen_pq, 8,     131072, 293.189424, 2931.894238, 32768
      sse2,   gen_pq, 8,     131072, 459.507476, 4595.074761, 32768
     ssse3,   gen_pq, 8,     131072, 459.897220, 4598.972202, 32768
      avx2,   gen_pq, 8,     131072, 530.460067, 5304.600671, 32768

  original,   gen_pq, 8,    1048576,  312.232121,  3122.321213, 4096
    scalar,   gen_pq, 8,    1048576,  458.227846,  4582.278458, 4096
      sse2,   gen_pq, 8,    1048576, 1028.868958, 10288.689577, 4096
     ssse3,   gen_pq, 8,    1048576, 1025.170281, 10251.702811, 4096
      avx2,   gen_pq, 8,    1048576, 1397.466808, 13974.668081, 4096

raidz3 generation
  original,  gen_pqr, 8,       4096, 103.737739, 1141.115133, 1048576
    scalar,  gen_pqr, 8,       4096,  35.388557,  389.274123, 1048576
      sse2,  gen_pqr, 8,       4096,  36.030771,  396.338479, 1048576
     ssse3,  gen_pqr, 8,       4096,  36.085041,  396.935455, 1048576
      avx2,  gen_pqr, 8,       4096,  36.302696,  399.329651, 1048576

  original,  gen_pqr, 8,     131072, 105.744226, 1163.186483, 32768
    scalar,  gen_pqr, 8,     131072, 160.649974, 1767.149711, 32768
      sse2,  gen_pqr, 8,     131072, 320.647502, 3527.122526, 32768
     ssse3,  gen_pqr, 8,     131072, 318.813970, 3506.953670, 32768
      avx2,  gen_pqr, 8,     131072, 399.040193, 4389.442126, 32768

  original,  gen_pqr, 8,    1048576, 132.985046,  1462.835507, 4096
    scalar,  gen_pqr, 8,    1048576, 210.713554,  2317.849091, 4096
      sse2,  gen_pqr, 8,    1048576, 596.334562,  6559.680178, 4096
     ssse3,  gen_pqr, 8,    1048576, 600.951422,  6610.465643, 4096
      avx2,  gen_pqr, 8,    1048576, 953.593970, 10489.533666, 4096
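
To make the comparison concrete, here is a small sketch that turns the 'total_bw' column into a speedup over the 'original' implementation. The numbers are copied from the 1M (1048576) rows above; the dictionary layout and the speedups() helper are my own illustration and are not part of raidz_test.

```python
# total_bw values copied from the 1048576-byte rows of the summary above.
TOTAL_BW_1M = {
    "gen_p": {
        "original": 4278.963560, "scalar": 9826.346221,
        "sse2": 12177.532150, "ssse3": 12294.039488, "avx2": 13806.743972,
    },
    "gen_pq": {
        "original": 3122.321213, "scalar": 4582.278458,
        "sse2": 10288.689577, "ssse3": 10251.702811, "avx2": 13974.668081,
    },
    "gen_pqr": {
        "original": 1462.835507, "scalar": 2317.849091,
        "sse2": 6559.680178, "ssse3": 6610.465643, "avx2": 10489.533666,
    },
}


def speedups(rows, baseline="original"):
    """Return each implementation's total_bw relative to the baseline."""
    base = rows[baseline]
    return {impl: bw / base for impl, bw in rows.items() if impl != baseline}


for math, rows in TOTAL_BW_1M.items():
    ratios = speedups(rows)
    summary = ", ".join(f"{impl} {ratio:.2f}x" for impl, ratio in ratios.items())
    print(f"{math}: {summary}")
```

At this I/O size avx2 works out to roughly 3.2x, 4.5x and 7.2x the original code for gen_p, gen_pq and gen_pqr respectively. The same helper can be pointed at the 4K rows to see the raidz2/3 regression mentioned above.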
Actions #6

Updated by Electric Monk over 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit f91a454727d8e1cd4bbbe2d4efd2754590298697

commit  f91a454727d8e1cd4bbbe2d4efd2754590298697
Author: Gvozden Neskovic <neskovic@gmail.com>
Date:   2020-07-15T12:43:07.000Z

    12794 ZFS support for vectorized algorithms on x86 (HW support)
    Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Robert Mustacchi <rm@fingolfin.org>

Actions #7

Updated by Marcel Telka over 2 years ago

  • Related to Bug #12968: curthread swtch-ing while the kernel is using the FPU added