Bug #12668

ZFS support for vectorized algorithms on x86 (initial support)

Added by Jerry Jelinek 11 months ago. Updated 10 months ago.

zfs - Zettabyte File System

This tracks preliminary work so that we can eventually add SIMD HW support to ZFS raidz.

We start with a port from openzfs for:
fc0c72b16 Support for vectorized algorithms on x86 (initial support)

and a partial port of:
ab9f4b0b824 SIMD implementation of vdev_raidz generate and reconstruct routines

This initial phase does not add any code which uses the FPU in the kernel. It only
ports the basic code refactoring so that we can later add the SIMD modules which
use FPU features such as SSE or AVX2. Adding those modules will first require some
prerequisite work to allow FPU usage in the kernel.
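
To illustrate the kind of structure this refactoring enables, here is a minimal Python sketch (not the actual C code) of a table of interchangeable raidz math implementations with a support check. All names in it (`RaidzImpl`, `IMPLS`, `select_impl`, the "avx2" entry) are hypothetical. The "avx2" entry is a placeholder that reports itself as unsupported, mirroring the fact that this phase adds no FPU-using code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def xor_parity(cols: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length data columns (the P parity)."""
    out = bytearray(len(cols[0]))
    for col in cols:
        for i, b in enumerate(col):
            out[i] ^= b
    return bytes(out)


@dataclass(frozen=True)
class RaidzImpl:
    """Hypothetical descriptor for one raidz math implementation."""
    name: str
    is_supported: Callable[[], bool]       # can this run on the current system?
    gen_p: Callable[[List[bytes]], bytes]  # P parity generation routine


# Hypothetical registry. A future SIMD module would add an entry here.
IMPLS = [
    RaidzImpl("scalar", lambda: True, xor_parity),
    # Placeholder only: always unsupported, because nothing here uses the FPU.
    RaidzImpl("avx2", lambda: False, xor_parity),
]


def select_impl(preferred: Optional[str] = None) -> RaidzImpl:
    """Return the preferred implementation if usable, else the first usable one."""
    usable = [impl for impl in IMPLS if impl.is_supported()]
    if preferred is not None:
        for impl in usable:
            if impl.name == preferred:
                return impl
    return usable[0]
```

With this shape, asking for an unsupported implementation such as `select_impl("avx2")` falls back to the scalar entry, so callers never need to know which modules are present.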

At this time, there is no intention to port the openzfs code which uses other x86
SIMD features, such as any of the AVX512 support, since there is known overhead
associated with using AVX512.


fpu_bench.txt (14.2 KB): Complete benchmark output. Jerry Jelinek, 2020-05-29 05:13 PM

Related issues

Related to illumos gate - Bug #12794: ZFS support for vectorized algorithms on x86 (HW support) (Closed, Jerry Jelinek)

Related to illumos gate - Bug #12968: curthread swtch-ing while the kernel is using the FPU (Closed, Jerry Jelinek)

Related to illumos gate - Bug #13100: zdb rpool crash on raidz (New)


Updated by Jerry Jelinek 11 months ago

To test this I ran a zfs test suite run, which includes the new "raidz_test" program.
It exercises the new algorithms and compares their results to the original raidz algorithm.

I also ran this on a system where I had a copy of a set of files in a raidz2.
I first failed one disk (using zinject) and diff-verified the files against a
golden copy in a separate file system. I then repeated this test (after
clearing the ARC) with 2 failed disks. The files were correctly reconstructed
from parity in both cases.
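
As a rough illustration of what "reconstructed from parity" means here, the following Python sketch generates P and Q parity over a set of data columns and then rebuilds one missing data column from P, and two missing data columns from P and Q. It assumes the usual raidz/RAID-6 arithmetic: P is a plain XOR, and Q uses powers of 2 in GF(2^8) with the polynomial 0x11d. The function names are hypothetical, the code handles lost data columns only (not lost parity columns), and it is byte-at-a-time for clarity rather than speed.

```python
def gf_mul2(a):
    """Multiply by 2 in GF(2^8), reducing by the polynomial 0x11d."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11d
    return a


def gf_mul(a, b):
    """General GF(2^8) multiply (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = gf_mul2(a)
        b >>= 1
    return r


def gf_pow2(n):
    """Return 2**n in GF(2^8)."""
    r = 1
    for _ in range(n):
        r = gf_mul2(r)
    return r


def gf_inv(a):
    """Multiplicative inverse of a nonzero element: a**254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def gen_pq(cols):
    """P = XOR of all columns; Q = sum of 2**(n-1-i) * D_i (Horner's rule)."""
    size = len(cols[0])
    p, q = bytearray(size), bytearray(size)
    for col in cols:
        for i, d in enumerate(col):
            p[i] ^= d
            q[i] = gf_mul2(q[i]) ^ d
    return bytes(p), bytes(q)


def rec_p(cols, p):
    """Rebuild the single missing (None) data column from P."""
    out = bytearray(p)
    for col in cols:
        if col is not None:
            for i, d in enumerate(col):
                out[i] ^= d
    return bytes(out)


def rec_pq(cols, x, y, p, q):
    """Rebuild missing data columns x < y (both None in cols) from P and Q."""
    n, size = len(cols), len(p)
    zero = bytes(size)
    # Parity of the surviving columns, with the missing ones treated as zero.
    pxy, qxy = gen_pq([c if c is not None else zero for c in cols])
    cx, cy = gf_pow2(n - 1 - x), gf_pow2(n - 1 - y)
    inv = gf_inv(cx ^ cy)
    dx, dy = bytearray(size), bytearray(size)
    for i in range(size):
        a = p[i] ^ pxy[i]          # D_x + D_y
        b = q[i] ^ qxy[i]          # cx*D_x + cy*D_y
        dx[i] = gf_mul(inv, b ^ gf_mul(cy, a))
        dy[i] = a ^ dx[i]
    return bytes(dx), bytes(dy)


# Six data columns of 16 bytes each, then one and two simulated failures.
data = [bytes((17 * c + i) % 256 for i in range(16)) for c in range(6)]
p, q = gen_pq(data)

one_failed = list(data)
one_failed[2] = None
assert rec_p(one_failed, p) == data[2]

two_failed = list(data)
two_failed[1] = None
two_failed[4] = None
assert rec_pq(two_failed, 1, 4, p, q) == (data[1], data[4])
```

The two checks at the end loosely mirror the one-disk and two-disk failure cases described above, with the original `data` list playing the role of the golden copy.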

For performance, the "raidz_test" program includes a benchmark mode. I have
attached a complete set of output. Since this code runs through all of the
algorithms and all of the permutations, there are a lot of results.

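Here is a summary of some key comparisons at 4K, 128K and 1M recordsize.
On generation, the scalar algorithm is much better at 128K and 1M (roughly 1.4x to
2.3x the original's total_bw in the numbers below). The exception is the small
recordsize (4K), where it is about the same for raidz1 and slower for raidz2 and
raidz3. On reconstruction, the scalar algorithm is always the same or much better.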

In these results, the columns are the following and the 'total_bw' column
(2nd to last) provides the relevant comparison.

impl, math, dcols, iosize, disk_bw, total_bw, iter

raidz1 generation
  original,    gen_p, 8,       4096, 108.050526, 972.454730, 1048576
    scalar,    gen_p, 8,       4096, 106.141851, 955.276656, 1048576

  original,    gen_p, 8,     131072, 248.547621, 2236.928588, 32768
    scalar,    gen_p, 8,     131072, 355.345384, 3198.108457, 32768

  original,    gen_p, 8,    1048576,  478.934748, 4310.412734, 4096
    scalar,    gen_p, 8,    1048576, 1099.841124, 9898.570117, 4096

raidz2 generation
  original,   gen_pq, 8,       4096, 106.083159, 1060.831590, 1048576
    scalar,   gen_pq, 8,       4096,  53.017868,  530.178685, 1048576

  original,   gen_pq, 8,     131072, 191.733953, 1917.339534, 32768
    scalar,   gen_pq, 8,     131072, 291.722009, 2917.220091, 32768

  original,   gen_pq, 8,    1048576, 312.419193, 3124.191926, 4096
    scalar,   gen_pq, 8,    1048576, 455.338371, 4553.383707, 4096

raidz3 generation
  original,  gen_pqr, 8,       4096, 102.868121, 1131.549328, 1048576
    scalar,  gen_pqr, 8,       4096,  35.402424,  389.426664, 1048576

  original,  gen_pqr, 8,     131072, 103.808905, 1141.897953, 32768
    scalar,  gen_pqr, 8,     131072, 160.601484, 1766.616326, 32768

  original,  gen_pqr, 8,    1048576, 133.020122, 1463.221346, 4096
    scalar,  gen_pqr, 8,    1048576, 210.622504, 2316.847542, 4096
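
To make the comparison easier to read, here is a small Python sketch that computes the scalar/original ratio of the total_bw column for each case. The numbers are copied from the summary table above; the `RESULTS` layout and the `speedup` helper are just illustrative names.

```python
# (raidz level, iosize) -> (original total_bw, scalar total_bw),
# copied from the summary table.
RESULTS = {
    ("raidz1", 4096): (972.454730, 955.276656),
    ("raidz1", 131072): (2236.928588, 3198.108457),
    ("raidz1", 1048576): (4310.412734, 9898.570117),
    ("raidz2", 4096): (1060.831590, 530.178685),
    ("raidz2", 131072): (1917.339534, 2917.220091),
    ("raidz2", 1048576): (3124.191926, 4553.383707),
    ("raidz3", 4096): (1131.549328, 389.426664),
    ("raidz3", 131072): (1141.897953, 1766.616326),
    ("raidz3", 1048576): (1463.221346, 2316.847542),
}


def speedup(level, iosize):
    """Ratio of scalar to original total_bw; above 1.0 means scalar is faster."""
    original, scalar = RESULTS[(level, iosize)]
    return scalar / original


for level, iosize in RESULTS:
    print(f"{level} {iosize:>8}  scalar/original = {speedup(level, iosize):.2f}")
```

A ratio above 1.0 means the scalar implementation had higher total_bw than the original for that case.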


Updated by Electric Monk 10 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit e86372a01d2d16a5dd4a64e144ed978ba17fe7dd

commit  e86372a01d2d16a5dd4a64e144ed978ba17fe7dd
Author: Gvozden Neskovic
Date:   2020-06-01T21:05:58.000Z

    12668 ZFS support for vectorized algorithms on x86 (initial support)
    Portions contributed by: Jerry Jelinek
    Reviewed by: Brian Behlendorf
    Reviewed by: Jason King
    Approved by: Dan McDonald


Updated by Joshua M. Clulow 10 months ago

  • Related to Bug #12794: ZFS support for vectorized algorithms on x86 (HW support) added

Updated by Marcel Telka 9 months ago

  • Related to Bug #12968: curthread swtch-ing while the kernel is using the FPU added

Updated by Dan McDonald about 2 months ago

  • Related to Bug #13100: zdb rpool crash on raidz added
