Feature #10210

Updated by Robert Mustacchi about 1 year ago

With the advent of the new Intel SHA instructions, we should look at what's required to add support for and leverage them in pkcs11 and in the kernel. This should allow consumers like libmd and co to benefit from them when they're being used.

To implement this, we pulled in Intel's work which adds hardware accelerated options for the following:

* SHA1 - SHA (sha hardware instructions)

* SHA256 - SHA

On AMD EPYC systems, the hardware SHA instructions provide a pretty impressive boost. On the other hand, the gains on the Intel side are more marginal. All of the Intel code was BSD licensed, which is why I ended up looking at it over the OpenSSL bits, which can be a bit more complicated to leverage. I had originally also looked at pulling in an SSE3 and an AVX implementation, but found much more varied results.

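Here is an example of the kind of improvement we see. This is a SHA-256 digest of a 10 GiB file of zeros on an AMD EPYC system, first with the existing implementation and then with the new one picked up via LD_PRELOAD_64. Wall-clock time drops from about 44.9 seconds to about 12.7 seconds, roughly a 3.5x improvement: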

[root@odyssey /var/tmp]# ptime /usr/bin/amd64/digest -a sha256 ./foo.10g

real 44.937446493
user 39.798585839
sys 5.135747171
[root@odyssey /var/tmp]# LD_PRELOAD_64=./ ptime /usr/bin/amd64/digest -a sha256 ./foo.10g

real 12.722944420
user 6.983171064
sys 5.736314290
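
To put the two ptime runs above in perspective, here is a small sketch that turns them into throughput and speedup figures. The timings are copied from the output above; the helper names are just for illustration and are not part of any tool mentioned here.

```python
# Rough arithmetic on the ptime output above. Helper names are illustrative.
GIB = 1024 ** 3
MIB = 1024 ** 2


def throughput_mib_s(size_bytes, seconds):
    """Average throughput in MiB/s for hashing size_bytes in seconds."""
    return size_bytes / seconds / MIB


def speedup(before_s, after_s):
    """How many times faster the second run is than the first."""
    return before_s / after_s


file_size = 10 * GIB

# Timings from the two runs (seconds).
baseline_real, baseline_user = 44.937446493, 39.798585839
accel_real, accel_user = 12.722944420, 6.983171064

print("baseline:    %.0f MiB/s" % throughput_mib_s(file_size, baseline_real))
print("accelerated: %.0f MiB/s" % throughput_mib_s(file_size, accel_real))
print("real speedup: %.2fx" % speedup(baseline_real, accel_real))
print("user speedup: %.2fx" % speedup(baseline_user, accel_user))
```

The sys time is about the same in both runs, so the user-time ratio is the better measure of what the hashing code itself gained.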

To test this I ran the crypto test suite, which goes through the userland and kernel SHA test vectors, on systems both with and without the SHA instructions. The crypto test suite was 100% clean in both cases.
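
For anyone who wants a quick sanity check outside of the test suite, the idea behind a known-answer test looks roughly like the sketch below. This is not the crypto test suite itself; it uses Python's hashlib (which may not go through libmd or pkcs11 at all) and a handful of well-known published SHA-1 and SHA-256 vectors. The `failed_vectors` helper is a made-up name for illustration.

```python
import hashlib

# Well-known published test vectors: (algorithm, message, expected hex digest).
VECTORS = [
    ("sha1", b"", "da39a3ee5e6b4b0d3255bfef95601890afd80709"),
    ("sha1", b"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"),
    ("sha256", b"",
     "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
    ("sha256", b"abc",
     "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
]


def failed_vectors(vectors):
    """Return the vectors whose computed digest does not match the expected one."""
    failures = []
    for algorithm, message, expected in vectors:
        actual = hashlib.new(algorithm, message).hexdigest()
        if actual != expected:
            failures.append((algorithm, message, expected, actual))
    return failures


failures = failed_vectors(VECTORS)
print("all vectors passed" if not failures else "failures: %r" % failures)
```

An accelerated implementation has to produce exactly the same digests as the portable one, which is why running the same vectors on hardware with and without the SHA instructions is the interesting comparison.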