libmd should leverage SHA extensions
With the advent of the new Intel SHA instructions, we should look at what's required to add support for and leverage them in pkcs11 and in the kernel. This should allow consumers like libmd and co to benefit from them when they're being used.
To implement this, we pulled in Intel's work which adds hardware accelerated options for the following:
- SHA1 - SHA (sha hardware instructions)
- SHA256 - SHA
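As a rough illustration of how a SHA1/SHA256 implementation can choose between the generic code and the SHA-instruction code at runtime, here is a minimal sketch. It is not the code from this change: it assumes a GCC or Clang toolchain and uses the compiler's <cpuid.h> helper rather than the system's own feature-detection facilities, and the names cpu_has_sha_ext, sha_pick_impl, and sha_impl_t are hypothetical. The one hard fact it relies on is that the SHA extensions are advertised in CPUID leaf 7, subleaf 0, EBX bit 29.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper; an assumption of this sketch */
#endif

/* CPUID.(EAX=7,ECX=0):EBX bit 29 advertises the SHA extensions. */
#define	CPUID_LEAF7_EBX_SHA	(1u << 29)

/* Hypothetical names for the two code paths. */
typedef enum {
	SHA_IMPL_GENERIC,	/* portable software implementation */
	SHA_IMPL_SHAEXT		/* SHA hardware instructions */
} sha_impl_t;

/* Returns true only if the CPU reports the SHA extensions. */
static bool
cpu_has_sha_ext(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when the leaf is unsupported. */
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) == 0)
		return (false);
	return ((ebx & CPUID_LEAF7_EBX_SHA) != 0);
#else
	return (false);		/* not x86: no SHA extensions to use */
#endif
}

/* Pick the implementation once; callers can cache the result. */
static sha_impl_t
sha_pick_impl(bool has_sha_ext)
{
	return (has_sha_ext ? SHA_IMPL_SHAEXT : SHA_IMPL_GENERIC);
}
```

A caller would evaluate sha_pick_impl(cpu_has_sha_ext()) once at initialization and use the result to select the block-processing routine, so systems without the instructions keep using the existing code path.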
On AMD EPYC systems, the hardware SHA instructions provide a pretty impressive boost. On the other hand, the gains on the Intel side are more marginal. All of the Intel code was BSD licensed, which is why I ended up looking at it over the OpenSSL bits, which can be a bit more complicated to leverage. I had originally also looked at pulling in an SSE3 and AVX implementation but found much more varied results.
As an example of the kinds of improvements we see, here is a SHA256 digest of a 10 GiB file of zeros on an AMD EPYC system, first with the existing libmd and then with the new libmd.so.1 preloaded:
[root@odyssey /var/tmp]# ptime /usr/bin/amd64/digest -a sha256 ./foo.10g
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d

real 44.937446493
user 39.798585839
sys 5.135747171
[root@odyssey /var/tmp]# LD_PRELOAD_64=./libmd.so.1 ptime /usr/bin/amd64/digest -a sha256 ./foo.10g
732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d

real 12.722944420
user 6.983171064
sys 5.736314290
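To put the ptime numbers above in perspective, the following small sketch computes the speedup and the effective throughput from them. The helper names (speedup, throughput_mib_per_sec, report) are made up for illustration; the only inputs are the timings shown in the output and the 10 GiB file size.

```c
#include <stdio.h>

/* Ratio of the old time to the new time; > 1.0 means faster. */
static double
speedup(double before_secs, double after_secs)
{
	return (before_secs / after_secs);
}

/* Throughput in MiB/s for a file of the given size in GiB. */
static double
throughput_mib_per_sec(double gib, double secs)
{
	return (gib * 1024.0 / secs);
}

/* Print one line summarizing a before/after pair of timings. */
static void
report(const char *label, double before_secs, double after_secs)
{
	(void) printf("%s: %.2fx faster\n", label,
	    speedup(before_secs, after_secs));
}
```

Calling report("real", 44.937446493, 12.722944420) and report("user", 39.798585839, 6.983171064) gives roughly 3.5x for wall-clock time and 5.7x for user time. In throughput terms, throughput_mib_per_sec(10, ...) goes from about 228 MiB/s to about 805 MiB/s. System time stays between 5 and 6 seconds in both runs.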
To test this, I ran the crypto test suite, which goes through the userland and kernel SHA test vectors, on systems both with and without the SHA instructions. The crypto test suite was 100% clean in both cases.