rtld should conditionally save AVX-512 state
One of the challenges with the AVX-512 state is that using the AVX-512 registers can measurably slow down surrounding code. While AVX-512 instructions execute, the processor may lower the clock frequency of the cores running them. We've also seen related cases where the processor can be tricked into treating all FPU activity as AVX-512 related, simply because the AVX-512 register state is marked as in use.
One place where this occurs is rtld. Currently, rtld saves and restores the FPU registers when resolving a symbol through the PLT. It does this so that constructors and other code that may run as a side effect of loading objects don't clobber the FPU registers in unexpected ways. The amd64 ABI effectively requires this. The SysV amd64 ABI doesn't make any vector registers callee saved. However, the first eight vector registers (%xmm0 through %xmm7, or their %ymm/%zmm extensions when wider vector types are passed) carry arguments, so they must reach the function being bound intact.
What we appear to need instead is to avoid saving and restoring the larger register sets when they aren't in use. For example, on processors that support it, the xgetbv instruction with %ecx set to 1 returns the in-use bitmap (XINUSE, ANDed with XCR0). That bitmap tells us which register state components the process is currently using. With it, we can save and restore only the relevant portion of the register state rather than all of it.
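The following is a minimal sketch of that query, assuming x86-64 with GCC or Clang (for `<cpuid.h>` and inline assembly). It is not rtld's code, and every name in it (`components_to_save`, `need_avx512_save`, `read_xcr`, `query_xstate`, the `XFEATURE_*` macros) is hypothetical. The decision logic is kept as a pure function so it can be reasoned about separately from the hardware query. The component bit numbers follow the Intel SDM's XSAVE numbering.

```c
#include <stdbool.h>
#include <stdint.h>
#include <cpuid.h>	/* GCC/Clang: __get_cpuid(), __get_cpuid_count() */

/* XSAVE state-component bits, as numbered in the Intel SDM. */
#define	XFEATURE_X87		(1ULL << 0)
#define	XFEATURE_SSE		(1ULL << 1)	/* %xmm0-%xmm15, MXCSR */
#define	XFEATURE_AVX		(1ULL << 2)	/* upper halves of %ymm0-%ymm15 */
#define	XFEATURE_OPMASK		(1ULL << 5)	/* %k0-%k7 */
#define	XFEATURE_ZMM_HI256	(1ULL << 6)	/* upper halves of %zmm0-%zmm15 */
#define	XFEATURE_HI16_ZMM	(1ULL << 7)	/* %zmm16-%zmm31 */
#define	XFEATURE_AVX512	\
	(XFEATURE_OPMASK | XFEATURE_ZMM_HI256 | XFEATURE_HI16_ZMM)

/* Hypothetical policy: save only what the OS enabled and what is in use. */
static uint64_t
components_to_save(uint64_t xcr0, uint64_t xinuse)
{
	return (xcr0 & xinuse);
}

/* True if any AVX-512 component would have to be saved. */
static bool
need_avx512_save(uint64_t xcr0, uint64_t xinuse)
{
	return ((components_to_save(xcr0, xinuse) & XFEATURE_AVX512) != 0);
}

/* Read the extended control register selected by %ecx. */
static uint64_t
read_xcr(uint32_t which)
{
	uint32_t lo, hi;

	__asm__ volatile("xgetbv" : "=a" (lo), "=d" (hi) : "c" (which));
	return (((uint64_t)hi << 32) | lo);
}

/*
 * Fetch XCR0 and the in-use bitmap. Returns false if the OS hasn't
 * enabled XSAVE or the CPU lacks xgetbv with %ecx = 1.
 */
static bool
query_xstate(uint64_t *xcr0, uint64_t *xinuse)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.1:ECX[27] (OSXSAVE): xgetbv may be executed. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
		return (false);
	/* CPUID.(EAX=0DH,ECX=1):EAX[2]: xgetbv with %ecx = 1 is supported. */
	if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx) ||
	    !(eax & (1u << 2)))
		return (false);
	*xcr0 = read_xcr(0);
	*xinuse = read_xcr(1);	/* hardware returns XCR0 & XINUSE */
	return (true);
}
```

Under this sketch, a binder would call `query_xstate()` once per transition. It would then take the AVX-512 save path only when `need_avx512_save()` is true, and otherwise fall back to a smaller SSE/AVX save.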
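There is an alternative approach that Intel contributed to recent versions of glibc on Linux. There, the PLT trampoline saves state with fxsave, xsave, or xsavec, depending on what the processor supports, and lets the hardware track what is in use. For instance, xsavec only writes out state components that are actually in use. The subsequent xrstor then returns any component that wasn't written to its initial state. We've opted to use this approach, since it will make maintenance here a lot simpler.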
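The following is a minimal sketch of the xsavec/xrstor pattern, assuming x86-64 with GCC or Clang and an assembler that knows the `xsavec` mnemonic. It is not glibc's or rtld's trampoline, which run in assembly at binding time. The names `SAVE_MASK`, `struct xsave_area`, `xsavec_usable`, `xstate_save`, `xstate_restore` and `xstate_bv` are hypothetical. `SAVE_MASK` (SSE, AVX and the three AVX-512 components) is an assumption chosen for illustration. The hardware further ANDs the mask with XCR0, so components the OS hasn't enabled are skipped automatically.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <cpuid.h>	/* GCC/Clang: __get_cpuid(), __get_cpuid_count() */

/* Hypothetical mask: SSE (1), AVX (2), opmask (5), ZMM_Hi256 (6), Hi16_ZMM (7). */
#define	SAVE_MASK	((1ULL << 1) | (1ULL << 2) | (1ULL << 5) | \
			(1ULL << 6) | (1ULL << 7))
#define	XSAVE_HDR_OFF	512	/* XSAVE header follows the legacy region */

/* ~2.4 KiB is needed for these components in compacted form; 4 KiB is ample. */
struct xsave_area {
	uint8_t bytes[4096];
} __attribute__((aligned(64)));	/* XSAVE* require 64-byte alignment */

/* xsavec needs OSXSAVE and CPUID.(EAX=0DH,ECX=1):EAX[1]. */
static bool
xsavec_usable(void)
{
	unsigned int a, b, c, d;

	if (!__get_cpuid(1, &a, &b, &c, &d) || !(c & (1u << 27)))
		return (false);
	if (!__get_cpuid_count(0xd, 1, &a, &b, &c, &d))
		return (false);
	return ((a & (1u << 1)) != 0);
}

/* Save only in-use components of mask; unused ones are left out entirely. */
static void
xstate_save(struct xsave_area *area, uint64_t mask)
{
	/* xrstor raises #GP if reserved header bytes are non-zero; clear them. */
	memset(area->bytes + XSAVE_HDR_OFF, 0, 64);
	__asm__ volatile("xsavec %0"
	    : "+m" (*area)
	    : "a" ((uint32_t)mask), "d" ((uint32_t)(mask >> 32))
	    : "memory");
}

/* Restore; components absent from XSTATE_BV are put in their initial state. */
static void
xstate_restore(const struct xsave_area *area, uint64_t mask)
{
	__asm__ volatile("xrstor %0"
	    :
	    : "m" (*area), "a" ((uint32_t)mask), "d" ((uint32_t)(mask >> 32))
	    : "memory");
}

/* XSTATE_BV: which components xsavec actually wrote (i.e. were in use). */
static uint64_t
xstate_bv(const struct xsave_area *area)
{
	uint64_t bv;

	memcpy(&bv, area->bytes + XSAVE_HDR_OFF, sizeof (bv));
	return (bv);
}
```

The design point is that nothing here inspects XINUSE explicitly. If the AVX-512 components are in their initial state when `xstate_save()` runs, their `XSTATE_BV` bits stay clear. `xstate_restore()` then leaves them initialized instead of marking them in use.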