Say you’re doing argument reduction for trig functions for double floats (wonder where this comes from…). You need different algorithms depending on the magnitude of the argument, so you end up with branching and special cases depending on the required precision (say, around multiples of pi/2, or for arguments larger than 2.0^20).
Now, let’s add FMA to the mix. Because an FMA is computed with just one final rounding, a given algorithm can, in some sense, reach higher precision. Given this increased precision, it no longer makes sense to switch to the more expensive variants near pi/2 multiples as early, and we can wait longer before falling back to the expensive Payne–Hanek scheme. The reduction scheme changes quite a lot.
In https://github.com/JuliaLang/julia/pull/22603 I do not use FMAs, but many systems have these instruction sets available these days, and it would be stupid not to take advantage of that. However, I am unsure how best to go about it. Do you check for the presence of FMA instructions when building and include one of the two implementations depending on a boolean flag, or is there a better way?