Did you use @avx with SLEEFPirates? It will not reliably SIMD without it.
EDIT:
Also, FWIW, the relative error in the example we provided is:
julia> tanh(0.0001)
9.999999966666668e-5
julia> (SLEEFPirates.tanh_fast(0.0001) - ans)/ans
2.8135046469782325e-13
Ideally, we want to be within a few units in the last place (ulp). I.e., if x is the approximate result, prevfloat(x, n) should get you the exact answer with abs(n) <= 4 or so.
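To make the ulp criterion concrete, here is a minimal sketch of how one might count the distance in ulps. `ulp_distance` is a hypothetical helper, not something from SLEEFPirates or Base, and it assumes both inputs are finite Float64 values with the same sign. The `approx` value is a stand-in built with `nextfloat`, not actual `tanh_fast` output.

```julia
# Hypothetical helper: signed number of Float64 steps from `exact` to `approx`.
# Assumes both values are finite and share the same sign, so their bit
# patterns are ordered the same way as the floats themselves.
ulp_distance(approx::Float64, exact::Float64) =
    reinterpret(Int64, approx) - reinterpret(Int64, exact)

exact  = tanh(0.0001)
approx = nextfloat(exact, 3)        # stand-in for a result that is 3 ulp too high

n = ulp_distance(approx, exact)
@assert prevfloat(approx, n) == exact   # stepping back n floats recovers `exact`
@assert abs(n) <= 4                     # within the "few ulp" target

# For comparison, convert the relative error quoted above into ulps of `exact`.
rel_err   = 2.8135046469782325e-13
err_ulps  = rel_err * exact / eps(exact)
@assert err_ulps > 4                    # far outside the few-ulp target
```

By this measure the relative error quoted above works out to roughly two thousand ulp, which is why it looks small as a relative error but is still far from the target.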