I mean, it’s only an approximation, and if you compare performance:
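using BenchmarkTools

# Note: despite the name, this approximates 1/sqrt(y) (the "fast inverse square root" bit trick).
function fast_sqrt(y::Float32)
    x2 = y * 0.5f0
    i = reinterpret(Int32, y)
    i = 0x5f3759df - (i >> 1)   # magic constant gives the initial guess
    y = reinterpret(Float32, i)
    y = y * (1.5f0 - x2 * y^2)  # one Newton-Raphson step
    return y
end

function compare_inv_sqrt(n=100000)
    arr = abs.(10000 .* randn(Float32, n))  # positive Float32 inputs
    @btime fast_sqrt.($arr)
    @btime 1 ./ sqrt.($arr)
    @btime @fastmath 1 ./ sqrt.($arr)
    return nothing
end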
## REPL
julia> compare_inv_sqrt()
23.092 μs (2 allocations: 390.70 KiB)
154.720 μs (2 allocations: 390.70 KiB)
36.811 μs (2 allocations: 390.70 KiB)
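So against the @fastmath version it only takes about 37% less time (23.1 μs vs. 36.8 μs), even though it’s roughly 6.7x faster than the plain 1 ./ sqrt.(arr).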
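To put a number on the "only an approximation" part, here is a minimal sketch that measures the worst relative error of the same bit trick against a Float64 reference. The names approx_inv_sqrt and max_rel_error are made up for this illustration (the first is just fast_sqrt from above under a different name, reinterpreting to UInt32 instead of Int32), and the input distribution is simply the one used in the benchmark.

```julia
# Hypothetical helper: same bit trick as fast_sqrt above, using UInt32 throughout.
function approx_inv_sqrt(y::Float32)
    x2 = y * 0.5f0
    i = reinterpret(UInt32, y)
    i = 0x5f3759df - (i >> 1)          # initial guess from the magic constant
    y = reinterpret(Float32, i)
    return y * (1.5f0 - x2 * y^2)      # one Newton-Raphson step
end

# Hypothetical helper: largest relative error over n random positive inputs.
function max_rel_error(n=100_000)
    # Same inputs as the benchmark; adding floatmin guards against an exact zero.
    arr = abs.(10000 .* randn(Float32, n)) .+ floatmin(Float32)
    exact = 1 ./ sqrt.(Float64.(arr))  # Float64 reference values
    approx = Float64.(approx_inv_sqrt.(arr))
    return maximum(abs.(approx .- exact) ./ exact)
end
```

Calling max_rel_error() should come out at a fraction of a percent with a single Newton step, which is the accuracy you are giving up for that speedup.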