I am trying to understand LinearAlgebra.__normalize!, quoted in full here:
@inline function __normalize!(a::AbstractArray, nrm)
    # The largest positive floating point number whose inverse is less than infinity
    δ = inv(prevfloat(typemax(nrm)))
    if nrm ≥ δ # Safe to multiply with inverse
        invnrm = inv(nrm)
        rmul!(a, invnrm)
    else # scale elements to avoid overflow
        εδ = eps(one(nrm))/δ
        rmul!(a, εδ)
        rmul!(a, inv(nrm*εδ))
    end
    return a
end
Specifically, what is the two-step branch protecting against?
An example of an input that fails without the second branch would be useful too.
A simple example is [nextfloat(0.0)]. In short, there are floating point numbers greater than zero whose inverse is Inf. Multiplying by εδ/εδ (i.e. scaling by εδ first and only then dividing by nrm*εδ, as the second branch does) avoids it:
julia> a = [nextfloat(0.0)]
1-element Vector{Float64}:
5.0e-324
julia> nrm = norm(a)
5.0e-324
julia> a*inv(nrm)
1-element Vector{Float64}:
Inf
julia> δ = inv(prevfloat(typemax(nrm)))
5.562684646268003e-309
julia> εδ = eps(one(nrm))/δ
3.99168061906944e292
julia> a*εδ
1-element Vector{Float64}:
1.9721522630525295e-31
julia> ans * inv(nrm*εδ)
1-element Vector{Float64}:
1.0
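To confirm, the exported normalize (which ends up in __normalize! for this input) combines exactly these two steps, so it should give a finite unit vector where the naive a*inv(nrm) overflowed:
julia> normalize(a)
1-element Vector{Float64}:
1.0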
Any idea why specifically that factor (which is basically eps(floatmax(nrm)))? Seems like floatmin(nrm) / nextfloat(zero(nrm)) is a more intuitive and less extreme value that would do the trick. The point, as I understand it, is just to lift all subnormals into the normals.
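For comparison (just sketching the suggestion, not what the library does): for Float64 that alternative factor is 2^52, and it maps the smallest subnormal exactly onto floatmin:
julia> alt = floatmin(1.0) / nextfloat(zero(1.0))
4.503599627370496e15
julia> [nextfloat(0.0)] .* alt
1-element Vector{Float64}:
2.2250738585072014e-308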
Thanks for the detailed explanation; now I get it.
On a related note: my understanding was that performing float division as multiplication by the inverse is faster, but less accurate. I was curious how much less accurate it is, so I tested it with random values like this:
using LinearAlgebra, Statistics

function collect_discrepancy(::Type{T}, N, K) where T
    function _f(x)
        z1 = normalize(x, 2)   # __normalize! under the hood
        z2 = x ./ norm(x, 2)   # plain vanilla rdiv
        xb = BigFloat.(x)      # compare to “exact” result
        zb = T.(xb ./ norm(xb, 2))
        e1 = maximum(abs, z1 .- zb)
        e2 = maximum(abs, z2 .- zb)
        (; x, e1, e2)
    end
    [_f(randn(T, K)) for _ in 1:N]   # N random vectors of eltype T, length K
end
and the results are that yes, it is a bit less accurate, but not significantly, e.g.
julia> results = collect_discrepancy(Float64, 1000, 16);
julia> mean(r.e1 ≥ r.e2 for r in results)
0.816
julia> mean(r.e1 for r in results)
6.058890368587643e-17
julia> mean(r.e2 for r in results)
5.729100462266434e-17
The same pattern emerges if I look at the outliers, or any other pertinent statistic. So it seems that this optimization is pretty innocuous.
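For completeness, by outliers I mean summaries along these lines (actual numbers omitted since they vary from run to run):
maximum(r.e1 for r in results), maximum(r.e2 for r in results)                      # worst cases
quantile([r.e1 for r in results], 0.99), quantile([r.e2 for r in results], 0.99)    # tail quantiles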