Please explain LinearAlgebra.__normalize!

I am trying to understand LinearAlgebra.__normalize!, quoted in full here:

@inline function __normalize!(a::AbstractArray, nrm)
    # The largest positive floating point number whose inverse is less than infinity
    δ = inv(prevfloat(typemax(nrm)))
    if nrm ≥ δ # Safe to multiply with inverse
        invnrm = inv(nrm)
        rmul!(a, invnrm)
    else # scale elements to avoid overflow
        εδ = eps(one(nrm))/δ
        rmul!(a, εδ)
        rmul!(a, inv(nrm*εδ))
    end
    return a
end

Specifically, what is the two-step branch protecting against?

An example of how it fails without it would be useful too.

A simple example is [nextfloat(0.0)]. In short, there are floating point numbers greater than zero whose inverse is Inf. Multiplying by εδ first and then by inv(nrm*εδ) — i.e. effectively multiplying and dividing by εδ — avoids the overflow.

julia> a = [nextfloat(0.0)]
1-element Vector{Float64}:
 5.0e-324

julia> nrm = norm(a)
5.0e-324

julia> a*inv(nrm)
1-element Vector{Float64}:
 Inf

julia> δ = inv(prevfloat(typemax(nrm)))
5.562684646268003e-309

julia> εδ = eps(one(nrm))/δ
3.99168061906944e292

julia> a*εδ
1-element Vector{Float64}:
 1.9721522630525295e-31

julia> ans * inv(nrm*εδ)
1-element Vector{Float64}:
 1.0
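To confirm that the library call handles this case end to end (a quick check, assuming LinearAlgebra is loaded): the one-step inverse multiply overflows, while normalize, which dispatches to __normalize!, takes the two-step branch and returns the finite answer.

```julia
using LinearAlgebra

a = [nextfloat(0.0)]          # smallest positive subnormal Float64
naive = a .* inv(norm(a))     # one-step inverse multiply: inv(5.0e-324) == Inf
safe  = normalize(a)          # __normalize! takes the εδ two-step branch

println(naive)  # [Inf]
println(safe)   # [1.0]
```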

Any idea why specifically that factor (which is basically eps(floatmax(nrm)))? Seems like floatmin(nrm) / nextfloat(zero(nrm)) is a more intuitive and less extreme value that would do the trick. The point, as I understand it, is just to lift all subnormals into the normals.
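To make the suggested alternative concrete (my own sketch, not what __normalize! actually uses): floatmin(Float64) / nextfloat(0.0) is exactly 2^52, and scaling any subnormal by it lands in the normal range, so the same two-step trick works with this milder factor.

```julia
using LinearAlgebra

a   = [nextfloat(0.0)]
nrm = norm(a)

c = floatmin(nrm) / nextfloat(zero(nrm))  # == 2.0^52, lifts subnormals into normals
b = a .* c                                # every element is now a normal float
result = b .* inv(nrm * c)                # nrm*c is normal, so inv(nrm*c) is finite

println(c)       # 4.503599627370496e15
println(result)  # [1.0]
```

Both floatmin and nextfloat(0.0) are exact powers of two, so c is exact and the intermediate products round cleanly.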

Thanks for the detailed explanation; now I get it.

On a related note: my understanding was that performing float division as multiplication by the inverse is faster, but less accurate. I was curious how much less accurate it is, so I tested it with random values like

using LinearAlgebra, Statistics

function collect_discrepancy(::Type{T}, N, K) where T
    function _f(x)
        z1 = normalize(x, 2)    # __normalize! under the hood
        z2 = x ./ norm(x, 2)    # plain vanilla rdiv
        xb = BigFloat.(x)       # compare to “exact” result
        zb = T.(xb ./ norm(xb, 2))
        e1 = maximum(abs, z1 .- zb)
        e2 = maximum(abs, z2 .- zb)
        (; x, e1, e2)
    end
    [_f(randn(T, K)) for _ in 1:N]
end

and the results are that yes, it is a bit less accurate, but not significantly, e.g.

julia> results = collect_discrepancy(Float64, 1000, 16);

julia> mean(r.e1 ≥ r.e2 for r in results)
0.816

julia> mean(r.e1 for r in results)
6.058890368587643e-17

julia> mean(r.e2 for r in results)
5.729100462266434e-17

The same pattern emerges if I look at the outliers, or any other pertinent statistic. So it seems that this optimization is pretty innocuous.
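To put those numbers in perspective (my own framing, reusing the means reported above): both mean discrepancies sit below eps(1.0)/2, i.e. under half an ulp at 1.0, and the inverse-multiply path is only a few percent worse on average.

```julia
e1 = 6.058890368587643e-17   # mean error via normalize (from the runs above)
e2 = 5.729100462266434e-17   # mean error via plain division
halfulp = eps(1.0) / 2       # half an ulp at 1.0, ≈ 1.11e-16

println(e1 < halfulp)  # true: even the less accurate path is sub-half-ulp on average
println(e1 / e2)       # ≈ 1.06: roughly 6% larger mean error
```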