I am trying to understand LinearAlgebra.__normalize!, quoted in full here:
@inline function __normalize!(a::AbstractArray, nrm)
    # The largest positive floating point number whose inverse is less than infinity
    δ = inv(prevfloat(typemax(nrm)))
    if nrm ≥ δ # Safe to multiply with inverse
        invnrm = inv(nrm)
        rmul!(a, invnrm)
    else # scale elements to avoid overflow
        εδ = eps(one(nrm))/δ
        rmul!(a, εδ)
        rmul!(a, inv(nrm*εδ))
    end
    return a
end
Specifically, what is the two-step branch protecting against?
An example of how it fails without it would be useful too.
A simple example is [nextfloat(0.0)]. In short, there are floating point numbers greater than zero whose inverse is Inf. Multiplying by εδ/εδ (that is, scaling the array by εδ and then multiplying by inv(nrm*εδ), which is mathematically just dividing by nrm) avoids it.
julia> a = [nextfloat(0.0)]
1-element Vector{Float64}:
5.0e-324
julia> nrm = norm(a)
5.0e-324
julia> a*inv(nrm)
1-element Vector{Float64}:
Inf
julia> δ = inv(prevfloat(typemax(nrm)))
5.562684646268003e-309
julia> εδ = eps(one(nrm))/δ
3.99168061906944e292
julia> a*εδ
1-element Vector{Float64}:
1.9721522630525295e-31
julia> ans * inv(nrm*εδ)
1-element Vector{Float64}:
1.0
Any idea why specifically that factor (which is basically eps(floatmax(nrm)))? Seems like floatmin(nrm) / nextfloat(zero(nrm)) is a more intuitive and less extreme value that would do the trick. The point, as I understand it, is just to lift all subnormals into the normals.
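For what it's worth, here is a quick self-contained check (my own sketch; alt is just my name for the suggested alternative factor). Either factor lifts the worst-case norm, the smallest positive subnormal, back into invertible territory; the stdlib's choice just overshoots by far more:

nrm = nextfloat(0.0)                        # worst case: smallest positive subnormal
δ   = inv(prevfloat(typemax(nrm)))          # same threshold as in __normalize!

εδ  = eps(one(nrm)) / δ                     # the stdlib's factor, ≈ 4.0e292
alt = floatmin(nrm) / nextfloat(zero(nrm))  # the suggested alternative, 2^52 ≈ 4.5e15

# Either factor turns nrm into a number with a finite inverse,
# which is all the two-step rescaling needs.
@assert isfinite(inv(nrm * εδ))   # nrm * εδ ≈ 2.0e-31
@assert isfinite(inv(nrm * alt))  # nrm * alt == floatmin(Float64), the smallest normal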
Thanks for the detailed explanation, now I get it.
On a related note: my understanding was that performing float division as multiplication by the inverse is faster, but less accurate. I was curious how much less accurate it is, so I tested it with random values like
using LinearAlgebra, Statistics
function collect_discrepancy(::Type{T}, N, K) where T
    function _f(x)
        z1 = normalize(x, 2)        # __normalize! under the hood
        z2 = x ./ norm(x, 2)        # plain vanilla rdiv
        xb = BigFloat.(x)           # compare to "exact" result
        zb = T.(xb ./ norm(xb, 2))
        e1 = maximum(abs, z1 .- zb)
        e2 = maximum(abs, z2 .- zb)
        (; x, e1, e2)
    end
    [_f(randn(T, K)) for _ in 1:N]
end
and the results are that yes, it is a bit less accurate, but not significantly, eg
julia> results = collect_discrepancy(Float64, 1000, 16);
julia> mean(r.e1 ≥ r.e2 for r in results)
0.816
julia> mean(r.e1 for r in results)
6.058890368587643e-17
julia> mean(r.e2 for r in results)
5.729100462266434e-17
The same pattern emerges if I look at the outliers, or any other pertinent statistic. So it seems that this optimization is pretty innocuous.
Well, mostly…
julia> LinearAlgebra.normalize([49e0]) # pad vector with as many zeros as you want
1-element Vector{Float64}:
0.9999999999999999
julia> complex(49e0) / complex(49e0) # easter egg: Julia's complex division also uses invert-and-multiply
0.9999999999999999 + 0.0im
If you care that x / x == 1 in a particular situation then it’s best to avoid x * inv(x) (but generally this isn’t a concern). Also, computing the inverse isn’t free (except at compile time) and SIMD division is a thing, so if you’re doing just a few divisions by a particular number then division tends to be faster.
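To make that concrete, a quick REPL check of my own using the 49.0 example from above:

julia> x = 49.0;

julia> x / x            # division is correctly rounded, so x / x == 1 for any finite nonzero x
1.0

julia> x * inv(x)       # inv(49.0) is already rounded, and that error survives the multiply
0.9999999999999999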
I wonder if it would make sense to branch when the length is small. Using the (possibly naive) benchmark
using BenchmarkTools, LinearAlgebra
function _time(N)
    x = randn(N)
    z = randn()
    a = @belapsed ($x ./ $z)
    b = @belapsed LinearAlgebra.__normalize!(y, $z) setup=(y = copy($x))
    N => a / b
end
I get
so there is a tiny saving for small values.
A branch on length isn’t free either and would eat somewhat into this already-small gain. It would also be a tad annoying if zero-padding an array changed the result on the nonzeros (though like I argued above, in most situations one doesn’t care either way). In total, I don’t think the effort and extra code is worth the small gain.
Maybe for a small SArray one could make a case, but otherwise there isn’t a lot of precedent for aggressively optimizing array operations for small sizes. I was mostly pointing this out to remark that invert-and-multiply has artifacts and isn’t always a performance win versus a limited number of divisions.
That said, the second branch in __normalize! is implemented strangely. Maybe rmul! is faster than .*= (although it’s not easy to imagine how it could differ significantly), but doing it twice (as in the else branch) is definitely going to be slower than fusing those loops (while still preserving the order of operations). That could be an okay PR (I would say “good”, but it’s a pretty rare branch).
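For concreteness, a fused else branch could look something like the sketch below. The name __normalize_fused! is hypothetical; this is just my reading of “fuse the loops while preserving the order of operations”, not an actual PR:

using LinearAlgebra: rmul!

# Hypothetical sketch: one broadcast pass over `a` in the else branch,
# applying the same per-element operations (scale by εδ, then by inv(nrm*εδ)).
@inline function __normalize_fused!(a::AbstractArray, nrm)
    δ = inv(prevfloat(typemax(nrm)))
    if nrm ≥ δ
        rmul!(a, inv(nrm))
    else
        εδ = eps(one(nrm)) / δ
        invnrmεδ = inv(nrm * εδ)
        a .= (a .* εδ) .* invnrmεδ   # single fused loop instead of two rmul! passes
    end
    return a
end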
The place where I first noticed this invert-and-multiply behavior is in the complex division case I hinted at above. Our current implementation uses an invert-and-multiply which, in my testing, was marginally slower than divides. I might have made that PR long ago, but I have slightly bigger ambitions for complex division (fixing the x/x != 1 cases, which is tough without measurably reducing the performance) that I might get back to some day.
Or messing with AD, I guess.
You are right about optimizing the small case, I don’t think it is worth it.