Why is LLVM missing this constant in a loop?

Bardo · December 11, 2021, 5:18pm

using BenchmarkTools
using LinearAlgebra

function f(x,y)
    z = 0.0
    for i in 1:10
        xy = x .* y
        z += norm(xy)
    end
    z
end

function fopt(x,y)
    z = 0.0
    xy = x .* y
    for i in 1:10
        z += norm(xy)
    end
    z
end

x=rand(2); y=rand(2)
@btime f($x,$y)
@btime fopt($x,$y)

  464.467 ns (10 allocations: 800 bytes)
  146.643 ns (1 allocation: 80 bytes)

roflmaostc · December 11, 2021, 6:59pm

Is it a constant?
I’m not familiar with LLVM but x and y are references to arrays which could change during execution of the loop.

See:

using BenchmarkTools
using LinearAlgebra

function f(x,y)
    z = 0.0 
    for i in 1:10
        xy = x .* y
        @show norm(xy) 
        z += norm(xy)
    end 
    z   
end

function g(x,y)
    for i = 1:10000000
        x .= rand(2)
        y .= rand(2)
    end
end

function f()
    x=rand(2); y=rand(2)
    Threads.@spawn g(x,y)
    f(x,y)
end


julia> f()
norm(xy) = 0.26590433662552576
norm(xy) = 0.603214499437178
norm(xy) = 1.0494104231262562
norm(xy) = 0.6612645779226971
norm(xy) = 0.3619931631859387
norm(xy) = 0.59579143483913
norm(xy) = 0.02516422097763711
norm(xy) = 0.6709932485823076
norm(xy) = 0.4531504941054873
norm(xy) = 0.19392321674706448
4.880809615549222

If you change that to Tuples (which are immutable), you’ll see the wanted behaviour:

function main()
    x=Tuple(rand(2)); y=Tuple(rand(2))
    @btime f($x,$y)
    @btime fopt($x,$y)
end 


# REPL
julia> main()
  42.424 ns (0 allocations: 0 bytes)
  41.396 ns (0 allocations: 0 bytes)

Bardo · December 11, 2021, 7:12pm

In my code, yes:

using LinearAlgebra

function f(x,y)
    z = 0.0
    for i in 1:10
        @show xy = x .* y
        z += norm(xy)
    end
    z
end

x=rand(2); y=rand(2)
f(x,y)

xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]

ToucheSir · December 11, 2021, 7:38pm

It may well be in practice, but to @roflmaostc’s point the compiler can’t prove it is. Arrays in particular are a bit of a black box right now (there has been a lot of work recently on changing this, but that hasn’t made it to stable Julia yet).

lmiq · December 11, 2021, 7:43pm

In theory it could optimize that to:

function fopt2(x,y)
    xy = x .* y
    z = 10*norm(xy)
    z
end;

But it doesn’t. (And by hand you can avoid calculating and allocating xy, if that is performance critical)

Bardo · December 11, 2021, 7:43pm

Thanks to both of you!
One more reason to use tuples whenever possible.

lmiq · December 11, 2021, 7:58pm

The spawn example is very interesting and educative. Before that I would think that the compiler should optimize that out, and the fact that it doesn’t was an implementation detail.

Now, unlikely it is that any funcional code does what you have shown, such a change would be breaking.

What is the opinion of core developers on this? Can such optimizations be implemented in 1.x?

roflmaostc · December 11, 2021, 8:00pm

A new immutable Array type could fix that?
That would be just a new non-breaking feature, wouldn’t it be?

lmiq · December 11, 2021, 8:03pm

I cannot test now, but what happens with the mutable static arrays, MArray, of StaticArrays?

If in that case the optimization occurs, the behavior is inconsistent already.

(The immutable static arrays will behave like tuples here of course)

roflmaostc · December 11, 2021, 8:10pm

function main()
    x=rand(2); y=rand(2)

    xR, yR = ReadOnlyArray.((x,y))
    xS, yS = SVector(x...), SVector(y...)
    xM, yM = MVector(x...), MVector(y...)
    @btime f($xR,$yR)
    @btime fopt($xR,$yR)
    @btime f($xS,$yS)
    @btime fopt($xS,$yS)
    @btime f($xM,$yM)
    @btime fopt($xM,$yM)
end


# REPL
julia> main()
  597.750 ns (10 allocations: 800 bytes)
  126.113 ns (1 allocation: 80 bytes)
  3.245 ns (0 allocations: 0 bytes)
  3.245 ns (0 allocations: 0 bytes)
  4.823 ns (0 allocations: 0 bytes)
  4.851 ns (0 allocations: 0 bytes)
3.523747450734117

Strange, that MVector does not make a difference?

ToucheSir · December 11, 2021, 8:16pm

It’s being worked on: https://github.com/JuliaLang/julia/pull/41777

MArray uses a backing NTuple for data storage, so presumably the optimizer has far more transparency to work with.

Topic		Replies	Views
Fancy LLVM loop transformation pass that deoptimize the code Performance llvm	2	196	April 16, 2025
Memrory allocation General Usage question	7	551	November 18, 2018
Inconsistent llvm code General Usage	6	567	December 8, 2016
Strange performance of a loop Performance	37	3177	July 21, 2018
Different `@code_llvm` output on macos and x86 Performance simd	4	305	December 8, 2023

Why is LLVM missing this constant in a loop?

Related topics