using BenchmarkTools
using LinearAlgebra
function f(x,y)
z = 0.0
for i in 1:10
xy = x .* y
z += norm(xy)
end
z
end
function fopt(x,y)
z = 0.0
xy = x .* y
for i in 1:10
z += norm(xy)
end
z
end
x=rand(2); y=rand(2)
@btime f($x,$y)
@btime fopt($x,$y)
Is it a constant?
I’m not familiar with LLVM but x and y are references to arrays which could change during execution of the loop.
See:
using BenchmarkTools
using LinearAlgebra
function f(x,y)
z = 0.0
for i in 1:10
xy = x .* y
@show norm(xy)
z += norm(xy)
end
z
end
function g(x,y)
for i = 1:10000000
x .= rand(2)
y .= rand(2)
end
end
function f()
x=rand(2); y=rand(2)
Threads.@spawn g(x,y)
f(x,y)
end
julia> f()
norm(xy) = 0.26590433662552576
norm(xy) = 0.603214499437178
norm(xy) = 1.0494104231262562
norm(xy) = 0.6612645779226971
norm(xy) = 0.3619931631859387
norm(xy) = 0.59579143483913
norm(xy) = 0.02516422097763711
norm(xy) = 0.6709932485823076
norm(xy) = 0.4531504941054873
norm(xy) = 0.19392321674706448
4.880809615549222
If you change that to Tuples (which are immutable), you’ll see the wanted behaviour:
using LinearAlgebra
function f(x,y)
z = 0.0
for i in 1:10
@show xy = x .* y
z += norm(xy)
end
z
end
x=rand(2); y=rand(2)
f(x,y)
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
xy = x .* y = [0.6666109828718713, 0.35500370844922313]
It may well be in practice, but to @roflmaostc’s point the compiler can’t prove it is. Arrays in particular are a bit of a black box right now (there has been a lot of work recently on changing this, but that hasn’t made it to stable Julia yet).
The spawn example is very interesting and educative. Before that I would think that the compiler should optimize that out, and the fact that it doesn’t was an implementation detail.
Now, unlikely it is that any funcional code does what you have shown, such a change would be breaking.
What is the opinion of core developers on this? Can such optimizations be implemented in 1.x?