I have two functions whose only difference is that one uses `+=` and the other uses `=`, yet the `+=` version is more than 4x slower. Since `+` is much cheaper than `sin` and `cos`, I would not expect the extra addition to matter this much. I tried both versions in Fortran and confirmed that they run at about the same speed there. So why does the Julia compiler generate such slow code?
```julia
julia> using BenchmarkTools

julia> function pyramid0!(v!, x::AbstractVector{T}) where T
           @assert size(v!,2) == size(v!,1) == length(x)
           for j=1:length(x)
               v![1,j] = x[j]
           end
           @inbounds for i=1:size(v!,1)-1
               for j=1:size(v!,2)-i
                   v![i+1,j] = cos(v![i,j+1]) * sin(v![i,j])
               end
           end
       end
pyramid0! (generic function with 1 method)

julia> let
           n = 1000
           x = collect(Float64, 1:n)
           v = zeros(1000, 1000)
           @benchmark pyramid0!($v, $x) seconds=1
       end
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.902 ms (0.00% GC)
  median time:      5.913 ms (0.00% GC)
  mean time:        5.943 ms (0.00% GC)
  maximum time:     6.341 ms (0.00% GC)
  --------------
  samples:          169
  evals/sample:     1
```
```julia
julia> function pyramid0!(v!, x::AbstractVector{T}) where T
           @assert size(v!,2) == size(v!,1) == length(x)
           for j=1:length(x)
               v![1,j] = x[j]
           end
           @inbounds for i=1:size(v!,1)-1
               for j=1:size(v!,2)-i
                   v![i+1,j] += cos(v![i,j+1]) * sin(v![i,j])
               end
           end
       end
pyramid0! (generic function with 1 method)

julia> let
           n = 1000
           x = collect(Float64, 1:n)
           v = zeros(1000, 1000)
           @benchmark pyramid0!($v, $x) seconds=1
       end
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     25.104 ms (0.00% GC)
  median time:      25.257 ms (0.00% GC)
  mean time:        25.393 ms (0.00% GC)
  maximum time:     28.555 ms (0.00% GC)
  --------------
  samples:          40
  evals/sample:     1
```
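One hypothesis worth checking (an assumption, not something established above): `@benchmark` calls the function repeatedly on the same interpolated `v`, so in the `+=` version each call starts from the previous call's output and the stored values keep growing, whereas with `=` every updated entry below the first row is a product of a cosine and a sine and stays within [-1, 1]. Larger arguments may push `sin` and `cos` onto a costlier argument-reduction path. The sketch below uses only Base Julia; `pyramid_assign!` and `pyramid_accum!` are hypothetical stand-ins for the two `pyramid0!` versions. It runs each kernel 40 times on its own matrix (roughly the sample count of the `+=` benchmark) and reports the largest magnitude below row 1.

```julia
# Hypothetical stand-in for the `=` version of `pyramid0!`.
function pyramid_assign!(v, x)
    n = length(x)
    @assert size(v) == (n, n)
    for j in 1:n
        v[1, j] = x[j]
    end
    @inbounds for i in 1:n-1, j in 1:n-i
        v[i+1, j] = cos(v[i, j+1]) * sin(v[i, j])
    end
    return v
end

# Hypothetical stand-in for the `+=` version of `pyramid0!`.
function pyramid_accum!(v, x)
    n = length(x)
    @assert size(v) == (n, n)
    for j in 1:n
        v[1, j] = x[j]
    end
    @inbounds for i in 1:n-1, j in 1:n-i
        v[i+1, j] += cos(v[i, j+1]) * sin(v[i, j])
    end
    return v
end

n = 1000
x = collect(Float64, 1:n)
va = zeros(n, n)
vb = zeros(n, n)

# Reuse each matrix across calls, as @benchmark does with an interpolated `$v`.
for _ in 1:40
    pyramid_assign!(va, x)
    pyramid_accum!(vb, x)
end

# Largest magnitude below the first row (row 1 is just a copy of x).
max_assign = maximum(abs, view(va, 2:n, :))
max_accum = maximum(abs, view(vb, 2:n, :))
println("= : ", max_assign, "   += : ", max_accum)
```

If `max_accum` comes out well above 1 while `max_assign` does not, the two versions are feeding `sin` and `cos` very different inputs, and the timing gap may reflect the data rather than the extra load and add.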
I tried Julia 1.5, 1.6 and master branch.
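A way to benchmark the `+=` kernel without carrying state between samples is a sketch like the following, which relies on BenchmarkTools' `setup` and `evals` keywords. The function name `pyramid_accum!` is hypothetical; its body is the `+=` version of `pyramid0!`. No timings are claimed here: if the result lands close to the `=` version's time, that would suggest the slowdown comes from the inputs growing across samples rather than from the generated code.

```julia
using BenchmarkTools

# Hypothetical name: same body as the `+=` version of `pyramid0!`.
function pyramid_accum!(v, x)
    n = length(x)
    @assert size(v) == (n, n)
    for j in 1:n
        v[1, j] = x[j]
    end
    @inbounds for i in 1:n-1, j in 1:n-i
        v[i+1, j] += cos(v[i, j+1]) * sin(v[i, j])
    end
    return v
end

x = collect(Float64, 1:1000)

# `setup` allocates a fresh zero matrix before every sample (outside the timed region),
# and `evals=1` makes each sample a single call, so `+=` never starts from an earlier run's output.
t = @benchmark pyramid_accum!(v, $x) setup=(v = zeros(1000, 1000)) evals=1 seconds=1
display(t)
```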