I also tested v0.6.0-pre.alpha.34 and found something interesting.
Shouldn't the following be the fastest version, thanks to loop fusion?
function test_perf4()
    range = 1:2000000
    range_transp = collect(range)'  # row vector, so the broadcast spans columns
    steering_vectors = complex.(ones(4, 11), ones(4, 11))
    sum_signal = zeros(Complex{Float64}, 4, length(range))
    for i = 1:11
        # fused broadcast: 4-element column times 1×2000000 phase row
        sum_signal .+= steering_vectors[:, i] .* cis.((2 * pi * 1.023e6 / 4e6) .* range_transp .+ (40 * pi / 180))
    end
    return sum_signal
end
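The timings below were measured with BenchmarkTools, roughly like this (the exact @benchmark options don't matter for the point):

using BenchmarkTools
@benchmark test_perf4()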
However, running this gives me:
BenchmarkTools.Trial:
memory estimate: 137.33 MiB
allocs estimate: 41
--------------
minimum time: 14.918 s (0.07% GC)
median time: 14.918 s (0.07% GC)
mean time: 14.918 s (0.07% GC)
maximum time: 14.918 s (0.07% GC)
--------------
samples: 1
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
However, when I remove the dot after cis, so that cis is applied to the whole array instead of being fused into the broadcast,
I get the following:
BenchmarkTools.Trial:
memory estimate: 641.00 MiB
allocs estimate: 899
--------------
minimum time: 3.818 s (2.12% GC)
median time: 3.852 s (3.95% GC)
mean time: 3.852 s (3.95% GC)
maximum time: 3.887 s (5.75% GC)
--------------
samples: 2
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
This gives me the following warning, though:
WARNING: cis{T <: Number}(x::AbstractArray{T}) is deprecated, use cis.(x) instead.
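For reference, the non-fused variant only changes the loop body; roughly (my reconstruction, not copied verbatim):

sum_signal .+= steering_vectors[:, i] .* cis((2 * pi * 1.023e6 / 4e6) .* range_transp .+ (40 * pi / 180))

Here cis(...) (and the inner broadcast feeding it) materializes full 1×2000000 temporaries on every iteration, which fits the much larger memory estimate above.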
So it seems that loop fusion still needs some adjustments: the fused version allocates far less memory but is roughly four times slower here.
BTW, these tests were done with -O3 (i.e., Julia started with julia -O3).
EDIT: The same goes for exp() and exp.().
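By that I mean the equivalent complex-exponential form, sketched here (my own rewrite, since cis(x) is exp(im*x)):

# fused version (slow in this test):
sum_signal .+= steering_vectors[:, i] .* exp.(im .* ((2 * pi * 1.023e6 / 4e6) .* range_transp .+ (40 * pi / 180)))
# non-fused version, exp applied to the whole array (deprecated, but faster here):
sum_signal .+= steering_vectors[:, i] .* exp(im .* ((2 * pi * 1.023e6 / 4e6) .* range_transp .+ (40 * pi / 180)))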
EDIT: I filed an issue for that: https://github.com/JuliaLang/julia/issues/20875