I have the following code, which performs an intermediate operation and stores the final result in the output vector out using the max function. There are two ways to do this: one is to pre-allocate a temporary array to hold the intermediate values and then reduce it into the final out vector (function tt), and the other is to create temporary vectors inside the loop (function tt1) and store the result in the out vector.
I thought that pre-allocating would be the best practice and the most efficient one, but it is actually much slower. Am I doing something wrong, or is it supposed to be like this?
using BenchmarkTools
using Statistics  # provides mean

# Version 1: pre-allocated temporary matrix
function tt(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    temp = Array{S,2}(undef, length(Y), 2)
    for ii in eachindex(out)
        temp[:, 1] = @. X[ii]^2 + Y
        temp[:, 2] = @. X[ii]^3 + Y
        out[ii] = mean(max.(temp[:, 1], temp[:, 2]))
    end
    return out
end

# Version 2: fresh temporary vectors on every iteration
function tt1(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    for ii in eachindex(out)
        v1 = @. X[ii]^2 + Y
        v2 = @. X[ii]^3 + Y
        out[ii] = mean(max.(v1, v2))
    end
    return out
end
X = rand(1000);
Y = rand(10000);
@btime tt(X, Y);
@btime tt1(X, Y);  # identical outputs, as expected
  93.869 ms (10003 allocations: 382.01 MiB)  # tt, slower
  47.507 ms (6001 allocations: 229.12 MiB)   # tt1
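
One possible reading of these numbers, offered as a sketch rather than a definitive diagnosis: in tt the right-hand side `@. X[ii]^2 + Y` still builds a new vector before it is copied into temp, and `temp[:, 1]` used as an argument to `max.` makes a copy of the column. That would be five arrays of length(Y) per iteration in tt against three in tt1, which is roughly consistent with the 382 MiB and 229 MiB reported. The variant below, with the hypothetical name tt2, writes into the buffer in place and reads the columns through views. It has not been benchmarked here.

```julia
using Statistics

# Hypothetical variant of tt: fills the pre-allocated buffer in place.
function tt2(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    temp = Matrix{S}(undef, length(Y), 2)
    for ii in eachindex(out)
        # With @. on the whole statement, `=` becomes `.=`, so the fused
        # broadcast writes straight into the column without a temporary.
        @. temp[:, 1] = X[ii]^2 + Y
        @. temp[:, 2] = X[ii]^3 + Y
        # Views reference the columns without copying them.
        a = view(temp, :, 1)
        b = view(temp, :, 2)
        # A generator avoids materializing the result of max.(a, b).
        out[ii] = mean(max(p, q) for (p, q) in zip(a, b))
    end
    return out
end
```

It can be checked against the originals with `tt2(X, Y) ≈ tt(X, Y)` and timed with `@btime tt2($X, $Y);`, where the `$` interpolation keeps global-variable overhead out of the measurement.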