I have the following code, which performs an intermediate operation and stores the final result in the output vector `out` using the `max` function. There are two ways to do this: one is to pre-allocate a temporary array to hold the intermediate values and then put the result in the final `out` vector (function `tt`), and the other is to just create temporary vectors within the loop (function `tt1`) and store the result in the `out` vector.

I thought that pre-allocating would be the best practice and the most efficient approach, but it is actually about twice as slow (roughly 94 ms vs. 48 ms in the benchmark below) and allocates more memory. Am I doing something wrong, or is it supposed to be like this?

```
using BenchmarkTools
using Statistics  # provides `mean`
# Version 1: pre-allocated temporary matrix `temp`
function tt(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    temp = Array{S,2}(undef, length(Y), 2)
    for ii in eachindex(out)
        temp[:, 1] = @. X[ii]^2 + Y
        temp[:, 2] = @. X[ii]^3 + Y
        out[ii] = mean(max.(temp[:, 1], temp[:, 2]))
    end
    return out
end

# Version 2: fresh temporary vectors created inside the loop
function tt1(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    for ii in eachindex(out)
        v1 = @. X[ii]^2 + Y
        v2 = @. X[ii]^3 + Y
        out[ii] = mean(max.(v1, v2))
    end
    return out
end
X = rand(1000);
Y = rand(10000);
@btime tt(X, Y);   # 93.869 ms (10003 allocations: 382.01 MiB), slower
@btime tt1(X, Y);  # 47.507 ms (6001 allocations: 229.12 MiB); identical outputs, as expected
```
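
For comparison, here is a sketch of what I understand a truly in-place variant might look like. It rests on two assumptions I have not benchmarked: that `temp[:, 1] = @. ...` still builds the right-hand side as a new vector before copying it into `temp`, and that reading `temp[:, 1]` on the right-hand side makes a copy of the column. The name `tt2` is hypothetical and not part of the benchmark above.

```julia
using Statistics  # provides `mean`

# Hypothetical in-place variant of `tt` (not part of the original benchmark).
function tt2(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    temp = Array{S,2}(undef, length(Y), 2)
    # `@views` makes the `temp[:, k]` slices below views instead of copies.
    @views for ii in eachindex(out)
        # `.=` fuses the broadcast and writes directly into the existing column.
        temp[:, 1] .= X[ii]^2 .+ Y
        temp[:, 2] .= X[ii]^3 .+ Y
        # `max.(...)` still allocates one result vector per iteration.
        out[ii] = mean(max.(temp[:, 1], temp[:, 2]))
    end
    return out
end
```

If those assumptions hold, `@btime tt2(X, Y);` should report fewer allocations than `tt`, but I would treat that as something to measure rather than a given.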