Temporary pre-allocated array within function is slower than non-preallocated?

cosmia · September 15, 2021, 9:54pm

I have the following code, which performs an intermediate operation and stores the final result in the output vector out using the max function. There are two ways to do this: one is to pre-allocate a temporary array to hold the intermediate values and then compute each entry of out from it (function tt); the other is to create temporary vectors inside the loop (function tt1) and compute each entry of out from those.

I thought that pre-allocating would be the best practice and the most efficient one, but it is actually much slower. Am I doing something wrong, or is it supposed to be like this?

using BenchmarkTools
using Statistics  # provides mean
function tt(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    temp = Array{S,2}(undef, length(Y),2)
    for ii = eachindex(out)
        temp[:, 1] = @. X[ii]^2 + Y
        temp[:, 2] = @. X[ii]^3 + Y
        out[ii] = mean(max.(temp[:,1], temp[:,2]))
    end
    return out
end
function tt1(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    for ii = eachindex(out)
        v1 = @. X[ii]^2 + Y
        v2 = @. X[ii]^3 + Y
        out[ii] = mean(max.(v1,v2))
    end
    return out
end

X=rand(1000);
Y=rand(10000);
@btime tt(X,Y);
@btime tt1(X,Y);#identical outputs as expected

  93.869 ms (10003 allocations: 382.01 MiB) #slower
  47.507 ms (6001 allocations: 229.12 MiB)

rdeits · September 15, 2021, 10:06pm

The line temp[:, 1] = @. X[ii]^2 + Y isn’t doing what you want. The right-hand side creates a brand new vector, and then you’re (unnecessarily) copying that new vector into temp.

Use .= to combine the assignment with the rest of the broadcasted operation to avoid that, or just put @. in front of the whole line.
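As a minimal sketch of that suggestion (the name tt_inplace is hypothetical, not from the original post), the pre-allocated version could look like the following. It shows both spellings: an explicit .= on the first column and a leading @. on the second. The @views on the last line is an extra assumption on my part: slices such as temp[:, 1] on the right-hand side copy by default, so a view avoids that too.

```julia
using Statistics  # provides mean

# Hypothetical in-place variant of tt: the broadcast is fused with the
# assignment, so no temporary vector is built on the right-hand side.
function tt_inplace(X::Vector{S}, Y::Vector{S}) where S
    out = Vector{S}(undef, length(X))
    temp = Matrix{S}(undef, length(Y), 2)
    for ii in eachindex(out)
        # Explicit dots: `.=` writes straight into the first column of temp.
        temp[:, 1] .= X[ii]^2 .+ Y
        # Same idea with @. in front of the whole line (`=` becomes `.=`).
        @. temp[:, 2] = X[ii]^3 + Y
        # @views turns the slices into views instead of copies;
        # max.(...) itself still allocates one result vector per iteration.
        out[ii] = @views mean(max.(temp[:, 1], temp[:, 2]))
    end
    return out
end
```

Called as tt_inplace(X, Y), this should return the same values as tt and tt1; how much faster it is depends on the sizes involved, so it is worth re-running the @btime comparison.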

rdeits · September 15, 2021, 10:08pm

Oh, but also isn’t ii a scalar? Why do you have any broadcasting at all?

cosmia · September 15, 2021, 10:59pm

You are right, the problem was in my use of the @. macro. Thanks! The broadcasting is necessary because Y is a vector while X[ii] is a scalar. And in the larger program I’m writing, it is more complicated than that, so broadcasting is necessary.

Now I realize that the difference in speed comes from this:

function ttt(X, Y)
    exp(X[1]*2 +  Y ^X[2] )
end
function ttt1(X, Y)
    @. exp(X[1]*2 +  Y^X[2])
end

@btime ttt.(Ref(X),Y) #1.060 ms (5 allocations: 78.28 KiB)
@btime ttt1(X,Y) #1.047 ms (2 allocations: 78.20 KiB)

Obviously, with such simple code, the difference in time is negligible, but the allocation difference is there. In my real application, the difference in the number of allocations is huge and is the main bottleneck. Is it always better to use @. instead of relying on Ref?
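For comparing the two styles on equal footing, here is a rough sketch (the names kernel, fill_with_ref! and fill_with_dot! are made up for illustration). Both forms write into a caller-supplied output vector, so neither has to allocate the result array itself. This is not a claim about which one is faster in the real application, only a way to check that they compute the same thing.

```julia
# Scalar kernel, meant to be broadcast over the elements of Y (hypothetical name).
kernel(X, y) = exp(X[1] * 2 + y^X[2])

# Style 1: broadcast the scalar function; Ref(X) makes X behave like a scalar
# so that only Y is iterated over.
function fill_with_ref!(out, X, Y)
    out .= kernel.(Ref(X), Y)
    return out
end

# Style 2: let @. insert the dots; indexing such as X[1] is left untouched.
function fill_with_dot!(out, X, Y)
    @. out = exp(X[1] * 2 + Y^X[2])
    return out
end

X = rand(2)
Y = rand(1000)
out1 = similar(Y)
out2 = similar(Y)
fill_with_ref!(out1, X, Y)
fill_with_dot!(out2, X, Y)
```

When timing these with @btime, BenchmarkTools recommends interpolating global variables with $, for example @btime fill_with_ref!($out1, $X, $Y), so that the cost of accessing untyped globals is not counted in the measurement.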
