Pre-allocating outputs, inplace functions and performance

Hi all, I’m trying to understand the Pre-allocating outputs section in the performance tips of the docs.

I have code that basically does the following:

function test(x, a)
    if a == 1
        res = x .^ 2
    else
        res = x .^ 3
    end
    return res
end

ret = ones(10)
x = ones(10) .* 5
a = 1
for k = 1:10
    ret .*= test(x, a)
end

Now, setting aside global variable issues, I thought this would be a good example of how to gain some efficiency with pre-allocated outputs, so following the docs, I modified the above as:

function test!(ret, x, a)
    if a == 1
        ret = x .^ 2
    else
        ret = x .^ 3
    end
end

ret = Array{Float64}(undef, 10)
x = ones(10) .* 5
a = 1
temp = ones(10)
for k = 1:10
    test!(ret, x, a)
    temp .*= ret
end

However, test!(ret, x, a) does not change the array ret, and modifying test! to return ret and adding ret = test!(ret, x, a) would defeat the purpose of pre-allocating the array, right? Finally, even if the above code worked, would it even make a difference, since I now create the temporary array temp?

You want to use ret .= x .^ 2. Otherwise you won’t be copying into the existing vector; you’ll just be changing what the name ret is bound to.


You aren’t modifying ret in that function. You need to use the in-place assignment .= or ret[:] = x.^2.
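To make the difference concrete, here is a minimal sketch of the two behaviours side by side. The name test_rebind! is made up for illustration; it is just the original test! with an explicit return added.

```julia
# Rebinding: `=` makes the local name `ret` point at a freshly allocated
# array, so the array the caller passed in is never written to.
function test_rebind!(ret, x, a)   # hypothetical name, for illustration only
    if a == 1
        ret = x .^ 2
    else
        ret = x .^ 3
    end
    return nothing
end

# In-place: `.=` writes element by element into the existing array.
function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2
    else
        ret .= x .^ 3
    end
    return ret
end

x = ones(10) .* 5

out1 = zeros(10)
test_rebind!(out1, x, 1)   # out1 is unchanged

out2 = zeros(10)
test!(out2, x, 1)          # out2 now holds x .^ 2
```

Inside a function, an argument name is just a local binding, so assigning to it with = never affects the caller's array.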


gosh, this is embarrassing. thanks a lot for the prompt reply!

Dang, missed the solution by seconds! :turtle:

There’s a simpler way here that you can try:

ret .*= test.(x, a)

Then you don’t need the test! version, and you can also drop the broadcasting inside test. You should check it for performance, though (with BenchmarkTools.jl). Hopefully, constant propagation can eliminate the branch, but I’m not certain.
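A rough sketch of what that could look like, assuming test is rewritten to act on a single number (this scalar version is my reading of the suggestion, not code from the original post):

```julia
# Scalar version: no dots inside the function; the caller broadcasts instead.
test(x, a) = a == 1 ? x^2 : x^3

x = ones(10) .* 5
a = 1
ret = ones(10)
for k = 1:10
    # The dotted call and `.*=` fuse into one loop that updates ret in place.
    ret .*= test.(x, a)
end
```

Because the call is fused with the update, no intermediate array should be needed for the result of test.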


Note that ret[:] = x.^2 will still allocate a separate vector x.^2, so this is probably not what the OP is looking for.
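One way to check this without extra packages is Base's @allocated macro. The sketch below uses made-up helper names (copy_assign!, fused_assign!, measure) and wraps the measurement in a function so that global variables do not distort the numbers.

```julia
# Builds the temporary x .^ 2 first, then copies it into ret.
function copy_assign!(ret, x)
    ret[:] = x .^ 2
    return ret
end

# Fused broadcast: each element is computed and stored directly in ret.
function fused_assign!(ret, x)
    ret .= x .^ 2
    return ret
end

function measure(x)
    ret = similar(x)
    # Run each once so that compilation is not counted.
    copy_assign!(ret, x)
    fused_assign!(ret, x)
    bytes_copy = @allocated copy_assign!(ret, x)
    bytes_fused = @allocated fused_assign!(ret, x)
    return bytes_copy, bytes_fused
end

bytes_copy, bytes_fused = measure(ones(1000) .* 5)
```

I would expect bytes_copy to be at least the size of the temporary vector and bytes_fused to be much smaller, but the exact figures depend on the Julia version.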


Note that it isn’t right that returning ret from test! would defeat the purpose of pre-allocating. There is nothing wrong with returning the mutated container; in fact, it’s a very common pattern. Writing ret = test!(ret, x, a) rebinds ret, but to the very same array it already refers to, so no allocation takes place. It’s a bit like writing ret = ret.
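A small sketch of that pattern, assuming test! has been fixed to use .= and to return ret. The names acc and before are illustrative only; acc plays the role of temp from the original post.

```julia
function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2
    else
        ret .= x .^ 3
    end
    return ret   # hand back the same array that was passed in
end

x = ones(10) .* 5
ret = Array{Float64}(undef, 10)
before = ret                 # second name for the same array
ret = test!(ret, x, 1)
same_array = ret === before  # identity check, not an element-wise comparison

# Since test! returns the buffer, the call can sit directly in the update.
acc = ones(10)
for k = 1:3
    acc .*= test!(ret, x, 1)
end
```

Here acc is the accumulator the loop needs anyway, so ret is the only scratch buffer and it is reused on every iteration.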


That is something I tried (ret .*= test.(x, a)), but it seems a little convoluted for my case: in my actual code I have a vector of parameters of a smaller size, while x is a vector of values, as in:

function test(a, b, x)
    if a == 1
        ret = x ^ b[1]
    else
        ret = x ^ b[2]
    end
    return ret
end
x = ones(10) .* 5
b = [2, 3]
a = 1
ret = test.(a, b, x) # broadcast error: b has length 2, x has length 10
ret = test.(a, b[1], x) # works fine

I could break the function into 4 parameters, but the size of b varies slightly depending on the case.

Yes, I noticed that the number of allocations did not decrease if I used ret[:], thanks!

Interesting, thanks for the insight!

You can do

ret = test.(a, Ref(b), x)

or

ret = test.(a, (b,), x)

though I believe that Ref is slightly preferred.
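For context, broadcasting tries to iterate over every array argument, so a bare b = [2, 3] gets matched element by element against x and the sizes clash. Wrapping b in Ref (or in a one-element tuple) makes broadcasting treat it as a single value that is handed whole to every call. A minimal sketch, where via_ref, via_tuple and threw are illustrative names:

```julia
function test(a, b, x)
    if a == 1
        return x ^ b[1]
    else
        return x ^ b[2]
    end
end

x = ones(10) .* 5
b = [2, 3]
a = 1

via_ref = test.(a, Ref(b), x)    # b is passed whole to every call
via_tuple = test.(a, (b,), x)    # a 1-tuple wrapper has the same effect

# Without a wrapper, broadcast tries to pair b (length 2) with x (length 10).
threw = try
    test.(a, b, x)
    false
catch err
    err isa DimensionMismatch
end
```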


This works, thanks a lot! Every day I learn something new.

Interestingly, the number of allocations with your method is almost half that of the in-place method, but it is also about 5% slower.

How are you doing the benchmarking?

I’m using BenchmarkTools!

Could you explain the reason to use Ref, please?