Pre-allocating outputs, inplace functions and performance

cosmia · February 6, 2020, 8:31pm

Hi all, I’m trying to understand the Pre-allocating outputs section in the performance tips of the docs.

I have some code that basically does the following:

function test(x, a)
    if a == 1
        res = x .^ 2
    else
        res = x .^ 3
    end
    return res
end

ret = ones(10)
x = ones(10) .* 5
a = 1
for k = 1:10
    ret .*= test(x,a)
end

Now, setting aside global-variable issues, I thought this would be a good example of how to gain some efficiency with pre-allocated outputs, so, following the docs, I modified the code above as follows:

function test!(ret, x, a)
    if a == 1
        ret = x .^ 2
    else
        ret = x .^ 3
    end
    nothing
end

ret = Array{Float64}(undef, 10)
x = ones(10) .* 5
a = 1
temp = ones(10)
for k = 1:10
    test!(ret, x,a)
    temp .*= ret
end

However, test!(ret, x, a) does not change the array ret, and modifying test! to return ret and then writing ret = test!(ret, x, a) would defeat the purpose of pre-allocating the array, right? Finally, even if the code above worked, would it make any difference, given that I now create the temporary array temp?

Oscar_Smith · February 6, 2020, 8:35pm

You want to use ret .= x.^2. Otherwise you won’t be copying into the existing vector; you’ll just be rebinding the local name ret to a new array.
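A minimal sketch of the difference, assuming the OP's setup above (`test_rebind!` is just a hypothetical name for the original version): `ret = x .^ 2` inside the function only rebinds the local name `ret` to a freshly allocated array, whereas `ret .= x .^ 2` writes the elements into the array the caller passed in.

```julia
# Original (buggy) version, renamed here: `ret = ...` only rebinds the local name
# `ret` to a new array, so the caller's array is left untouched.
function test_rebind!(ret, x, a)
    if a == 1
        ret = x .^ 2
    else
        ret = x .^ 3
    end
    return nothing
end

# Fixed version: `.=` writes the result element-wise into the existing array `ret`.
function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2
    else
        ret .= x .^ 3
    end
    return nothing
end

x = ones(10) .* 5
a = 1

ret_rebind = zeros(10)
test_rebind!(ret_rebind, x, a)   # ret_rebind is still all zeros afterwards

ret = Array{Float64}(undef, 10)  # pre-allocated output buffer
temp = ones(10)                  # accumulator, allocated once before the loop
for k = 1:10
    test!(ret, x, a)             # overwrites ret with x .^ 2
    temp .*= ret                 # multiplies ret into temp in place
end
```

Regarding the question about temp: it is allocated once, before the loop, so it does not add allocations per iteration.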

tbeason · February 6, 2020, 8:35pm

You aren’t modifying ret in that function. You need to use the in-place assignment .= or ret[:] = x.^2.

cosmia · February 6, 2020, 8:47pm

Gosh, this is embarrassing. Thanks a lot for the prompt reply!

tbeason · February 6, 2020, 9:12pm

Dang, missed the solution by seconds!

DNF · February 7, 2020, 12:08am

There’s a simpler way here that you can try:

ret .*= test.(x, a)

Then you don’t need the test! version, and you can also drop the broadcasting inside test. You should check it for performance, though (with BenchmarkTools.jl). Hopefully, constant propagation can eliminate the branch, but I’m not certain.
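A sketch of DNF's suggestion, assuming the original `test` is rewritten to work on a single element: the outer `.*=` and the inner `test.(...)` call fuse into one loop that updates `ret` in place, so no temporary array is created for `test.(x, a)`.

```julia
# Scalar version of `test`: no broadcasting inside, it handles one element at a time.
function test(x, a)
    if a == 1
        return x^2
    else
        return x^3
    end
end

x = ones(10) .* 5
a = 1
ret = ones(10)
for k = 1:10
    # `.*=` and `test.(...)` fuse into a single loop that updates `ret` in place,
    # so the result of `test.(x, a)` is never materialized as a separate array.
    ret .*= test.(x, a)
end
```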

simeonschaub · February 7, 2020, 12:26am

Note that ret[:] = x.^2 will still allocate a separate vector x.^2, so this is probably not what the OP is looking for.
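To see this point, here is a small sketch (the function names are hypothetical) that compares the bytes allocated by the two forms with Base's `@allocated`, measured inside a function so that global variables don't distort the count.

```julia
# Both functions write x .^ 2 into the existing array `ret`.
fill_broadcast!(ret, x) = (ret .= x .^ 2; nothing)    # fused: no temporary array
fill_setindex!(ret, x)  = (ret[:] = x .^ 2; nothing)  # x .^ 2 is materialized first, then copied

# Measure inside a function so that `@allocated` sees concrete argument types.
function bytes_allocated(f!, ret, x)
    f!(ret, x)                    # warm-up call, so compilation is not counted
    return @allocated f!(ret, x)
end

x = ones(10) .* 5
ret = zeros(10)
b_broadcast = bytes_allocated(fill_broadcast!, ret, x)
b_setindex  = bytes_allocated(fill_setindex!, ret, x)
println("ret .= x .^ 2    allocates $b_broadcast bytes")
println("ret[:] = x .^ 2  allocates $b_setindex bytes")
```

Both forms end up with the same contents in ret; only the .= form avoids the intermediate vector.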

DNF · February 7, 2020, 7:55am

Regarding the idea that returning ret from test! would defeat the purpose of pre-allocating: that isn’t right. There is nothing wrong with returning the mutated container; in fact, it’s a very common pattern. Writing ret = test!(ret, x, a) reassigns ret, but only back to the array it already referred to, without creating any allocations. It’s a bit like writing ret = ret.
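A sketch of that pattern, assuming the same in-place `test!` as before but returning `ret` instead of `nothing`; Base functions such as `push!` and `sort!` follow the same convention of returning the argument they mutate.

```julia
# In-place version that also returns the array it mutated.
function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2
    else
        ret .= x .^ 3
    end
    return ret
end

x = ones(10) .* 5
ret = Array{Float64}(undef, 10)  # pre-allocated output buffer
original = ret                   # a second handle on the same array
ret = test!(ret, x, 1)           # "reassigns" ret, but to the very same object
```

Afterwards `ret === original` holds: no new array was created by the reassignment.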

cosmia · February 7, 2020, 5:25pm

Regarding DNF’s suggestion to write ret .*= test.(x, a) and drop the test! version: that is something I tried, but it seems a little convoluted for my case, since in my actual code I have a vector of parameters of a smaller size, while x is a vector of values, as in:

function test(a, b, x)
    if a == 1
        ret = x ^ b[1]
    else
        ret = x ^ b[2]
    end
    return ret
end
x = ones(10) .* 5
b = [2, 3]
a = 1
ret = test.(a, b, x) # broadcast error
ret = test.(a, b[1], x) # works fine

I could split b into separate scalar arguments (4 parameters in total), but the size of b varies slightly depending on the case.

Yes, I noticed that the number of allocations did not decrease when I used ret[:] = ..., thanks!

Interesting, thanks for the insight!

DNF · February 7, 2020, 5:29pm

You can do

ret = test.(a, Ref(b), x)

or

ret = test.(a, (b,), x)

though I believe that Ref is slightly preferred.
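A sketch of why `Ref` helps here, using the OP's `test(a, b, x)` from above: broadcasting iterates over every argument that is a collection, so `b` (length 2) and `x` (length 10) cannot be matched up. Wrapping `b` in `Ref`, a zero-dimensional container, or in the 1-tuple `(b,)` makes broadcasting treat it like a scalar, so every call receives the whole vector `b`.

```julia
function test(a, b, x)
    if a == 1
        return x^b[1]
    else
        return x^b[2]
    end
end

x = ones(10) .* 5
b = [2, 3]
a = 1

# Ref(b) (or the 1-tuple (b,)) makes broadcasting treat `b` as a single value,
# so each call gets the whole vector instead of one element of it.
r1 = test.(a, Ref(b), x)
r2 = test.(a, (b,), x)

# The same trick works inside a fused in-place update.
ret = ones(10)
ret .*= test.(a, Ref(b), x)
```

Without the wrapper, `test.(a, b, x)` fails because broadcasting tries to pair the 2 elements of `b` with the 10 elements of `x`.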

cosmia · February 7, 2020, 5:38pm

This works, thanks a lot! Everyday learning new things.

Interestingly, the number of allocations using your method is almost half that of the in-place method, but it is also about 5% slower.

DNF · February 7, 2020, 5:45pm

How are you doing the benchmarking?

cosmia · February 7, 2020, 6:00pm

I’m using BenchmarkTools!
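The thread doesn't show the benchmark itself, so the following is only a rough sketch with hypothetical `run_inplace` and `run_fused` wrappers, using Base's `@allocated` rather than BenchmarkTools. In this simplified setup the in-place version allocates two arrays (the buffer `ret` and the accumulator `temp`), while the fused version allocates only the accumulator; the real code may behave differently.

```julia
# Approach 1 (in-place): test! writes into a pre-allocated buffer,
# which is then multiplied into the accumulator `temp`.
function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2
    else
        ret .= x .^ 3
    end
    return ret
end

function run_inplace(x, a, n)
    ret = similar(x)           # one pre-allocated buffer
    temp = ones(length(x))     # accumulator
    for k = 1:n
        test!(ret, x, a)
        temp .*= ret
    end
    return temp
end

# Approach 2 (fused broadcast): scalar `test` broadcast directly into the accumulator.
test(x, a) = a == 1 ? x^2 : x^3

function run_fused(x, a, n)
    ret = ones(length(x))      # accumulator, the only array allocated
    for k = 1:n
        ret .*= test.(x, a)
    end
    return ret
end

# Measure inside a function so global variables don't add dispatch overhead.
function bytes(f, x)
    f(x, 1, 10)                # warm-up call, so compilation is not counted
    return @allocated f(x, 1, 10)
end

x = ones(10) .* 5
println("in-place: ", bytes(run_inplace, x), " bytes")
println("fused:    ", bytes(run_fused, x), " bytes")
```

For timing rather than allocation counts, BenchmarkTools' `@btime run_fused($x, 1, 10)` (with `$` interpolating the global) gives more reliable numbers than a single measurement.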

Juan · February 7, 2020, 10:50pm

Could you explain the reason to use Ref, please?
