Pre-allocating outputs, inplace functions and performance

Hi all, I’m trying to understand the Pre-allocating outputs section in the performance tips of the docs.

I have code that basically does the following:

function test(x, a)
    if a == 1
        res = x .^ 2
    else
        res = x .^ 3
    end
    return res
end

ret = ones(10)
x = ones(10) .* 5
a = 1
for k = 1:10
    ret .*= test(x,a)
end

Now, setting aside global-variable issues, I thought this would be a good example of how to gain some efficiency with pre-allocated outputs, so, following the docs, I modified the above to:

function test!(ret, x, a)
    if a == 1
        ret = x .^ 2
    else
        ret = x .^ 3
    end
    nothing
end

ret = Array{Float64}(undef, 10)
x = ones(10) .* 5
a = 1
temp = ones(10)
for k = 1:10
    test!(ret, x,a)
    temp .*= ret
end

However, test!(ret, x, a) does not change the array ret, and modifying test! to return ret and writing ret = test!(ret, x, a) would defeat the purpose of pre-allocating the array, right? Finally, even if the above code worked, would it even make a difference, since I now create the temporary array temp?

You want to use ret .= x .^ 2. Otherwise you won’t be copying into the existing vector; you’ll just be changing what the name ret is bound to inside the function.

You aren’t modifying ret in that function. You need to use the in-place broadcast assignment .= or ret[:] = x.^2.
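For reference, here is a minimal sketch of the in-place version with that fix applied. It reuses the names from the original post; returning ret at the end is a stylistic choice rather than something the fix requires.

```julia
# In-place version: `.=` writes into the existing array instead of rebinding `ret`.
function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2   # fused broadcast, fills ret element by element
    else
        ret .= x .^ 3
    end
    return ret
end

x = ones(10) .* 5
ret = Array{Float64}(undef, 10)   # pre-allocated output buffer
temp = ones(10)
for k in 1:10
    test!(ret, x, 1)   # overwrites ret on every iteration
    temp .*= ret       # in-place update of temp
end
```

With `ret = x .^ 2` the local name ret is simply pointed at a new array, so the caller's buffer is never touched.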

gosh, this is embarrassing. thanks a lot for the prompt reply!

Dang, missed the solution by seconds! :turtle:

There’s a simpler way here that you can try:

ret .*= test.(x, a)

Then you don’t need the test! version, and you can also drop the broadcasting inside test. You should check it for performance, though (with BenchmarkTools.jl). Hopefully, constant propagation can eliminate the branch, but I’m not certain.
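A sketch of what that could look like, assuming test is rewritten to work on a single scalar. The dotted call and the `.*=` fuse into one loop over x, so no intermediate vector should be needed.

```julia
# Scalar version of test: no broadcasting inside the function.
test(x, a) = a == 1 ? x^2 : x^3

x = ones(10) .* 5
ret = ones(10)
for k in 1:10
    # test.(x, 1) and .*= fuse into a single in-place loop over ret
    ret .*= test.(x, 1)
end
```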

Note that ret[:] = x.^2 will still allocate a separate vector for x.^2 before copying it into ret, so this is probably not what the OP is looking for.
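One way to see the difference is to compare the two forms with Base's @allocated. This is only a sketch: the helper names slice_assign!, fused_assign! and bytes_allocated are made up for the illustration, and the measurement is wrapped in a function to keep global-variable overhead out of the numbers.

```julia
# Hypothetical helpers, only for comparing the two assignment forms.
slice_assign!(ret, x) = (ret[:] = x .^ 2; ret)   # builds x .^ 2 first, then copies
fused_assign!(ret, x) = (ret .= x .^ 2; ret)     # writes directly into ret

# Measure bytes allocated by one call of f(ret, x).
bytes_allocated(f, ret, x) = @allocated f(ret, x)

x = ones(10) .* 5
ret = similar(x)

# Warm-up calls so compilation is not counted.
bytes_allocated(slice_assign!, ret, x)
bytes_allocated(fused_assign!, ret, x)

a_slice = bytes_allocated(slice_assign!, ret, x)
a_fused = bytes_allocated(fused_assign!, ret, x)
println("ret[:] = ... allocated ", a_slice, " bytes; ret .= ... allocated ", a_fused, " bytes")
```

The slice form should report at least the size of the temporary vector, while the fused form should report less.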

Note that returning ret from test! would not defeat the purpose of pre-allocating. There is nothing wrong with returning the mutated container; it’s in fact a very common pattern. Writing ret = test!(ret, x, a) does reassign ret, but it is assigned back to the same array without creating any allocations. It’s a bit like writing ret = ret.
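A small sketch of that pattern, using a made-up function square! rather than the OP's test!. The `===` check confirms that the returned value is the very same array that was passed in, not a copy.

```julia
# Hypothetical mutating function that also returns its output buffer.
function square!(out, x)
    out .= x .^ 2
    return out
end

buf = zeros(3)
result = square!(buf, [1.0, 2.0, 3.0])

same_array = result === buf   # identity check: no new array was created
```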

Broadcasting test over x is something I tried, but it seems a little convoluted in my actual case, where I have a parameter vector b that is shorter than the vector of values x, as in:

function test(a, b, x)
    if a == 1
        ret = x ^ b[1]
    else
        ret = x ^ b[2]
    end
    return ret
end
x = ones(10) .* 5
b = [2, 3]
a = 1
ret = test.(a, b, x) # broadcast error
ret = test.(a, b[1], x) # works fine

I could break the function into 4 parameters, but the size of b varies slightly depending on the case.

Yes, I noticed that the number of allocations did not decrease if I used ret[:], thanks!

Interesting, thanks for the insight!

You can do

ret = test.(a, Ref(b), x)

or

ret = test.(a, (b,), x)

though I believe that Ref is slightly preferred.
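To spell out what the wrapping does: broadcasting iterates over every array argument, so a two-element b clashes with a ten-element x. Wrapping b in Ref (or a one-element tuple) makes broadcast treat it as a single value that is passed whole to every call. A sketch using the test function from the post above, with a comprehension as a cross-check:

```julia
function test(a, b, x)
    if a == 1
        return x ^ b[1]
    else
        return x ^ b[2]
    end
end

x = ones(10) .* 5
b = [2, 3]

r_ref   = test.(1, Ref(b), x)             # b is passed whole to each call
r_tuple = test.(1, (b,), x)               # one-element tuple: same effect
r_loop  = [test(1, b, xi) for xi in x]    # explicit equivalent
```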

This works, thanks a lot! Learning new things every day.

Interestingly, the number of allocations using your method is almost half that of the in-place method, but it is also about 5% slower.

How are you doing the benchmarking?

I’m using BenchmarkTools!
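For comparison, a sketch of how such a measurement is often set up with BenchmarkTools, assuming the package is installed. Interpolating the global variables with `$` keeps the cost of accessing untyped globals out of the timing, which can otherwise distort both the time and the allocation count. The test! here is the corrected in-place version, not necessarily the exact code that was timed.

```julia
using BenchmarkTools   # external package, assumed to be installed

function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2
    else
        ret .= x .^ 3
    end
    return ret
end

x = ones(10) .* 5
ret = similar(x)
a = 1

# `$` interpolates the current values so globals are not part of the benchmark.
@btime test!($ret, $x, $a)

# Minimum elapsed time in seconds, handy for comparing variants programmatically.
t = @belapsed test!($ret, $x, $a)
```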

Could you explain the reason to use Ref, please?