Hi all, I’m trying to understand the Pre-allocating outputs section in the performance tips of the docs.
I have some code that basically does the following:
function test(x, a)
    if a == 1
        res = x .^ 2
    else
        res = x .^ 3
    end
    return res
end
ret = ones(10)
x = ones(10) .* 5
a = 1
for k = 1:10
    ret .*= test(x, a)
end
Now, setting aside global variable issues, I thought this would be a good example of how to gain some efficiency with pre-allocated outputs, so following the docs, I modified the above as:
function test!(ret, x, a)
    if a == 1
        ret = x .^ 2
    else
        ret = x .^ 3
    end
    nothing
end
ret = Array{Float64}(undef, 10)
x = ones(10) .* 5
a = 1
temp = ones(10)
for k = 1:10
    test!(ret, x, a)
    temp .*= ret
end
However, test!(ret, x, a) does not change the array ret, and modifying test! to return ret and writing ret = test!(ret, x, a) would defeat the purpose of pre-allocating the array, right? Finally, even if the above code worked, would it even make a difference, since I now create the temporary array temp?
Then you don’t need the test! version, and you can also drop the broadcasting inside test. You should check it for performance, though (with BenchmarkTools.jl). Hopefully, constant propagation can eliminate the branch, but I’m not certain.
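To make that suggestion concrete, here is a minimal sketch of what it could look like. The name test_scalar is made up for illustration and is not from the original post; the idea is only that the function works on one element and the caller adds the dots.

```julia
# Hypothetical scalar kernel: works on a single element, so no dots inside.
test_scalar(x, a) = a == 1 ? x^2 : x^3

x = ones(10) .* 5
a = 1
ret = ones(10)

for k in 1:10
    # The dotted call and `.*=` fuse into one loop that updates `ret`
    # in place, without building a temporary array for the right-hand side.
    ret .*= test_scalar.(x, a)
end
```

Whether this is actually faster for your sizes is something to measure rather than assume, ideally with the code wrapped in a function so the globals do not get in the way.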
Note that this isn’t right: returning ret from test! would not defeat the purpose of pre-allocating. There is nothing wrong with returning the mutated container; it’s in fact a very common pattern. When test! mutates ret in place and returns it, writing ret = test!(ret, x, a) simply rebinds ret to the same array without creating any allocations. It’s a bit like writing ret = ret.
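For reference, a sketch of a test! that really does write into the pre-allocated array. Inside a function, ret = x .^ 2 only points the local name ret at a freshly allocated array, so the caller's array never changes; ret .= x .^ 2 overwrites the elements of the existing array instead. This is one possible fix under that reading of the problem, not the only way to write it.

```julia
function test!(ret, x, a)
    if a == 1
        ret .= x .^ 2   # `.=` writes into the existing array element by element
    else
        ret .= x .^ 3
    end
    return ret          # returning the mutated argument is the usual convention
end

ret = Array{Float64}(undef, 10)
x = ones(10) .* 5
a = 1
temp = ones(10)

for k in 1:10
    test!(ret, x, a)    # fills `ret` in place
    temp .*= ret        # in-place update of `temp`
end
```

Since test! returns the same array it was given, ret = test!(ret, x, a) would also work and would not allocate a new array.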
Broadcasting test is something that I tried, but it seems a little convoluted for my case: in my actual code I have a smaller vector of parameters, while x is a vector of values, as in:
function test(a, b, x)
    if a == 1
        ret = x ^ b[1]
    else
        ret = x ^ b[2]
    end
    return ret
end
x = ones(10) .* 5
b = [2, 3]
a = 1
ret = test.(a, b, x)    # broadcast error
ret = test.(a, b[1], x) # works fine
I could break the function into 4 parameters, but the size of b varies slightly depending on the case.
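One way around this, sketched below with the same function and data as above, is to wrap b in Ref so that broadcasting treats it as a single value and passes the whole vector to every call. The length of b then no longer has to match the length of x.

```julia
function test(a, b, x)
    if a == 1
        ret = x ^ b[1]
    else
        ret = x ^ b[2]
    end
    return ret
end

x = ones(10) .* 5
b = [2, 3]
a = 1

# `Ref(b)` shields `b` from broadcasting: each call receives the full
# vector `b`, while `x` is still iterated element by element.
ret = test.(a, Ref(b), x)
```

Wrapping in a one-element tuple, test.(a, (b,), x), should behave the same way if you prefer that spelling.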
Yes, I noticed that the number of allocations did not decrease if I used ret[:], thanks!
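For anyone finding this later, a small sketch of why ret[:] did not help, as far as I understand it. With ret[:] = x .^ 2 the right-hand side is still evaluated into a temporary array first and then copied, whereas with .= the assignment becomes part of the fused broadcast.

```julia
x = ones(10) .* 5
ret = zeros(10)

# The right-hand side is computed first, so `x .^ 2` allocates a temporary
# array whose contents are then copied into `ret`.
ret[:] = x .^ 2

# The assignment is dotted too, so the whole expression fuses into a single
# loop that writes directly into `ret`.
ret .= x .^ 2

# `@.` adds the dots to every call and assignment in the expression.
@. ret = x^3
```

All three lines leave the same kind of result in ret; only the first one needs the extra temporary.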