I have found in a particular function of mine that it is fastest to perform a calculation in 4 steps. I can logically write this down as one line of code, but the fusing syntax of broadcasting ends up being too greedy and slowing it down.
Is there a way to fence off expressions of code to indicate a boundary for the fusing syntax?
The code in my case is something very similar to:
    n = 50
    vector = rand(n)
    a = reshape(vector, n, 1, 1)
    b = reshape(vector, 1, n, 1)
    c = reshape(vector, 1, 1, n)

    # What I would like to do:
    out = exp.(a) .* exp.(b) .* exp.(c)

    # What is fastest:
    temp_a = exp.(a)
    temp_b = exp.(b)
    temp_c = exp.(c)
    out = temp_a .* temp_b .* temp_c
The speed difference in benchmarking is roughly a factor of 200 for the above example.
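For anyone who wants to reproduce the comparison, here is a rough timing sketch that only uses `@elapsed` from Base. The function names `fused` and `split_steps` are made up for this post, and the ratio will vary with machine and Julia version, so treat it as an illustration rather than a careful benchmark.

```julia
# Hypothetical helper names; timings depend on machine and Julia version.
fused(a, b, c) = exp.(a) .* exp.(b) .* exp.(c)

function split_steps(a, b, c)
    temp_a = exp.(a)
    temp_b = exp.(b)
    temp_c = exp.(c)
    return temp_a .* temp_b .* temp_c
end

n = 50
vector = rand(n)
a = reshape(vector, n, 1, 1)
b = reshape(vector, 1, n, 1)
c = reshape(vector, 1, 1, n)

# Call each version once first so compilation time is not measured.
fused(a, b, c)
split_steps(a, b, c)

# Take the minimum over a few runs to reduce noise.
t_fused = minimum(@elapsed(fused(a, b, c)) for _ in 1:20)
t_split = minimum(@elapsed(split_steps(a, b, c)) for _ in 1:20)

println("fused: ", t_fused, " s, split: ", t_split, " s, ratio: ", t_fused / t_split)
```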
My guess is that the single-line version recalculates the exponentials of a, b and c at every position of the output array (3n³ evaluations in total) instead of only once per input element (3n evaluations). Is there any way I could write
out = (@barrier exp.(a)) .* (@barrier exp.(b)) .* ...
instead? I should also mention that a is actually a more complicated expression itself, with several operations fused together.
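To check the guess about repeated evaluation, here is a minimal sketch that counts how often the scalar function is called. `count_exp_calls` and `counted_exp` are hypothetical names invented for this post. The third variant is an assumption on my part: as far as I understand, fusion is purely syntactic and only nested dot calls are fused, so wrapping a term in an ordinary (non-dotted) call such as `identity` should force that term to be materialized first. I have not checked whether this is the recommended way to do it.

```julia
# Hypothetical helper: counts scalar exp evaluations in three variants.
function count_exp_calls(n)
    calls = Ref(0)
    counted_exp(x) = (calls[] += 1; exp(x))  # exp plus a call counter

    vector = rand(n)
    a = reshape(vector, n, 1, 1)
    b = reshape(vector, 1, n, 1)
    c = reshape(vector, 1, 1, n)

    # Fully fused: a single loop over the n x n x n output.
    calls[] = 0
    out_fused = counted_exp.(a) .* counted_exp.(b) .* counted_exp.(c)
    n_fused = calls[]

    # Split by hand: each temporary is built in its own broadcast.
    calls[] = 0
    temp_a = counted_exp.(a)
    temp_b = counted_exp.(b)
    temp_c = counted_exp.(c)
    out_split = temp_a .* temp_b .* temp_c
    n_split = calls[]

    # Assumed fence: the non-dotted identity call interrupts the fusion.
    calls[] = 0
    out_barrier = identity(counted_exp.(a)) .* identity(counted_exp.(b)) .* identity(counted_exp.(c))
    n_barrier = calls[]

    same = out_fused ≈ out_split && out_split ≈ out_barrier
    return (fused = n_fused, split = n_split, barrier = n_barrier, same = same)
end

counts = count_exp_calls(50)
println(counts)
```

If the counts come out as 3n³ for the fused version and 3n for the other two, that would support the guess and suggest that any plain function call can serve as the boundary.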