I am writing some scientific code and would like to avoid unnecessary allocations. Specifically, I have a bunch of small functions (essentially one-liners) manipulating Arrays (representing tensors). This is all inner-loop stuff and I want it to be as lean as possible. As a super-simplified example:
linear(x::Vector{Float64}) = x'*W .+ b
My questions are:
From a performance standpoint, does it make sense to define
linear!(x::Vector{Float64}, res::Vector{Float64}) = (res .= x'*W .+ b)
or can I count on the compiler to save me from copy/pasting code by inlining the function or performing some kind of return value optimization? Or does it make sense to mark these functions with @inline myself?
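A minimal sketch of the two variants, for comparison. It assumes W is a const m×n matrix and b a const length-m vector, and writes the map in the column-vector convention W * x .+ b rather than x'*W .+ b. The in-place version uses LinearAlgebra.mul! so the matrix product is written straight into res instead of into a temporary; the sizes below are arbitrary.

```julia
using LinearAlgebra

# Assumed shapes for illustration: W is 3×4, b has length 3, x has length 4.
const W = rand(3, 4)
const b = rand(3)

# Allocating version: W * x creates a temporary, and the broadcast creates the result.
linear(x::Vector{Float64}) = W * x .+ b

# In-place version: no temporaries and no new result array.
function linear!(res::Vector{Float64}, x::Vector{Float64})
    mul!(res, W, x)   # res = W * x, written directly into res
    res .+= b         # fused in-place broadcast adds the bias
    return res
end

x = rand(4)
res = similar(b)
linear!(res, x)
```

Julia's convention is to put the mutated argument first, hence linear!(res, x) here. Inlining linear into its caller would generally not, by itself, remove the allocation of the array it returns.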
What are useful macros/tools for checking this kind of thing? @code_llvm? Something from BenchmarkTools.jl? There’s a lot out there, but as a relative beginner with no experience reading lower-level code, I don’t know what to look for or where to look.
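One low-tech way to check allocations is Base's @allocated, called inside a function after a warm-up call, so that neither compilation nor non-const globals distort the number. The functions scale, scale! and measure below are hypothetical examples, not part of the original code. @time also reports allocations but includes compilation on the first call; BenchmarkTools.jl's @btime runs the expression many times and reports allocations too (interpolate globals with $, e.g. @btime scale($v)).

```julia
scale(x) = x .* 2               # returns a newly allocated array
scale!(y, x) = (y .= x .* 2)    # overwrites y; no new array

# Hypothetical helper: measure inside a function so that compilation and
# untyped globals do not pollute the numbers.
function measure(x, y)
    scale(x); scale!(y, x)      # warm-up calls, so compilation is not counted
    bytes_new = @allocated scale(x)
    bytes_inplace = @allocated scale!(y, x)
    return bytes_new, bytes_inplace
end

bytes_new, bytes_inplace = measure(rand(1000), zeros(1000))
println("scale: ", bytes_new, " bytes; scale!: ", bytes_inplace, " bytes")
```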
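The second will be slightly faster, but it is probably worth noting that linear!(x, res) = (res .= x'*W .+ b) will have exactly the same performance: the type annotations on the arguments do not make the code faster, because Julia compiles specialized code for the argument types it actually sees. Also, if W and b are non-const global variables, that will absolutely kill performance.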
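A small sketch of why non-const globals hurt: the compiler cannot assume anything about their type, which shows up as an inferred return type of Any (the same instability @code_warntype highlights). The names W_global, W_const, uses_global, uses_const and uses_arg are hypothetical; Base.return_types is used here only to inspect inference.

```julia
# Non-const global: its type can change at any time, so functions that read
# it cannot be compiled for a concrete type.
W_global = rand(3, 4)

# Const global: the binding (and hence its type) is fixed, although the
# array's contents can still be modified in place.
const W_const = rand(3, 4)

uses_global(x) = W_global * x
uses_const(x)  = W_const * x

# Alternative that avoids globals entirely: pass the parameters as arguments.
uses_arg(W, x) = W * x

x = rand(4)
t_global = Base.return_types(uses_global, (Vector{Float64},))
t_const  = Base.return_types(uses_const, (Vector{Float64},))
println(t_global, " vs ", t_const)
```

Passing W and b explicitly (or bundling them in a struct) is often the cleaner option in inner-loop code, since it keeps functions free of hidden global state.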
The fact that the broadcast assignment alone is not enough to avoid allocations wasn’t obvious to me. That’s why I added the second question in the original post: what’s the easiest way to catch this kind of thing?
The * operator performs the multiplication, but it is a regular function call: only dotted operations fuse under broadcasting, so x'*W is computed first into a temporary array. I’m afraid I don’t know of any way to catch such allocations other than measuring them explicitly, e.g. with @time.
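To illustrate the fusion rule with a scalar factor (chosen so every version computes the same thing), here is a sketch comparing an undotted multiplication with a fully dotted one. The function names are hypothetical. Note that @. would also turn a matrix product like W * x into elementwise multiplication, so it is not a drop-in fix for matrix products.

```julia
# `2x` is a call to `*`, so it is evaluated on its own into a temporary array
# before the surrounding dotted operations run.
unfused!(res, x, b) = (res .= 2x .+ b)

# Every operation is dotted, so the right-hand side fuses into a single loop
# that writes directly into `res`.
fused!(res, x, b) = (res .= 2 .* x .+ b)

# `@.` adds the dots automatically: equivalent to fused!.
macro_fused!(res, x, b) = @. res = 2x + b

function compare_allocations(res, x, b)
    unfused!(res, x, b); fused!(res, x, b)   # warm-up, so compilation is not counted
    return (@allocated unfused!(res, x, b)), (@allocated fused!(res, x, b))
end

u, c = rand(100), rand(100)
out = similar(u)
a_unfused, a_fused = compare_allocations(out, u, c)
println("unfused: ", a_unfused, " bytes; fused: ", a_fused, " bytes")
```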
You can also use Meta.@lower to see what an expression lowers to. I have a feeling it might reveal the temporary array; if not, @code_typed might.
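A minimal sketch of the Meta.@lower approach, using the function form Meta.lower so the result can be stored. Lowering only rewrites syntax, so W, x, b and res do not need to be defined. In the printed output, look for a plain call to * whose result is then passed to Base.broadcasted(+, …, b) and finally to Base.materialize!(res, …): that standalone * call is the temporary array. The exact printed form varies between Julia versions.

```julia
# Equivalent to `Meta.@lower res .= W * x .+ b` at the REPL.
lowered = Meta.lower(Main, :(res .= W * x .+ b))
println(lowered)
```

For a function that has already been defined, @code_typed (or code_typed with explicit argument types) shows the same structure after type inference and inlining.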