Suppose I have
fm_residuals(pc, F) = signed_dist.(pc,Ref(F))
tst .= fm_residuals(pc,F)
Will a temporary be created for the signed distances in the scope of fm_residuals, or will the result be written directly into tst in the caller's scope? Does the behavior depend on whether fm_residuals is inlined?
Yes, a temporary array will be created: fm_residuals builds and returns a new array, which is then copied into tst. The proper way to use broadcasting here is
tst .= signed_dist.(pc,Ref(F))
so that the two broadcast operations fuse into a single loop that writes directly into tst, with no temporary array.
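A minimal sketch of the patterns, for illustration only. `signed_dist` here is a hypothetical stand-in (signed distance from a 3-D point to a plane `F = (n, d)` with unit normal `n`), and the point cloud `pc` is assumed to be a vector of tuples. `F` is wrapped in `Ref` because a tuple would otherwise be broadcast over element by element. The mutating `fm_residuals!` is also hypothetical: it is one common way to keep a helper function while letting the caller supply the output buffer, so no temporary is allocated.

```julia
# Hypothetical: signed distance from point p to the plane {q : n⋅q + d = 0},
# where F = (n, d) and n is assumed to be a unit normal.
signed_dist(p, F) = sum(F[1] .* p) + F[2]

# Allocating version: returns a fresh array, which the caller then copies.
fm_residuals(pc, F) = signed_dist.(pc, Ref(F))

# Hypothetical mutating version: the fused loop writes straight into `out`.
fm_residuals!(out, pc, F) = (out .= signed_dist.(pc, Ref(F)))

pc  = [(Float64(i), 0.0, 0.0) for i in 1:5]   # five points on the x-axis
F   = ((1.0, 0.0, 0.0), -2.0)                  # the plane x = 2
tst = zeros(length(pc))

tst .= fm_residuals(pc, F)          # temporary array, then copied into tst
tst .= signed_dist.(pc, Ref(F))     # one fused loop, no temporary
fm_residuals!(tst, pc, F)           # same fused loop, behind a function
```

The `!` suffix follows the usual Julia convention for functions that modify an argument.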
Function calls break loop fusion in broadcasting.
I thought that inlining might fuse them again, but apparently not.
Broadcasting is, more or less, a syntactic transform (without the need for a macro, as it's built into the language). Semantically, the call to fm_residuals has to complete before its result can be used, so the loops can't be fused. No automatic lazy evaluation happens there: Julia is not a lazy language. Inlining is an optimization that doesn't change these semantics, so it doesn't bring fusion back.
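To make the syntactic transform concrete: a dotted expression such as `a .+ b .* c` is lowered into one nested, lazy `Broadcasted` tree that is evaluated in a single pass (`Meta.@lower a .+ b .* c` shows the `Base.broadcasted` and `Base.materialize` calls the parser emits). The sketch below builds that tree by hand with `Base.Broadcast.broadcasted` and `Base.Broadcast.materialize`, purely to illustrate; normal code should just use dot syntax. The helper `h` is hypothetical and shows why a function call is opaque: it returns an already-computed `Vector`, which the outer broadcast can only treat as a leaf.

```julia
using Base.Broadcast: broadcasted, materialize

a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]

# Roughly what `a .+ b .* c` becomes: a lazy expression tree...
bc = broadcasted(+, a, broadcasted(*, b, c))
# ...evaluated in one fused loop when it is materialized.
fused = materialize(bc)

# Hypothetical helper: its broadcast runs to completion inside the function
# and returns an ordinary Vector, so only the outer `+` is a separate loop.
h(b, c) = b .* c
unfused = a .+ h(b, c)
```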
So for broadcasting, it seems that you cannot compose small functions efficiently; e.g.,
cost = sum(loss.(residual.(x,Ref(model))))
will not fuse the loss and residual functions. Should these be composed into one function so that broadcasting is done efficiently? That would take away some flexibility in composition.
No, that one is fused, since both broadcasts are at the same syntactic level. This would not be:
g(x, model) = residual.(x, Ref(model))
cost = sum(loss.(g(x, model)))
because the call to g has to complete before loss can be broadcast over the result.
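The difference can be measured. In this sketch, `residual`, `loss` and `model` are hypothetical stand-ins (a straight-line fit, with `model` a NamedTuple that must be wrapped in `Ref` because broadcasting over NamedTuples is not allowed). Both costs are equal, but the version that goes through `g` should report roughly twice the allocated bytes, since it materializes two arrays instead of one. Exact byte counts depend on the Julia version, so none are shown.

```julia
# Hypothetical model and loss: a line y = a*x + b fitted to points (xi, yi).
residual(p, m) = p[2] - (m.a * p[1] + m.b)
loss(r) = r^2

g(x, model) = residual.(x, Ref(model))   # broadcast hidden inside a function

# Same syntactic level: residual and loss fuse into one loop and one array.
cost_fused(x, model) = sum(loss.(residual.(x, Ref(model))))
# Call boundary: g builds its array first, then loss.(...) allocates another.
cost_split(x, model) = sum(loss.(g(x, model)))

model = (a = 2.0, b = 1.0)
x = [(Float64(i), 2.0i + 1.0 + (isodd(i) ? 0.5 : -0.5)) for i in 1:1000]

cost_fused(x, model); cost_split(x, model)   # compile before measuring
println("fused: ", @allocated(cost_fused(x, model)), " bytes")
println("split: ", @allocated(cost_split(x, model)), " bytes")
```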
What I mean by "function calls break loop fusion" is that broadcasting cannot "look into" what a called function is doing and fuse loops with any broadcasts that may happen inside that function. Broadcasting can only "see" what is immediately around it.
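If you want to keep the pieces as separate, reusable functions, two approaches keep everything in one fused loop. This is a sketch under assumptions, not something proposed in the thread: `residual`, `loss` and `model` are hypothetical definitions of the thread's names. Option 1 composes the scalar functions with `∘` and broadcasts once. Option 2 has the helper return the unevaluated `Broadcasted` object (via the lower-level `Base.Broadcast.broadcasted`) instead of an array, so the caller's dot call nests it into its own tree, which is the same structure dot syntax builds for `loss.(residual.(x, Ref(model)))`.

```julia
using Base.Broadcast: broadcasted

# Hypothetical scalar functions (straight-line fit), as in the thread's names.
residual(p, m) = p[2] - (m.a * p[1] + m.b)
loss(r) = r^2

# Option 1: compose scalar functions, then broadcast once.
cost_composed(x, model) = sum((loss ∘ residual).(x, Ref(model)))

# Option 2: the helper returns a lazy Broadcasted rather than an array;
# the outer dot call nests it, so both steps run in a single loop.
g_lazy(x, model) = broadcasted(residual, x, Ref(model))
cost_lazy(x, model) = sum(loss.(g_lazy(x, model)))

model = (a = 2.0, b = 1.0)
x = [(Float64(i), 2.0i + 1.0 + (isodd(i) ? 0.5 : -0.5)) for i in 1:1000]
```

Option 1 is the simpler of the two; option 2 is handy when the helper itself must stay "array-level".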
Thanks for the explanation. It’s clear now.
Note that this still allocates one array. You can call the method of sum that takes a function as its first argument, sum(f, itr), to avoid the allocation completely. This is often applicable when you're doing reductions like a sum: there is no need to allocate an array just to reduce it to a scalar.
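A sketch of that suggestion, with hypothetical scalar `residual` and `loss` (a straight-line fit) standing in for the thread's functions. Because nothing is broadcast, `model` is simply captured by the closure and needs no `Ref`.

```julia
# Hypothetical scalar functions (straight-line fit).
residual(p, m) = p[2] - (m.a * p[1] + m.b)
loss(r) = r^2

# sum(f, itr) applies f to each element and accumulates the results,
# so no intermediate array of residuals or losses is ever built.
cost_noalloc(x, model) = sum(p -> loss(residual(p, model)), x)

model = (a = 2.0, b = 1.0)
x = [(Float64(i), 2.0i + 1.0 + (isodd(i) ? 0.5 : -0.5)) for i in 1:1000]

cost_noalloc(x, model)   # compile before measuring
println("allocated: ", @allocated(cost_noalloc(x, model)), " bytes")
```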
Yes, I overlooked that allocation! Thanks.