Will a temporary be created for the signed distances in the scope of fm_residuals or will the result be directly written into tst at the scope of the caller? Does the behavior depend on whether fm_residuals is inlined?
Broadcasting is, more or less, a syntactic transform (no macro is needed, since it's built into the language). Semantically, the call to fm_residuals has to complete before its result can be used, so its loop can't be fused with the broadcast in the caller. There is no automatic lazy evaluation going on there: Julia is not a lazy language.
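A minimal sketch of the difference. It assumes a hypothetical function residuals standing in for fm_residuals, whose real definition isn't shown, here computing signed distances to a circle of radius r. The allocating version builds its own array inside the function, and the caller then copies that array into tst. The in-place variant residuals! follows Julia's convention of a trailing ! for mutating functions and writes straight into the caller's buffer, so no temporary is created.

```julia
# Hypothetical stand-in for fm_residuals: signed distance of each point (xs[i], ys[i])
# to a circle of radius r centred at the origin.
residuals(xs, ys, r) = hypot.(xs, ys) .- r   # allocates a new array inside the function

# Hypothetical in-place variant: the fused broadcast writes directly into `out`.
function residuals!(out, xs, ys, r)
    out .= hypot.(xs, ys) .- r
    return out
end

xs = [3.0, 0.0, 1.0]
ys = [4.0, 2.0, 0.0]
tst = similar(xs)

tst .= residuals(xs, ys, 1.0)   # temporary built inside residuals, then copied into tst
residuals!(tst, xs, ys, 1.0)    # no temporary: results are written into tst directly
```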
So for broadcasting, it seems that you cannot compose small functions efficiently; e.g.,
cost = sum(loss.(residual.(x,Ref(model))))
will not fuse the loss and residual functions. Should these be combined into one function so that broadcasting is efficient? That would sacrifice some flexibility in composition.
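That example does fuse: nested dot calls such as loss.(residual.(x, Ref(model))) are fused into a single loop. What doesn't fuse is something like loss.(g(x)) with g(x) = residual.(x, Ref(model)), because the call to g has to complete before loss can be broadcast over the result.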
What I mean by “function calls break loop fusion” is that broadcasting cannot “look into” a called function and fuse its own loop with any broadcasts that happen inside that function. Broadcasting can only “see” the dotted calls written directly in the expression around it.
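To illustrate, here is a sketch with hypothetical scalar functions residual and loss and a plain number as the model (the originals aren't shown). Nested dot calls written in one expression fuse into a single loop, which is equivalent to broadcasting one anonymous function. Hiding the inner broadcast behind g means g(x) is materialized as an array first, and loss. then runs as a second loop over it. Composing at the scalar level, and broadcasting only once at the outermost call, keeps the small functions reusable while still giving a single loop.

```julia
# Hypothetical scalar building blocks; `model` is just a number here.
residual(xi, model) = xi - model
loss(r) = r^2

x = [1.0, 2.0, 4.0]
model = 2.0

# One fused loop, equivalent to broadcast(xi -> loss(residual(xi, model)), x)
fused = loss.(residual.(x, Ref(model)))

# g's body is opaque to the outer broadcast: g(x) returns a temporary array,
# and `loss.` then runs as a second loop over it.
g(x) = residual.(x, Ref(model))
unfused = loss.(g(x))

# Compose scalar functions first, then broadcast once: a single loop.
per_point = xi -> loss(residual(xi, model))
composed = per_point.(x)
```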
Note that this still allocates one array, which sum then reduces. You can avoid the allocation completely by calling the method of sum that takes a function as its first argument. This is often applicable when you're doing reductions, like a sum: there is no need to allocate an array just to reduce it to a scalar.
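A minimal sketch of the non-allocating reduction, again with hypothetical residual and loss definitions. sum(f, itr) applies f to each element and accumulates the results without building an intermediate array, whereas the fused broadcast still materializes one array before summing it.

```julia
residual(xi, model) = xi - model   # hypothetical scalar residual
loss(r) = r^2                      # hypothetical scalar loss

x = [1.0, 2.0, 4.0]
model = 2.0

cost_alloc = sum(loss.(residual.(x, Ref(model))))   # fused, but allocates one array
cost = sum(xi -> loss(residual(xi, model)), x)      # no intermediate array
```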