Nested (non-syntactic?) loop fusion in 0.6

Is there a way for loop fusion in Julia to operate even recursively into function calls? By that I mean, consider,

julia> a=b=rand(2000,2000);

julia> @time a.+b.+a;
  0.064173 seconds (10.63 k allocations: 31.015 MiB, 2.54% gc time)

julia> myplus(a,b) = a.+b
myplus (generic function with 1 method)

julia> @time a.+myplus(a,b);
  0.058254 seconds (53 allocations: 61.037 MiB, 4.30% gc time)

Its clear the latter call wasn’t fused into a single loop since twice the memory was allocated.

My best guess is this isn’t supported, and maybe never will be because the ambiguity of what to do if myplus has other side-effects. However, is there any way to make something similar to this work, to allow shorthand for some longer operation than myplus? It certainly seems within the realm of reason that the compiler could figure out in simple cases like this that its safe to fuse.

You can write a .+ myplus.(a, b)

In this case, @pabloferz’s suggestion will fuse, but it wouldn’t work if your user-defined function didn’t happen to also do the scalar operation you want – or more generally, if you have an operation that cannot be expressed as a vectorization of any scalar operation (e.g. matmul, matrix inversion). In such cases, we probably will never be able to guarantee fusion, although a sufficiently clever compiler optimization could observe that fusion is possible and do it for you.

At a higher level, the new syntactic loop fusion support means that one should shift programming styles away from what’s traditionally been done in vectorized languages:

  • In vectorized languages, one wants to push vectorization as far “up” as possible: compose more complex operations out of simpler vectorized operations, and then let vectorization flow from the top down to individual, optimized vectorized operations.

  • In Julia since 0.5, one wants to do the opposite and push scalar operations as far up as possible: express complex operations in entirely scalar form for as long as you can (i.e. until you hit an inherently array operation like a matmul or matrix inversion or whatever), and then vectorize at the very end by calling functions using f.(args...) syntax.

So in this case, if myplus can be defined to only operate on scalars, then do that – don’t define it to work on vectors at all. Instead, if you need to call it on vectors, use the myplus.(a, b) form.

1 Like

Thanks yea, I realized after @pabloferz’s answer that I made my example here too simple, indeed what I’m actually trying to do wouldn’t work because it involves a matrix multiply.

In any case, useful advice on programming style, I will keep it in mind!

Instead of loop fusion for matrix multiplication and the like, you can just chain inplace operations. That will still avoid any memory allocations.