Julia motivation: why weren't Numpy, Scipy, Numba, good enough?

I’ve noticed this too, and I think it’s an important problem. The most optimistic scenario I’ve come up with is that these kinds of optimizations are best incorporated into user-level Julia code via a set of sophisticated iterators. For example, in ImageFiltering I’ve been able to achieve some of the same kinds of “fusion” as Halide with surprisingly few lines of code. The core of this approach is itself a small package, TiledIteration, which I think constitutes a reusable nugget of ideas for moving forward in this problem space.
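To make the idea concrete, here is a minimal sketch of tile-by-tile processing using only Base Julia. It is not the TiledIteration API and not how ImageFiltering is implemented: `tiles` and `fused_scale_sum` are hypothetical names made up for illustration. The point is only that a two-stage computation can keep its intermediate result in a small, reused tile-sized buffer instead of a full-size temporary array.

```julia
# Hypothetical helper (NOT the TiledIteration API): split each axis into
# chunks of at most `sz` indices and take the Cartesian product of the chunks.
function tiles(axs::NTuple{N,AbstractUnitRange{Int}}, sz::NTuple{N,Int}) where {N}
    chunks = map(axs, sz) do ax, s
        [i:min(i + s - 1, last(ax)) for i in first(ax):s:last(ax)]
    end
    return Iterators.product(chunks...)
end

# Hypothetical two-stage pipeline: stage 1 scales each element, stage 2 sums
# the squares. The intermediate lives in one tile-sized buffer that is reused
# for every tile, rather than in an array the size of `A`.
function fused_scale_sum(A::AbstractMatrix{Float64}, tilesize::NTuple{2,Int})
    buf = Matrix{Float64}(undef, tilesize)
    total = 0.0
    for (rows, cols) in tiles(axes(A), tilesize)
        tile = view(A, rows, cols)
        # Edge tiles can be smaller than `tilesize`, so take a matching view.
        tmp = view(buf, 1:length(rows), 1:length(cols))
        tmp .= 2 .* tile          # stage 1 writes into the small buffer
        total += sum(abs2, tmp)   # stage 2 reads the buffer right away
    end
    return total
end

A = reshape(collect(1.0:35.0), 5, 7)
println(fused_scale_sum(A, (2, 3)) ≈ sum(abs2, 2 .* A))
```

Note that every iteration of the loop creates two `view` wrappers, which is exactly the kind of small object the next paragraph is concerned with.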

From the standpoint of implementing this kind of thing more broadly and for “micro” scale computations, a crucial optimization will be resolving the whole stack-vs-heap issue for structs that contain “pointers” to heap-allocated memory: these iterators need to create a crazy number of wrappers for operations working on small chunks of arrays, and if every wrapper has to be heap-allocated, the allocation cost can swamp the work done on each chunk.
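As a rough way to see what is at stake, the sketch below builds one small wrapper struct per chunk and then measures allocations with `@allocated`. `ChunkView`, `chunksum`, and `sum_by_chunks` are hypothetical names invented for this example. No expected number is given on purpose: whether the wrappers cost heap allocations depends on the Julia version and on what the compiler manages to inline.

```julia
# Hypothetical wrapper: an immutable struct holding a reference to
# heap-allocated memory (the parent vector) plus the chunk it covers.
struct ChunkView{T}
    parent::Vector{T}
    range::UnitRange{Int}
end

# Sum the elements of the parent vector that fall inside the chunk.
function chunksum(c::ChunkView{T}) where {T}
    s = zero(T)
    for i in c.range
        s += c.parent[i]
    end
    return s
end

# Walk over `v` in chunks of `n` elements, creating one wrapper per chunk.
function sum_by_chunks(v::Vector{T}, n::Int) where {T}
    total = zero(T)
    for lo in 1:n:length(v)
        hi = min(lo + n - 1, length(v))
        total += chunksum(ChunkView(v, lo:hi))
    end
    return total
end

v = rand(10_000)
sum_by_chunks(v, 4)                      # warm up so compilation is not measured
bytes = @allocated sum_by_chunks(v, 4)   # 2_500 wrappers are created in this call
println("bytes allocated: ", bytes)
```

If the reported number grows with the number of chunks, the wrappers are being heap-allocated; if it stays at or near zero, the compiler has kept them on the stack or eliminated them entirely.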