Where does Julia (the ecosystem) provide the greatest speedup, and where does it lag furthest behind (compared to, e.g., Python)?

Yes, I agree with this point, and I think Go’s dev team has been making clever decisions for the domain they are targeting. But is this also a superpower in the domain where Julia shines? Go has been criticised as a non-customizable language (e.g., it took about a decade to get generics). Although Go developers effectively proved that customizability does not really matter in the domains Go is used for (e.g., networking applications), I think the composability of Julia programs is deeply rooted in customizability (the many custom array types, the broadcasting infrastructure, etc.).
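To make the “custom array types + broadcasting” point concrete, here is a minimal sketch (my illustration, not an existing package): a matrix type that only implements the `AbstractArray` interface, yet immediately composes with the stock broadcasting machinery and with generic code written elsewhere.

```julia
# Hypothetical example: a lazy matrix whose (i, j) entry is its column-major index.
struct IotaMatrix <: AbstractMatrix{Int}
    nrows::Int
    ncols::Int
end

# The only interface methods needed here are size and getindex.
Base.size(A::IotaMatrix) = (A.nrows, A.ncols)
Base.getindex(A::IotaMatrix, i::Int, j::Int) = (j - 1) * A.nrows + i

A = IotaMatrix(3, 4)
B = A .+ 1            # broadcasting handles the custom type out of the box
C = sum(A; dims = 2)  # so do generic reductions from Base
```

The point is that the custom type didn’t have to opt in to anything beyond the generic interface for other people’s code to work with it.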

I don’t think it’s entirely far-fetched to have some well-defined abstract protocol (but not just coroutines; otherwise, yes, it’d fracture the ecosystem) upon which the ecosystem is built, while performance engineers can tweak or build tailor-made (nested) runtimes. Although doing this for unrestricted concurrency is very challenging (which I’m still interested in, though), if we restrict it to just parallel computations, I think it’d be much easier and actually very beneficial. In fact, I have kind of already done this in FoldsThreads.jl (ref: ANN discourse thread) for a rather restricted kind of computation (= a slightly extended class of divide-and-conquer algorithmic skeletons). Each library API, and even each individual use of an API, can have very different scheduling requirements (e.g., some want throughput, some want low latency). So I can’t help wondering if/how we can have a unified ecosystem while maximizing the customizability of the scheduling. In a way, it’s a theme common to Halide/TVM/Tiramisu/MLIR.
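To illustrate what I mean by decoupling the computation from the scheduling, the usage looks roughly like this (executor names from my recollection of the FoldsThreads.jl ANN; treat the exact spellings and signatures as illustrative rather than authoritative):

```julia
using Folds, FoldsThreads

xs = 1:2^20

# The *what* (a parallel sum) stays fixed; the *how* (scheduling) is chosen
# per call site by passing a different executor:
Folds.sum(sin, xs, WorkStealingEx())      # throughput via work stealing
Folds.sum(sin, xs, DepthFirstEx())        # depth-first scheduling for locality
Folds.sum(sin, xs, TaskPoolEx())          # fixed task pool; latency-sensitive settings
Folds.sum(sin, xs, NondeterministicEx())  # unbalanced or unknown-length inputs
```

The library code that defines the reduction never mentions the scheduler; only the call site does.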

Isn’t “that’s not possible” actually the point? I think a more realistic and beneficial approach is to make parallel tasks fusible by the compiler, runtime, and libraries. It’d let programmers denote the parallelism that exists in the program without incurring the run-time cost. Decoupling the how from the what enables more optimizations by the compiler, the runtime, and engineers: whether or not to exploit the parallelism in the actual compiled result can be decided lazily at much later stages. An alternative notion of task for may-happen-in-parallel parallelism (JuliaLang/julia#39773) (Cilk-like tasks) is a prerequisite for this (and other optimizations), because we can’t fuse tasks that use unrestricted concurrent communication APIs without potentially introducing deadlocks.
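A tiny sketch of why the restriction matters (my illustration, using today’s `@spawn` only to mark parallelism): tasks that merely express parallelism can be executed serially (“fused”) without changing the result, whereas tasks that communicate through concurrency APIs cannot.

```julia
using Base.Threads: @spawn

# May-happen-in-parallel: the two halves don't communicate, so a compiler or
# runtime is free to run the spawned task inline (fuse it) or in parallel.
function fusible(xs, ys)
    t = @spawn sum(xs)
    s = sum(ys)
    return fetch(t) + s
end

# Unrestricted concurrency: if these two tasks were fused into one serial task,
# put! on the unbuffered channel would block forever before take! ever runs,
# i.e. fusing would introduce a deadlock.
function not_fusible()
    ch = Channel{Int}(0)
    producer = @spawn put!(ch, 1)
    consumer = @spawn take!(ch)
    return fetch(consumer)
end
```

With Cilk-like tasks that rule out the second pattern, the first pattern becomes safely fusible by construction.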
