Where does Julia (ecosystem) provide the greatest speedup, and where does it lag the most behind (compared to e.g. Python)?

Since one of the themes in this thread has been the composability of parallel Julia programs, I can't help but mention a composability concern with ThreadingUtilities.jl (and therefore with CheapThreads.jl, which depends on it). It is discussed in: Overhead of `Threads.@threads` - #16 by tkf. @Elrod said he had some ideas for fixing it, though, so it may be fixed or mitigated at some point.

I think composability of parallel Julia programs will be a community effort. For example, APIs with data races (e.g., a function that mutates global state without holding a lock) cannot be composed with parallel functions. A more subtle example: if a package manipulates task scheduling too eagerly, it may not work well inside a large application. The basis of composability is the centralized management of computational resources (= CPU + cache). Of course, I think any “hacks” are justified if they help you in the end. It is your software, after all. However, I cannot imagine the Julia ecosystem evolving into a composable parallel platform without trusting the core parallel task runtime.
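
To make the data-race example concrete, here is a minimal sketch. All names in it (`UNSAFE_COUNT`, `SAFE_COUNT`, `unsafe_record!`, `safe_record!`, `call_in_parallel`) are hypothetical and made up for illustration; they do not come from any package mentioned above. The caller spawns tasks without knowing anything about the global state behind the function it calls, which is roughly the situation a library API ends up in once it is used inside a parallel program.

```julia
using Base.Threads: @spawn

# Hypothetical package-level global state, for illustration only.
const UNSAFE_COUNT = Ref(0)
const SAFE_COUNT = Ref(0)
const SAFE_COUNT_LOCK = ReentrantLock()

# Not composable: an unsynchronized read-modify-write on shared global state.
# Concurrent callers can overwrite each other's updates.
function unsafe_record!()
    UNSAFE_COUNT[] += 1
    return nothing
end

# Composable: the same update, but guarded by a lock.
function safe_record!()
    lock(SAFE_COUNT_LOCK) do
        SAFE_COUNT[] += 1
    end
    return nothing
end

# A caller that happens to be parallel. It knows nothing about the globals above.
function call_in_parallel(f, n)
    @sync for _ in 1:n
        @spawn f()
    end
    return nothing
end

call_in_parallel(safe_record!, 10_000)
call_in_parallel(unsafe_record!, 10_000)

println("safe:   ", SAFE_COUNT[])    # always 10000
println("unsafe: ", UNSAFE_COUNT[])  # may be lower when run with several threads
```

When Julia is started with several threads (e.g. `julia --threads=4`), the unsafe counter can end up below 10,000 because updates get lost, while the locked version always reaches 10,000. With a single thread the race will most likely not show up at all, which is part of why this kind of API problem tends to surface only once someone else composes it with parallel code.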
