Since one of the themes in this thread has been the composability of parallel Julia programs, I cannot help but point out that there is a concern about the composability of ThreadingUtilities.jl (and therefore of CheapThreads.jl, which depends on it). It is discussed in: Overhead of `Threads.@threads` - #16 by tkf. That said, @Elrod mentioned he had some ideas for fixing it, so it may be fixed or mitigated at some point.
I think composability for parallel Julia programs will be a community effort. For example, APIs with data races (e.g., a function that mutates global state without holding a lock) cannot be composed with parallel functions. A more subtle example is that, if a package manipulates task scheduling too aggressively, it may not work well inside a large application. The basis of composability is the centralized management of computation resources (= CPU + cache). Of course, I think any “hacks” are justified if they help you in the end. It is your software, after all. However, I cannot imagine a scenario in which the Julia ecosystem evolves into a composable parallel platform without trusting the core parallel task runtime.
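
To make the data-race point concrete, here is a minimal sketch. The names `COUNTER`, `COUNTER_LOCK`, `bump_unsafe!`, `bump_locked!`, and `run_concurrently` are hypothetical and only for illustration; they do not come from any package mentioned above. The tasks are created with `Threads.@spawn`, so where and when they run is left to Julia's own task runtime.

```julia
using Base.Threads

# Hypothetical global state, for illustration only.
const COUNTER = Ref(0)
const COUNTER_LOCK = ReentrantLock()

# Racy: an unsynchronized read-modify-write on shared global state.
# Calling this from concurrent tasks may lose updates.
function bump_unsafe!()
    COUNTER[] += 1
    return nothing
end

# Composable with parallel callers: the update happens while holding a lock.
function bump_locked!()
    lock(COUNTER_LOCK) do
        COUNTER[] += 1
    end
    return nothing
end

# Reset the counter, call `f` from `n` tasks, wait for all of them,
# and return the final count. Scheduling is left to the core runtime.
function run_concurrently(f, n)
    COUNTER[] = 0
    @sync for _ in 1:n
        Threads.@spawn f()
    end
    return COUNTER[]
end

# The locked version counts every call, whatever the number of threads.
total = run_concurrently(bump_locked!, 10_000)
```

With `bump_locked!` the result is always `10_000`. With `bump_unsafe!` the result depends on how many threads Julia was started with and on timing, which is exactly why a caller cannot safely wrap such a function in a parallel loop.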