So, with the new improvements to threading, I’m wondering how they can be exposed easily to users. One of the most common parallelizable patterns is broadcasting (discussed in https://github.com/JuliaLang/julia/issues/19777). The conservative approach is to make `@threads` work for these, so we can write `@threads a .= b .+ c`. This is fine, but adding annotations gets annoying really fast (“so, what combination of `@threads @simd @inbounds @.` do I need this time?”), especially in Matlab/NumPy-style vector code with a lot of such operations. The other non-breaking approach is a `MultithreadedArray` type that implements a custom broadcast; this is also annoying. The disruptive approach is to multithread broadcast automatically, which would really be an awesome feature. As far as I know, Matlab also does this (for a subset of builtin functions).
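For concreteness, a `@threads`-annotated broadcast like the one above is roughly equivalent to splitting the elementwise loop across threads by hand. A minimal sketch (the helper name `threaded_bcast!` is mine, not an existing API, and it assumes the function is thread safe and the arrays have matching indices):

```julia
using Base.Threads

# Hypothetical helper: evaluate a .= f.(b, c) with the elementwise
# loop split across the available threads.
function threaded_bcast!(f, a, b, c)
    @threads for i in eachindex(a, b, c)
        @inbounds a[i] = f(b[i], c[i])
    end
    return a
end

a = zeros(10_000); b = rand(10_000); c = rand(10_000)
threaded_bcast!(+, a, b, c)   # same result as a .= b .+ c
```

The question in this thread is essentially whether the compiler/runtime could insert something like this automatically for every dot call.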
There are two aspects to whether that is doable and desirable: correctness and performance.
Correctness is an issue in general, because there is no way to know whether the broadcasted function calls are thread safe. However, it doesn’t seem that bad in practice. I ran `for i in $(find . -name '*.jl') ; do grep "\.(" $i ; done` in my `.julia/packages`, and most of the dot calls are of simple functions: `exp`/`log`/`sin`/`cos`/`abs`/`min`, type conversions and so on. There are a couple of more complex functions, but all the ones I’ve checked look OK. Of course some existing code would likely break, but this feels like the kind of major change for which it’s worth it to break stuff.
Regarding performance, I seem to be getting a constant overhead of about 5 microseconds per broadcast. For a simple `x .= x .+ y`, the crossover point is about `N = 50_000` with two threads on my laptop. For a more complex `x .= abs.(cos.(sqrt.(x)))` it was about `N = 3_000`. That’s tricky, because broadcast is commonly used for both small and large arrays, so some kind of thresholding would have to be implemented for generic code to make use of multithreading.
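To illustrate the thresholding idea, generic code could keep the serial broadcast below the crossover point and only spawn threads above it. A sketch with a hypothetical helper (`maybe_threaded_map!` and the `THRESHOLD` value are placeholders; the real cutoff would have to be tuned and probably depend on the cost of the broadcasted function):

```julia
using Base.Threads

# Placeholder cutoff, not a measured value: below it, the ~5 µs
# threading overhead dominates, so stay serial.
const THRESHOLD = 10_000

# Hypothetical sketch of a thresholded dest .= f.(src).
function maybe_threaded_map!(f, dest, src)
    if length(dest) < THRESHOLD
        dest .= f.(src)                    # small: plain serial broadcast
    else
        @threads for i in eachindex(dest, src)
            @inbounds dest[i] = f(src[i])  # large: split across threads
        end
    end
    return dest
end

x = rand(100_000)
y = similar(x)
maybe_threaded_map!(z -> abs(cos(sqrt(z))), y, x)
```

The awkward part is that a single constant threshold can’t be right for both the cheap `x .+ y` case and the expensive `abs.(cos.(sqrt.(x)))` case, as the two crossover points above show.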