What is currently the most straightforward way to get multi-threading with operations like these:
a = rand(4000, 4000)
b = rand(4000, 4000)
c = zeros(4000, 4000)
c .= a .+ exp.(b)
Does everyone just use Strided.jl? (I think LoopVectorization.jl also had some kind of threading capability, but that package is now deprecated, and in any case its approach is less general and much more invasive than simply parallelizing independent operations across threads, OpenMP-style.)
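For reference, by "OpenMP-style" I mean something like the following hand-rolled chunked loop, which is the kind of boilerplate I'd rather not write every time (threaded_kernel! is just a placeholder name):

# Split the columns across Julia's threads and apply the kernel column by column.
function threaded_kernel!(c, a, b)
    Threads.@threads for j in axes(c, 2)
        @inbounds for i in axes(c, 1)
            c[i, j] = a[i, j] + exp(b[i, j])
        end
    end
    return c
end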
I hadn’t heard of this amazing package until you mentioned it. For fun, and to practice my almost-forgotten macro-fu, I made an attempt to write a short macro to save some typing when combining @strided and @.:
module StridedDot
export @sd

# The expression produced by Strided._strided refers to these helpers qualified
# with the module name; importing them here lets the rewritten expression
# resolve them through StridedDot instead.
using Strided: Strided, _strided, maybestrided, sreshape, sview, maybeunstrided
using MacroTools: @capture, postwalk

# Replace every bare `Strided` symbol in the expression with `StridedDot`, so the
# escaped result only needs this module (not Strided) in scope at the call site.
function xform(ex)
    postwalk(ex) do x
        @capture(x, Strided) || return x
        return :StridedDot
    end
end

# @sd ex: first apply the @. transformation (Base.Broadcast.__dot__),
# then Strided's own macro transformation (_strided).
macro sd(ex1)
    ex = Strided._strided(Base.Broadcast.__dot__(ex1))
    esc(xform(ex))
end

end # module
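To spell out what @sd is abbreviating: it is just @strided applied to the explicitly dotted expression, i.e. something like the following (f3! is only an illustrative name, and it needs using Strided at the call site):

using Strided

f3!(c, a, b) = @strided c .= a .+ exp.(b)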
Then the timing comparison on my 8-core Core i7-9700 is:
using BenchmarkTools
a = rand(4000, 4000)
b = rand(4000, 4000)
c = zeros(4000, 4000)
f1!(c, a, b) = @. c = a + exp(b)
@btime f1!($c, $a, $b) # 63.204 ms (0 allocations: 0 bytes)
using .StridedDot
f2!(c, a, b) = @sd c = a + exp(b)
c2 = similar(c)
@btime f2!($c2, $a, $b) # 15.843 ms (125 allocations: 11.42 KiB)
c == c2 # true
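One caveat in case anyone tries to reproduce this: as far as I know Strided.jl distributes its work over Julia's own threads, so the speedup assumes Julia was started with multiple threads (e.g. julia -t 8 or JULIA_NUM_THREADS=8). A quick sanity check:

Threads.nthreads()  # must be > 1, otherwise Strided effectively runs single-threaded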
And again, the discussion there seems to end around the same time frame, 2021. Not that it isn't a good option in the cases where it works, but it's maybe not exactly the ideal candidate for a “standard” solution.