Parallel reductions

On a single machine, I typically use `reduce` from ThreadsX.jl. For multi-node parallel reductions, take a look at the `@distributed` macro from Julia’s Distributed standard library, or at Dagger.jl.
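As a rough sketch of both options (not a benchmark or a recommended setup), the snippet below sums squares with ThreadsX and then with `@distributed`. It assumes ThreadsX.jl is installed and that Julia was started with more than one thread (e.g. `julia -t auto`); the names `xs`, `total`, `sumsq_threads`, and `sumsq_procs` are placeholders of mine.

```julia
using ThreadsX      # third-party package, assumed installed: ] add ThreadsX
using Distributed   # standard library, provides @distributed

xs = 1:1_000

# Thread-parallel reduction on one machine.
# Uses however many threads Julia was started with.
total = ThreadsX.reduce(+, xs)

# Map and reduce in one pass, also thread-parallel.
sumsq_threads = ThreadsX.mapreduce(x -> x^2, +, xs)

# Process-parallel reduction. Add workers first with addprocs(n);
# if none were added, the loop simply runs on the current process.
sumsq_procs = @distributed (+) for i in xs
    i^2
end
```

The reducing operator should be associative in both cases, since the partial results are combined in chunks rather than strictly left to right.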


My incomplete view of the thread-parallel ecosystem is that there’s the JuliaSIMD universe (lots of @Elrod's work) and the JuliaFolds universe (lots of @tkf's). LoopVectorization is very tough to beat for raw speed, but Transducers is great for composability.

ThreadsX is based on Transducers. For a nice high-level API for LoopVectorization, check out @mcabbott's Tullio.jl.

@stillyslalom @cscherrer that’s really interesting, thanks for sharing.


This, alas, is nothing but a shameless plug, but having happened upon this thread the other day, I feel like it’s slightly less shameless.
