[ANN] ThreadsX.jl: Parallelized Base functions

What about this:

julia> using ThreadsX, BenchmarkTools

julia> const impl = ThreadsX # Base
ThreadsX

julia> selfdot(x) = impl.mapreduce(abs2, +, x)
selfdot (generic function with 1 method)

julia> x = rand(10^4);

julia> @code_warntype selfdot(x)
Variables
  #self#::Core.Compiler.Const(selfdot, false)
  x::Array{Float64,1}

Body::Any
1 ─ %1 = ThreadsX.mapreduce::Core.Compiler.Const(ThreadsX.mapreduce, false)
│   %2 = (%1)(Main.abs2, Main.:+, x)::Any
└──      return %2
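For contrast, a quick check (plain Base only, no ThreadsX; `selfdot_base` is my name for illustration) that the serial version of the same reduction infers a concrete return type:

```julia
using Test  # for @inferred

# Serial analogue of selfdot above, using Base.mapreduce instead of ThreadsX.
selfdot_base(x) = mapreduce(abs2, +, x)

x = rand(10^4)
@inferred selfdot_base(x)  # passes: inferred return type is Float64, not Any
```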

julia> @btime $x' * $x
  1.204 μs (0 allocations: 0 bytes)
3313.78577447527

julia> @btime selfdot($x)
  94.007 μs (12187 allocations: 630.94 KiB)
3313.78577447527

The `ThreadsX.mapreduce` binding itself was inferred (as a constant), but the call to it was not: it returns `::Any`. Hopefully fixing that will make it competitive with the Base dot product, although I may have to specify

function mydot(x, y)
    init = zero(promote_type(eltype(x), eltype(y)))
    ThreadsX.mapreduce(Base.FastMath.mul_fast, Base.FastMath.add_fast, x, y; init = init)
end

if that’s required for SIMD reductions.
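As a sanity check on that sketch, here is a serial Base-only analogue (the `ThreadsX.mapreduce` keyword signature above is from the post; this version drops the ThreadsX dependency, so it checks the fast-math operators and `init`, not the threading):

```julia
# Serial analogue of mydot above: same fast-math operators and explicit init,
# but using Base.mapreduce rather than ThreadsX.mapreduce.
function mydot_serial(x, y)
    init = zero(promote_type(eltype(x), eltype(y)))
    mapreduce(Base.FastMath.mul_fast, Base.FastMath.add_fast, x, y; init = init)
end

x = rand(10^4); y = rand(10^4);
mydot_serial(x, y) ≈ x' * y  # true, up to floating-point reassociation
```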