Could you try replacing `nchunks = Threads.nthreads()` with `nchunks = 2Threads.nthreads()` (or maybe just make it a function argument)? Two tasks per thread will balance the load better. This wasn't worth the overhead on my system, but it may be worth it on yours.
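
For reference, the chunking pattern being suggested might look roughly like the sketch below. The actual body of `tproduct_threads` isn't shown in this thread, so the outer-product kernel (guessed from the ≈7.6 MiB allocations of a 1000×1000 `Float64` matrix in the benchmarks) and the chunking structure are assumptions:

```julia
# Sketch only: the real tproduct_threads is not reproduced here. The kernel
# is assumed to be an outer product; `nchunks` is exposed as a keyword
# argument, defaulting to two tasks per thread as suggested above.
function tproduct_threads(x::AbstractVector{T}; nchunks = 2Threads.nthreads()) where {T}
    n = length(x)
    out = Matrix{T}(undef, n, n)
    # Split the columns into `nchunks` ranges and spawn one task per range;
    # extra tasks let the scheduler rebalance when some finish early.
    @sync for cols in Iterators.partition(1:n, cld(n, nchunks))
        Threads.@spawn for j in cols
            @inbounds @simd for i in 1:n
                out[i, j] = x[i] * x[j]
            end
        end
    end
    return out
end
```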
Yeah, that gives it a slight edge over my implementation now:
```julia
julia> @benchmark tproduct_threads(x) setup=(x=rand(1000))
BenchmarkTools.Trial:
  memory estimate:  7.65 MiB
  allocs estimate:  82
  --------------
  minimum time:     407.578 μs (0.00% GC)
  median time:      507.603 μs (0.00% GC)
  mean time:        791.596 μs (27.10% GC)
  maximum time:     2.968 ms (80.05% GC)
  --------------
  samples:          6270
  evals/sample:     1

julia> @benchmark tproduct_avx(x) setup=(x=rand(1000))
BenchmarkTools.Trial:
  memory estimate:  7.66 MiB
  allocs estimate:  88
  --------------
  minimum time:     411.538 μs (0.00% GC)
  median time:      530.083 μs (0.00% GC)
  mean time:        692.385 μs (22.83% GC)
  maximum time:     2.735 ms (42.88% GC)
  --------------
  samples:          7170
  evals/sample:     1
```
(The runtimes for my version improved here relative to the numbers above because I now use `@avx` on the broadcasts and updated LoopVectorization.jl.)
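
For anyone following along, here is a minimal sketch of what "`@avx` on the broadcasts" can look like; the exact `tproduct_avx` body isn't reproduced in this post, so the outer-product broadcast below is an assumption:

```julia
using LinearAlgebra        # for the vector adjoint x'
using LoopVectorization    # provides @avx

# Sketch only: assumes tproduct_avx is essentially an outer product,
# with @avx vectorizing the broadcast that fills `out`.
function tproduct_avx(x::AbstractVector{T}) where {T}
    n = length(x)
    out = Matrix{T}(undef, n, n)
    @avx out .= x .* x'    # @avx applied directly to the broadcast statement
    return out
end
```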