Inefficient parallelization? Need some help optimizing a simple dot product

tkoolen · March 15, 2018, 7:20pm

Oh wow, that’s surprising to me. Some results on my machine (0.6.2, Linux):

Julia (no threads):     Trial(35.786 μs)

BLAS (1 threads):       Trial(35.793 μs)
Julia (1 threads):      Trial(36.234 μs)

BLAS (2 threads):       Trial(35.731 μs)
Julia (2 threads):      Trial(19.571 μs)

BLAS (4 threads):       Trial(35.775 μs)
Julia (4 threads):      Trial(10.698 μs)

BLAS (8 threads):       Trial(35.772 μs)
Julia (8 threads):      Trial(5.150 μs)

BLAS (16 threads):      Trial(35.782 μs)
Julia (16 threads):     Trial(4.514 μs)

BLAS (32 threads):      Trial(35.760 μs)
Julia (32 threads):     Trial(4.050 μs)

BLAS (64 threads):      Trial(35.819 μs)
Julia (64 threads):     Trial(4.122 μs)

Edit: added more results.
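For readers who want something concrete to compare against the numbers above: the benchmarked code is not reproduced in this post, so the following is only a minimal sketch of what a chunked, threaded dot product could look like, written for current Julia (1.x) rather than 0.6.2. The name `threaded_dot` and the chunking scheme are my own assumptions, not the code that was actually timed.

```julia
using Base.Threads

# Hypothetical helper (not from the thread): split the index range into one
# contiguous chunk per thread, accumulate each chunk locally, then combine.
function threaded_dot(x::Vector{T}, y::Vector{T}) where {T<:Real}
    length(x) == length(y) || throw(DimensionMismatch("vectors must have the same length"))
    n = length(x)
    nchunks = min(nthreads(), max(n, 1))   # never more chunks than elements
    partials = zeros(T, nchunks)           # one partial sum per chunk
    @threads for c in 1:nchunks
        lo = div((c - 1) * n, nchunks) + 1 # first index of chunk c
        hi = div(c * n, nchunks)           # last index of chunk c
        acc = zero(T)                      # local accumulator for this chunk
        @inbounds @simd for i in lo:hi
            acc += x[i] * y[i]
        end
        partials[c] = acc
    end
    return sum(partials)
end
```

To try it with several threads, start Julia with e.g. `JULIA_NUM_THREADS=4 julia` and compare `threaded_dot(x, y)` against `LinearAlgebra.dot(x, y)` using `≈`, since the summation order differs from the serial version and the floating-point results may not match bit for bit.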
