Inefficient parallelization? Need some help optimizing a simple dot product

You forgot an important thing: the arrays must be declared as SharedArrays so that every worker can access them without copying. With that change, even for this simple calculation the parallel version is slightly faster. Here are my results:

function sdot(n, x, y)
    a = 0.0
    @inbounds @fastmath for i=1:n
        a += x[i]*y[i]   
    end
    a
end
addprocs(7)   # workers must exist before @everywhere, or they never see pdot

@everywhere function pdot(n, x, y)
    @parallel (+) for i = 1:n
        @inbounds x[i]*y[i]
    end
end

n = 10^7
x = SharedArray{Float64,1}( ones(n) )
y = SharedArray{Float64,1}( ones(n) )

println("Naive Julia")
println(@sprintf("  %.2f", sdot(n, x, y)))
@time for i=1:3
    sdot(n, x, y)
end

println("\nNative Julia")
println(@sprintf("  %.2f", dot(x, y)))
@time for i=1:3
    dot(x, y)
end

println("\nIdiomatic parallel Julia")
println(@sprintf("  %.2f",  pdot(n, x, y)))
@time for i=1:3
    pdot(n, x, y)
end

The timings:

Naive Julia
  10000000.00
  0.027176 seconds (3 allocations: 48 bytes)

Native Julia
  10000000.00
  0.027156 seconds (3 allocations: 48 bytes)

Idiomatic parallel Julia
  10000000.00
  0.025830 seconds (4.49 k allocations: 356.578 KiB)
[Finished in 4.5s]

The difference becomes clearer for larger inputs (say n = 10^9):

Naive Julia
  1000000000.00
  4.235259 seconds (3 allocations: 48 bytes)

Native Julia
  1000000000.00
  3.088961 seconds (3 allocations: 48 bytes)

Idiomatic parallel Julia
  1000000000.00
  2.282985 seconds (4.47 k allocations: 355.719 KiB)
[Finished in 29.5s]
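A side note for anyone copying this on current Julia: the code above is for the pre-1.0 API. Since Julia 1.0, addprocs/@everywhere live in the Distributed standard library, SharedArray moved to SharedArrays, @sprintf moved to Printf, and @parallel was renamed @distributed. A minimal sketch of the same reduction under those assumptions (worker count and n chosen arbitrarily):

using Distributed
addprocs(4)                              # spawn 4 local worker processes
@everywhere using Distributed, SharedArrays

# Same parallel reduction as pdot above, with the renamed macro:
@everywhere function pdot(n, x, y)
    @distributed (+) for i = 1:n
        @inbounds x[i] * y[i]
    end
end

n = 10^6
x = SharedArray{Float64}(n); x .= 1.0    # shared memory, visible to all local workers
y = SharedArray{Float64}(n); y .= 1.0
println(pdot(n, x, y))                   # 1.0e6

Note that SharedArray only shares memory between processes on the same machine; for a multi-node setup you would need DistributedArrays or explicit data movement instead.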