Parallel sampling

This is my first time trying parallel computing, and before moving on to my actual code I decided to try it on a simple example to see how fast it runs.

@everywhere function sampler(x, y)
    z = x .* y
    return z
end
@time @everywhere for i = 1:1_000_000_000
    z = sampler(x, y)
end

Of course, I run this after calling addprocs(4). But when I run it for 1_000_000 it takes a very long time, so I want to know what I am doing wrong here and how I can make it faster.

How long does it take, and why do you think it should take less time?


@everywhere for executes the same loop on all processes, so your code will take the same amount of time regardless of how many processes you throw at it:

julia> nprocs()

julia> @time @everywhere sum(rand() for i = 1:1_000_000_000)
  3.822825 seconds (52.07 k allocations: 3.278 MiB, 1.59% compilation time)

julia> addprocs(1);

julia> @time @everywhere sum(rand() for i = 1:1_000_000_000)
  3.969405 seconds (52.14 k allocations: 3.277 MiB, 1.50% compilation time)
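
To make the point concrete, here is a minimal sketch that asks every process to run the whole loop and then counts the work done. The helper `count_iterations`, the worker count of 2, and the loop length are assumptions chosen for illustration; they are not from the original post.

```julia
using Distributed

# Assumption: two extra workers, added only if none exist yet.
nprocs() == 1 && addprocs(2)

# Hypothetical helper: returns how many iterations of the full loop it ran.
@everywhere count_iterations(n) = sum(1 for _ in 1:n)

n = 1_000
# Ask every process to run the entire loop, which is what `@everywhere for` does.
per_process = [remotecall_fetch(count_iterations, p, n) for p in procs()]

# Each process ran all n iterations, so the total work grows with the process count.
total_work = sum(per_process)
println("iterations per process: ", per_process)
println("total iterations: ", total_work)
```

Nothing is divided here: adding processes only multiplies the amount of work, which is why the wall-clock time does not drop.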

If you want each worker process to deal with part of the loop, use @distributed:

julia> workers()
1-element Vector{Int64}:

julia> @time @distributed (+) for i = 1:1_000_000_000; rand(); end;
  3.504719 seconds (46.90 k allocations: 2.665 MiB, 2.19% compilation time)

julia> addprocs(4); workers()
4-element Vector{Int64}:

julia> @time @distributed (+) for i = 1:1_000_000_000; rand(); end;
  1.847140 seconds (48.06 k allocations: 2.754 MiB, 0.88% compilation time)
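
Applied to a function like the `sampler` from the question, the same pattern might look like the sketch below. The wrapper `parallel_total`, the input vectors, the worker count, and the choice to reduce each result with `sum` are all assumptions made for illustration.

```julia
using Distributed

# Assumption: two extra workers; adjust to the number of cores you have.
nprocs() == 1 && addprocs(2)

# Define the function on every process so the workers can call it.
@everywhere sampler(x, y) = x .* y

# Hypothetical wrapper: splits 1:n across the workers and adds up the results.
function parallel_total(x, y, n)
    return @distributed (+) for i in 1:n
        # Each iteration returns one number; (+) combines them across workers.
        sum(sampler(x, y))
    end
end

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
total = parallel_total(x, y, 1_000)
println(total)
```

Here `@everywhere` is used only to define `sampler` on all processes, while `@distributed` is what actually splits the loop range among the workers.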

It takes more than 5 minutes. Since the purpose of parallelism is to speed up the computation, I was expecting it to take seconds, not minutes. As I said, I'm just trying this out before applying it to my actual code.

It does run faster, but is there any other way to make it even faster?

I doubt it. Most basic operations take around 1 nanosecond on a modern computer, so a back-of-the-envelope estimate for the runtime of your original piece of code is

500 \times 10^9 \times 1 \text{ nanosecond} \approx 8.3 \text{ minutes}.

This matches quite well with the five-minute runtime you observed.
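
As a quick check, the same arithmetic can be done in Julia. The operation count of 500 × 10^9 and the cost of 1 ns per operation are taken from the estimate above; the variable names are only illustrative.

```julia
# Back-of-the-envelope runtime estimate.
operations = 500e9               # number of basic operations, from the estimate above
seconds_per_operation = 1e-9     # roughly 1 nanosecond each (rule of thumb)

runtime_minutes = operations * seconds_per_operation / 60
println(round(runtime_minutes, digits = 1), " minutes")   # about 8.3 minutes
```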
