Why is the non-parallel loop faster than my parallel loop?

Below I have two versions of a loop; the non-parallel version is significantly faster. I am not an expert in parallelization. Can you tell me why this is the case and how I can improve the performance?

```julia
using Sobol
using Base.Threads

lbs = ones(4)*0
ubs = ones(4)*30

# Parallel version
s = SobolSeq(lbs, ubs)
s = skip(s, 100)
a_ps = []
lk = ReentrantLock()
@time Threads.@threads for i in 1:100
    boo = false
    test_params = NaN
    while boo == false
        # Drawing from the shared Sobol sequence is guarded by the lock.
        lock(lk)
        try
            test_params = Sobol.next!(s)
        finally
            unlock(lk)
        end
        if test_params[1] < 10
            # Appending to the shared array is guarded by the same lock.
            lock(lk)
            try
                append!(a_ps, [test_params])
            finally
                unlock(lk)
            end
            boo = true
        else
            boo = false
        end
    end
end

# Sequential version
s = SobolSeq(lbs, ubs)
s = skip(s, 100)
a_ps = []
@time for i in 1:100
    boo = false
    test_params = NaN
    while boo == false
        test_params = Sobol.next!(s)
        if test_params[1] < 10
            append!(a_ps, [test_params])
            boo = true
        else
            boo = false
        end
    end
end
```

Hi Marius,
in your example the loop is quite cheap: it has only 100 iterations and each iteration doesn't take long. The overhead of scheduling the `@threads` loop therefore outweighs the benefit of having multiple threads compute it, compared to simple sequential execution. You will only see a benefit for much "heavier" loops, with more iterations or longer computation per iteration. Another option, if you're mostly going to deal with loops at the limit of overhead versus benefit of threads, would be Polyester.jl (GitHub: JuliaSIMD/Polyester.jl, "The cheapest threads you can find!"), which decides whether it is worth using threads and, if so, how many.
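
To see this overhead on your own machine, a minimal sketch along these lines may help. The workload (squaring 100 numbers) and the function names are made up for illustration; only `@threads`, `@elapsed`, and `nthreads` come from Julia itself. Both functions are called once before timing so that compilation is not measured.

```julia
using Base.Threads

# Hypothetical cheap workload: square each element, sequentially.
function squares_sequential!(out, xs)
    for i in eachindex(xs)
        out[i] = xs[i]^2
    end
    return out
end

# Same workload, with the iterations split across threads.
function squares_threaded!(out, xs)
    @threads for i in eachindex(xs)
        out[i] = xs[i]^2
    end
    return out
end

xs = collect(1.0:100.0)
out_seq = similar(xs)
out_thr = similar(xs)

# Warm-up calls so the timings below exclude compilation.
squares_sequential!(out_seq, xs)
squares_threaded!(out_thr, xs)

t_seq = @elapsed squares_sequential!(out_seq, xs)
t_thr = @elapsed squares_threaded!(out_thr, xs)
println("sequential: ", t_seq, " s, threaded: ", t_thr, " s on ", nthreads(), " thread(s)")
```

With a loop this small, the threaded timing is typically no better than the sequential one; the gap should narrow or reverse as the work per iteration grows.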


Also note that you probably want to put your two implementations into functions before benchmarking them, as code in functions is generally much more performant than code that works with variables defined at the top level. See the Performance Tips section of the Julia manual for reference (Performance Tips · The Julia Language).
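
As a rough sketch of what that could look like for the sequential loop: the function name `collect_accepted`, the `cutoff` keyword, and the random stand-in generator are assumptions made for illustration so that the block runs without Sobol.jl. With the setup from the question you would presumably pass `() -> Sobol.next!(s)` as `draw` instead.

```julia
using Random

# Collect `n` points whose first coordinate is below `cutoff`.
# `draw` is any zero-argument function that returns a vector.
function collect_accepted(draw, n; cutoff = 10)
    accepted = Vector{Vector{Float64}}()
    while length(accepted) < n
        p = draw()
        if p[1] < cutoff
            push!(accepted, p)
        end
    end
    return accepted
end

# Stand-in generator (NOT a Sobol sequence): uniform points in [0, 30)^4.
rng = MersenneTwister(1)
draw_stand_in() = 30 .* rand(rng, 4)

collect_accepted(draw_stand_in, 1)             # first call compiles the function
@time pts = collect_accepted(draw_stand_in, 100)
```

The result container has a concrete element type and everything the loop touches is passed in as an argument, so the compiler can specialize the loop instead of looking up untyped globals.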


Your parallel version is not doing anything in parallel.

I rearranged the blocks to see what it was doing.

You have 100 iterations taking turns to effectively do `lock do_thing() unlock`:

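```julia
@time Threads.@threads for i in 1:100
    # Draw, test, and append all happen while holding the lock,
    # so only one iteration can make progress at a time.
    function test_then_append()
        lock(lk)
        try
            test_params = Sobol.next!(s)
            if test_params[1] < 10
                append!(a_ps, [test_params])
                return true
            end
        finally
            unlock(lk)
        end
        return false
    end
    # Keep drawing until one point has been accepted.
    while !test_then_append()
    end
end
```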

Hi, sorry, I haven't been here for a while. Sorry for the late response, and thank you for your input. What I basically want is for each thread to take a new element of the Sobol sequence and then do some computation with it. I didn't include these computations here for simplicity. So in this example the parallel part should be appending the parameters to `a_ps`, while the drawing from the Sobol sequence should be serial. If this is not what my implementation is doing, how can I do it in parallel?

Thanks @trahflow for the hint and for the Performance Tips link.
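
One possible way to get that split (serial drawing, parallel computation) is to separate the two phases: first draw and filter all points on a single thread, then run the expensive part over the accepted points with `@threads`, with each iteration writing only to its own slot of a preallocated array. The sketch below is an assumption about what the omitted computation looks like; `expensive`, `draw_accepted`, `process_all`, and the deterministic stand-in generator are hypothetical names, and the block deliberately avoids Sobol.jl so it runs on its own. With the real sequence, `draw` would presumably be `() -> Sobol.next!(s)`.

```julia
using Base.Threads

# Hypothetical stand-in for the expensive per-point computation.
expensive(p) = sum(abs2, p)

# Phase 1 (serial): draw until `n` points pass the filter.
function draw_accepted(draw, n; cutoff = 10)
    pts = Vector{Vector{Float64}}()
    while length(pts) < n
        p = draw()
        p[1] < cutoff && push!(pts, p)
    end
    return pts
end

# Phase 2 (parallel): iteration i writes only to results[i], so no lock is needed.
function process_all(pts)
    results = Vector{Float64}(undef, length(pts))
    @threads for i in eachindex(pts)
        results[i] = expensive(pts[i])
    end
    return results
end

# Deterministic stand-in generator (NOT a Sobol sequence).
function make_stand_in()
    k = 0
    return function ()
        k += 1
        return Float64[mod(7k, 30), 1.0, 2.0, 3.0]
    end
end

pts = draw_accepted(make_stand_in(), 100)
results = process_all(pts)
```

Because the drawing happens before the threaded loop starts, the order of the accepted points is the same on every run, and the threads never compete for a lock. Whether this beats the sequential loop still depends on how heavy the per-point computation is.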
