Why is the non-parallel loop faster than my parallel loop?

Below I have two versions of a loop; the non-parallel version is significantly faster. I am not an expert in parallelization. Can you tell me why this is the case and how I can improve the performance?

using Sobol
using Base.Threads
lbs = ones(4)*0
ubs = ones(4)*30
s = SobolSeq(lbs, ubs)
s = skip(s, 100)
a_ps = []
lk = ReentrantLock()
@time Threads.@threads for i in 1:100
	boo = false
	test_params = NaN
	while boo == false
		begin
		  lock(lk)
		  try
			  test_params = Sobol.next!(s)
		  finally
			  unlock(lk)
		  end
	  	end
		if test_params[1]<10
			begin
			  lock(lk)
			  try
				  append!(a_ps, [test_params])
			  finally
				  unlock(lk)
			  end
		  	end
			boo = true
		else
			boo = false
		end
	end
end


s = SobolSeq(lbs, ubs)
s = skip(s, 100)
a_ps = []
@time for i in 1:100
	boo = false
	test_params = NaN
	while boo == false
		test_params = Sobol.next!(s)
		if test_params[1]<10
			append!(a_ps, [test_params])
			boo = true
		else
			boo = false
		end
	end
end

Hi Marius,
in your example the loop is quite cheap: just 100 iterations, each of which takes very little time. The overhead of scheduling the @threads tasks therefore outweighs the benefit of spreading the loop over multiple threads, compared with simple sequential execution. You will only see a benefit for much “heavier” loops, i.e. more iterations or a longer computation per iteration. Another option, if you are mostly dealing with loops near the break-even point between threading overhead and benefit, is Polyester.jl (JuliaSIMD/Polyester.jl on GitHub, “The cheapest threads you can find!”), which decides whether it is worth using threads and, if so, how many.
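
To make the overhead argument concrete, here is a minimal sketch that times the same loop serially and with @threads for a cheap and a heavy per-iteration workload. The function `work` and its `cost` parameter are made up for illustration and are not part of the original code; the actual timings depend on how many threads Julia was started with (see `Threads.nthreads()`).

```julia
using Base.Threads

# Hypothetical per-iteration workload; `cost` controls how heavy one iteration is.
work(i, cost) = sum(sin(i + k) for k in 1:cost)

function run_serial(n, cost)
    out = Vector{Float64}(undef, n)
    for i in 1:n
        out[i] = work(i, cost)
    end
    return out
end

function run_threaded(n, cost)
    out = Vector{Float64}(undef, n)
    # Each iteration writes to its own slot, so no lock is needed.
    @threads for i in 1:n
        out[i] = work(i, cost)
    end
    return out
end

# Warm up so that compilation time is not measured.
run_serial(1, 1); run_threaded(1, 1)

@time run_serial(100, 10)          # cheap loop, serial
@time run_threaded(100, 10)        # cheap loop, threaded
@time run_serial(100, 100_000)     # heavy loop, serial
@time run_threaded(100, 100_000)   # heavy loop, threaded
```

With more than one thread available, I would expect the threaded version to pay off only in the heavy case, but it is worth measuring on your own machine.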


Also note that you probably want to put your two implementations into functions before benchmarking them, as code inside functions is generally much faster than code run at the top level. See the Performance Tips section of the Julia manual for reference.
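
As a rough sketch of what that could look like for the serial version: the function name `collect_accepted` and the stand-in generator `draw_point` are my own inventions so that the snippet runs without extra packages. With Sobol.jl you would presumably pass something like `() -> Sobol.next!(s)` instead.

```julia
using Random

# `draw` is any zero-argument function returning a vector of parameters.
function collect_accepted(draw, n)
    a_ps = Vector{Vector{Float64}}()   # concretely typed, unlike `a_ps = []`
    while length(a_ps) < n
        p = draw()
        # Same acceptance test as in the original loop.
        p[1] < 10 && push!(a_ps, p)
    end
    return a_ps
end

# Stand-in for the Sobol sequence (assumption): uniform points in [0, 30)^4.
draw_point = let rng = MersenneTwister(1)
    () -> 30 .* rand(rng, 4)
end

collect_accepted(draw_point, 1)          # first call triggers compilation
@time collect_accepted(draw_point, 100)  # this call measures the loop itself
```

Calling the function once before timing it keeps the compilation time out of the measurement.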


Your parallel version is not doing anything in parallel.

I rearranged the blocks to see what it was doing.

Every iteration, on whichever thread it runs, effectively takes its turn to do lock, do_thing(), unlock, so all of the work ends up serialized:

@time Threads.@threads for i in 1:100
    # Draw one point and append it if accepted, all while holding the lock
    function test_then_append()
        lock(lk)
        try
            test_params = Sobol.next!(s)
            if test_params[1] < 10
                append!(a_ps, [test_params])
                return true
            end
        finally
            unlock(lk)
        end
        return false
    end
    # Retry until a draw has been accepted and appended
    while !test_then_append()
    end
end
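
If only the draw from the sequence has to be serial, one way to get real parallelism is to hold the lock just for the draw and do everything else outside it, writing into a preallocated slot per iteration. Below is a minimal sketch under some assumptions: `expensive_work` is a hypothetical placeholder for the computation left out of the question, `parallel_accept` is a name I made up, and the random generator stands in for the Sobol sequence so the snippet runs on its own. With Sobol.jl you would presumably pass `() -> Sobol.next!(s)` as `draw`, assuming each call returns a fresh vector.

```julia
using Base.Threads
using Random

# Hypothetical stand-in for the computation omitted from the question.
expensive_work(p) = sum(abs2, p)

function parallel_accept(draw, n)
    lk = ReentrantLock()
    params  = Vector{Vector{Float64}}(undef, n)  # one slot per iteration
    results = Vector{Float64}(undef, n)
    @threads for i in 1:n
        while true
            # Serial part: only the draw from the shared sequence is locked.
            p = lock(lk) do
                draw()
            end
            if p[1] < 10
                # Parallel part: runs outside the lock, and each iteration
                # writes only to its own index, so no lock is needed here.
                params[i] = p
                results[i] = expensive_work(p)
                break
            end
        end
    end
    return params, results
end

# Stand-in generator (assumption): uniform points in [0, 30)^4.
next_point = let rng = MersenneTwister(1)
    () -> 30 .* rand(rng, 4)
end

params, results = parallel_accept(next_point, 100)
```

Note that with several threads the order in which accepted points land in `params` is not deterministic, and this only pays off once the work outside the lock is substantially more expensive than the draw.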

Hi, sorry, I haven’t been here for a while. Apologies for the late response, and thank you for your input. What I basically want is for each thread to take a new element of the Sobol sequence and then do some computation with it. I didn’t include these computations here for simplicity. So in this example the parallel part should be appending the parameters to a_ps, while the drawing from the Sobol sequence should be serial. If this is not what my implementation is doing, how can I do it in parallel?

Thanks @trahflow for the hint and for the Performance Tips link.
