Why is the non-parallel loop faster than my parallel loop?

Marius_123 · December 3, 2021, 11:08am

Below I have two versions of a loop, and the non-parallel version is significantly faster. I am not really an expert in parallelization. Can you tell me why this is the case and how I can improve the performance?

using Sobol
using Base.Threads
lbs = zeros(4)
ubs = fill(30.0, 4)
s = SobolSeq(lbs, ubs)
s = skip(s, 100)
a_ps = []
lk = ReentrantLock()
@time Threads.@threads for i in 1:100
	boo = false
	test_params = NaN
	while boo == false
		begin
			lock(lk)
			try
				test_params = Sobol.next!(s)
			finally
				unlock(lk)
			end
		end
		if test_params[1] < 10
			begin
				lock(lk)
				try
					append!(a_ps, [test_params])
				finally
					unlock(lk)
				end
			end
			boo = true
		else
			boo = false
		end
	end
end


s = SobolSeq(lbs, ubs)
s = skip(s, 100)
a_ps = []
@time for i in 1:100
	boo = false
	test_params = NaN
	while boo == false
		test_params = Sobol.next!(s)
		if test_params[1]<10
			append!(a_ps, [test_params])
			boo = true
		else
			boo = false
		end
	end
end

fgerick · December 3, 2021, 11:23am

Hi Marius,
in your example the loop is quite cheap: only 100 iterations, and each iteration takes very little time. That is why the overhead of scheduling the @threads loop outweighs the benefit of using multiple threads compared to simple sequential execution. You will only see a benefit for much “heavier” loops, meaning more iterations or a longer computation per iteration. If you are mostly going to deal with loops near the break-even point between threading overhead and benefit, another option is https://github.com/JuliaSIMD/Polyester.jl, which decides whether it is worth using threads and, if so, how many.
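To illustrate the "heavier loop" case, here is a minimal sketch, assuming a made-up per-iteration workload (heavy_work, run_serial and run_threaded are hypothetical names, not from the thread). Each iteration writes only to its own slot of a preallocated output, so no lock is needed and the iterations can genuinely run concurrently. Start Julia with several threads, e.g. julia --threads 4, to see a difference.

```julia
using Base.Threads

# Hypothetical stand-in for an expensive per-iteration computation.
function heavy_work(x::Float64, n::Int)
    acc = 0.0
    for k in 1:n
        acc += sin(x * k) / k
    end
    return acc
end

# Serial baseline: one result per input value.
function run_serial(xs, n)
    out = Vector{Float64}(undef, length(xs))
    for i in eachindex(xs)
        out[i] = heavy_work(xs[i], n)
    end
    return out
end

# Threaded version: each iteration writes only out[i], so there is
# no shared mutable state and therefore no lock.
function run_threaded(xs, n)
    out = Vector{Float64}(undef, length(xs))
    @threads for i in eachindex(xs)
        out[i] = heavy_work(xs[i], n)
    end
    return out
end

xs = collect(range(0.0, 30.0; length = 100))
run_serial(xs, 10); run_threaded(xs, 10)   # compile both before timing
@time run_serial(xs, 10^5)
@time run_threaded(xs, 10^5)
```

With n small, the threaded timing is dominated by scheduling overhead; as n grows, the per-iteration work starts to outweigh it.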

trahflow · December 3, 2021, 11:27am

Also note that you probably want to put your two implementations into functions before benchmarking them, because code inside functions generally runs much faster than code at the top level (global scope). See the Performance Tips section of the Julia manual for reference.
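As a sketch of that advice, the serial loop can be wrapped in a function that takes the generator as an argument instead of reading a global, and that collects into a concretely typed vector instead of []. The names draw_accepted and next_point are hypothetical; to keep the example self-contained, a seeded MersenneTwister stands in for the Sobol sequence, and with Sobol.jl you would pass () -> Sobol.next!(s) instead.

```julia
using Random

# Draw points from `next_point()` until `n` of them satisfy `accept`.
# `next_point` (hypothetical) is any zero-argument function returning a
# fresh Vector{Float64}, e.g. () -> Sobol.next!(s) with Sobol.jl.
function draw_accepted(next_point, accept, n::Int)
    accepted = Vector{Vector{Float64}}()   # concretely typed, unlike []
    sizehint!(accepted, n)
    while length(accepted) < n
        p = next_point()
        accept(p) && push!(accepted, p)
    end
    return accepted
end

# Stand-in generator (assumption): uniform points in [0, 30)^4.
next_point = let rng = MersenneTwister(42)
    () -> rand(rng, 4) .* 30
end

points = draw_accepted(next_point, p -> p[1] < 10, 100)
```

Drawing until 100 points have been accepted is equivalent to the original "for each of 100 iterations, draw until one point is accepted" loop.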

lawless-m · December 3, 2021, 12:32pm

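Your parallel version is not doing anything in parallel.

I rearranged the blocks to make it clearer what it is doing.

Your 100 iterations, spread across the available threads, take turns to effectively do lock, do_thing(), unlock. Almost all of the work in each iteration happens while holding the lock, so only one thread makes progress at a time.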

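@time Threads.@threads for i in 1:100
    # Everything below, including the acceptance test and the append,
    # runs while holding `lk`, so the iterations are effectively serialized.
    function test_then_append()
        lock(lk)
        try
            test_params = Sobol.next!(s)
            if test_params[1] < 10
                append!(a_ps, [test_params])
                return true
            end
        finally
            unlock(lk)
        end
        return false
    end
    # Keep drawing until one point has been accepted for this iteration.
    while !test_then_append()
    end
end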
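By contrast, a sketch that keeps the lock only around the call into the shared, stateful generator, so that any real work per point can overlap across threads. The function expensive is a hypothetical placeholder for the omitted computation, parallel_narrow_lock and next_point are illustrative names, and a seeded MersenneTwister stands in for () -> Sobol.next!(s). Results go into preallocated slots indexed by i, so collecting them needs no lock either.

```julia
using Base.Threads, Random

# Hypothetical placeholder for the real per-point computation.
expensive(p::Vector{Float64}) = sum(abs2, p)

function parallel_narrow_lock(next_point, accept, n::Int)
    lk = ReentrantLock()
    params = Vector{Vector{Float64}}(undef, n)
    results = Vector{Float64}(undef, n)
    @threads for i in 1:n
        # Only drawing from the shared generator holds the lock.
        p = lock(next_point, lk)
        while !accept(p)
            p = lock(next_point, lk)
        end
        params[i] = p
        # Runs outside the lock, so threads can overlap here.
        results[i] = expensive(p)
    end
    return params, results
end

# Stand-in generator (assumption); with Sobol.jl use () -> Sobol.next!(s).
next_point = let rng = MersenneTwister(1)
    () -> rand(rng, 4) .* 30
end

params, results = parallel_narrow_lock(next_point, p -> p[1] < 10, 100)
```

Which thread ends up with which accepted point is nondeterministic, but every draw is tested and each acceptance completes exactly one iteration, so you get the same set of points as a serial run.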

Marius_123 · January 27, 2022, 10:14am

Hi, sorry for the late response, I haven’t been here for a while, and thank you for your input. What I basically want is for each thread to take a new element of the Sobol sequence and then do some computation with it. I left these computations out here for simplicity. So in this example, the parallel part should be appending the parameters to a_ps, while drawing from the Sobol sequence should be serial. If my implementation is not doing this, how can I do it in parallel?
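One common pattern for "serial drawing, parallel computation" is to split the two phases: draw all accepted points serially first (cheap), then run the expensive step over them with @threads, with no lock at all. This is a sketch under the same assumptions as before: expensive is a hypothetical placeholder for the omitted computation, draw_then_compute and next_point are illustrative names, and a seeded MersenneTwister stands in for () -> Sobol.next!(s).

```julia
using Base.Threads, Random

# Hypothetical placeholder for the per-point computation.
expensive(p::Vector{Float64}) = sum(abs2, p)

function draw_then_compute(next_point, accept, n::Int)
    # Phase 1 (serial, cheap): accepted points, in sequence order.
    params = Vector{Vector{Float64}}()
    sizehint!(params, n)
    while length(params) < n
        p = next_point()
        accept(p) && push!(params, p)
    end
    # Phase 2 (parallel, expensive): each iteration reads params[i]
    # and writes results[i], so no shared mutable state and no lock.
    results = Vector{Float64}(undef, n)
    @threads for i in 1:n
        results[i] = expensive(params[i])
    end
    return params, results
end

# Stand-in generator (assumption); with Sobol.jl use () -> Sobol.next!(s).
next_point = let rng = MersenneTwister(7)
    () -> rand(rng, 4) .* 30
end

params, results = draw_then_compute(next_point, p -> p[1] < 10, 100)
```

A side benefit of this split is that params keeps the order of the sequence, which the lock-based threaded loop does not guarantee.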

Minumsand · January 27, 2022, 9:33pm

Thanks @trahflow for the hint and for the Performance Tips link.
