Parallelized calls to Optim.optimize use the same number of threads as a single threaded call

To investigate how sample size affects uncertainty in parameter inference, I do something like:

```julia
Θ_example = something  # Θ is a vector of the parameters I'm trying to infer
Θvec = zeros(numSamples, length(Θ_example))

for i in 1:numSamples
    s = generateSample()
    Θ = inferParameters(s)
    Θvec[i, :] = Θ  # store Θ as row i
end
```

where inferParameters calls Optim.optimize, currently with NelderMead().
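For context, inferParameters is roughly of this shape (the linear model and least-squares loss here are toy placeholders, not my actual setup):

```julia
using Optim  # Optim.jl must be installed

x = collect(0.0:0.1:1.0)

# Toy sketch: fit Θ = (a, b) of the model y = a .+ b .* x
# to a sample s by least squares, using Nelder-Mead.
function inferParameters(s)
    loss(Θ) = sum(abs2, (Θ[1] .+ Θ[2] .* x) .- s)
    res = Optim.optimize(loss, zeros(2), NelderMead())
    return Optim.minimizer(res)
end

s = 1.0 .+ 2.0 .* x       # noiseless "sample" generated with a = 1, b = 2
Θhat = inferParameters(s)
```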

When I run this on my MacBook Pro (M1 Pro chip) with numThreads=8, I can see that only one core is in use while ~22 threads are running. I see this both in the terminal with top and in the Mac’s Activity Monitor.

Naively, I’d like to do something like this instead:

```julia
Θ_example = something  # Θ is a vector of the parameters I'm trying to infer
Θvec = zeros(numSamples, length(Θ_example))

Threads.@threads for i in 1:numSamples
    s = generateSample()
    Θ = inferParameters(s)
    Θvec[i, :] = Θ
end
```

When I parallelize in this way I do see all my cores in use, but I’m still only using ~22 threads spread over the 8 cores, and my computation is significantly slower than the single-threaded version.

Finally, my question:
Is this behavior expected, and is there a better way to utilize the multiple cores? I would have expected / hoped to have 20-odd threads running on each of the 8 cores rather than the same total number of threads as in the single-core case.

Despite its name, Threads.@threads does not spawn new OS threads; it spawns Julia Tasks, which run on the OS threads Julia was started with (e.g. via -t 8). If your code is using a BLAS library or similar under the hood, those threads are distinct from the threads Julia runs with. So in essence, your solver is likely already internally threaded, and adding parallelism on top is not going to meaningfully improve performance, but rather increase contention.
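If you want one optimization running per core, a common pattern is to pin BLAS to a single thread so its worker pool doesn’t contend with Julia’s task-level parallelism. A minimal sketch (with a dummy matrix workload standing in for inferParameters), assuming Julia was started with e.g. -t 8:

```julia
using LinearAlgebra

# Limit BLAS to one thread so its ~20 worker threads don't
# fight with Julia's own threads for the cores.
BLAS.set_num_threads(1)

n = 16
results = zeros(n)
Threads.@threads for i in 1:n
    # each iteration runs as a Task on one of the Julia threads;
    # the BLAS-backed matrix multiply is now single-threaded per task
    A = randn(50, 50)
    results[i] = sum(abs2, A * A')
end
```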


But in the first implementation, in which I don’t use Threads.@threads, all 22 of the threads (presumably from BLAS, as you say) are on a single core. At least I think so… Here’s a snapshot of top:

[screenshot of top output]
Is there a way to parallelize in which each optimization with its own set of 22ish threads lives on a different core?

Edit: I think I misinterpreted the output of top. man top says that the threads column shows “Number of threads (total/running)”, not total / number of cores as I thought. I’m not sure I believe that, though, because it still displays 22/1 as in the screenshot but displays 23/8 when I run the version with Threads.@threads.

You can try Dagger.jl
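Dagger lets you express each inference as an independent task and leave the scheduling to it. A minimal sketch (with a dummy computation standing in for inferParameters):

```julia
using Dagger  # Dagger.jl must be installed

# Spawn each sample's work as an independent Dagger task;
# Dagger schedules them across available Julia threads/workers.
tasks = [Dagger.@spawn sum(abs2, randn(100)) for _ in 1:8]
results = fetch.(tasks)
```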

So @threads indeed is utilizing more “threads”.

Note that Julia has some defaults if you don’t supply the -p and -t arguments.
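For reference, a quick way to check what you actually got; the Julia thread count is fixed at startup via -t / --threads or the JULIA_NUM_THREADS environment variable, and defaults to 1 if neither is set:

```julia
# Thread count is set at startup and cannot be changed afterwards.
println("Julia threads:    ", Threads.nthreads())
println("Hardware threads: ", Sys.CPU_THREADS)
```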

I’ll have a look, thanks.