Parallel computing using Optim

Joris_Pinkse · June 23, 2020, 5:28pm

Hello,

I’m running the program below on a 32-core/64-thread system with almost nothing else running on it. If I use more than 16 cores, the execution time of the second (timed) run stays essentially flat. This happens both with and without a precompiled system image (somewhat more so with the system image, for reasons I don’t understand). What am I missing?

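using Distributed   # no addprocs call here: workers must already exist (e.g. started with `julia -p N`)
@everywhere using Optim, LinearAlgebra

@everywhere const R = 8000   # optimizations per task in the timed run
@everywhere const d = 40     # problem dimension

@everywhere function once(x::Int64)
    # Negative x: a single optimization (warm-up/compilation); otherwise R of them
    for r = 1:((x < 0) ? 1 : R)
        # Objective 0.5 * θ'θ, its gradient θ, and its Hessian (the identity matrix)
        function Ω(θ::Vector{Float64})::Float64
            dot(θ, θ) * 0.5
        end
        function dΩ!(g::Vector{Float64}, θ::Vector{Float64})
            g .= θ
        end
        function ddΩ!(H::Matrix{Float64}, θ::Vector{Float64})
            fill!(H, 0.0)
            for i = 1:d
                H[i, i] = 1.0
            end
        end
        Optim.optimize(Ω, dΩ!, ddΩ!, ones(Float64, d), NewtonTrustRegion())
    end
end

function doit()
    @time pmap(once, -64:-1)   # first run: warm-up, compiles `once` on every worker
    @time pmap(once, 1:192)    # second run: 192 tasks of R optimizations each
end

doit()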

s-broda · June 24, 2020, 8:31am

Could this be because dot calls BLAS, and Julia is built with OpenBLAS limited to 16 threads? See https://github.com/JuliaLang/julia/blob/master/deps/blas.mk.

If so, you could try writing the dot call as an explicit loop, using MKL.jl, or building OpenBLAS with more threads.
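For the first option, a minimal sketch of a loop-based dot product might look like the following. The helper name loopdot is hypothetical (it is not part of LinearAlgebra or Optim); the point is only that a plain Julia loop never dispatches to the BLAS ddot routine, so it cannot spawn BLAS threads.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra or Optim): a dot product written
# as a plain loop, so it never calls the BLAS ddot routine.
function loopdot(a::AbstractVector{Float64}, b::AbstractVector{Float64})
    length(a) == length(b) || throw(DimensionMismatch("vectors must have the same length"))
    s = 0.0
    @inbounds @simd for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end

# Objective from the original post, rewritten to use the loop instead of dot
Ω(θ::Vector{Float64}) = 0.5 * loopdot(θ, θ)

θ = [1.0, 2.0, 3.0]
println(loopdot(θ, θ), " ", dot(θ, θ))  # the two should agree up to rounding
```

Note that @simd lets the compiler reorder the additions, so for long vectors the result can differ from dot in the last few bits.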

Joris_Pinkse · June 24, 2020, 1:48pm

Thank you. This is indeed related, though the cause is different.

If I run things in parallel and OpenBLAS also runs things in parallel within each of my processes, then I’m effectively using many more threads than I asked for. Perhaps I should compile a separate OpenBLAS from source with a maximum of one thread, so that it behaves the way I intended.

And curiously, my OpenBLAS seems to max out at 8 threads. Hmmm.
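One way to check for this kind of oversubscription is to ask each process how many threads its BLAS will use. A rough sketch, assuming Julia 1.6 or later (where BLAS.get_num_threads exists) and two illustrative local workers; the printed counts depend on the Julia version, the BLAS build, and how the workers were started.

```julia
using Distributed
addprocs(2)                      # illustrative: two local workers
@everywhere using LinearAlgebra

# BLAS.get_num_threads requires Julia 1.6 or later.
master_threads = BLAS.get_num_threads()
worker_threads = [remotecall_fetch(BLAS.get_num_threads, w) for w in workers()]

# Upper bound on BLAS threads that could be busy at once if every process calls BLAS concurrently
total = master_threads + sum(worker_threads)
println("BLAS threads on master:  ", master_threads)
println("BLAS threads per worker: ", worker_threads)
println("Potential BLAS threads:  ", total, " vs. ", Sys.CPU_THREADS, " hardware threads")
```

If the total is well above Sys.CPU_THREADS, process-level and BLAS-level parallelism are competing for the same cores.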

platawiec · June 24, 2020, 1:59pm

This is a bit of speculation, but it could be that threaded BLAS is simply not enabled. See the addprocs keyword argument enable_threaded_blas, which defaults to false (documented in the Distributed Computing section of the Julia manual).

But because you don’t call addprocs explicitly, I’m not sure what the default behavior is in your case.
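A small sketch comparing the two settings of that keyword, assuming Julia 1.6 or later for BLAS.get_num_threads; the worker counts are illustrative. Per the addprocs documentation, enable_threaded_blas = true lets BLAS run on multiple threads in the added processes, and the default is false. The exact numbers printed depend on the machine and BLAS build, so no output is claimed here.

```julia
using Distributed

# Default keyword (enable_threaded_blas = false) for the first batch of workers
ws_default = addprocs(2)
# Opt in to multithreaded BLAS for a second batch of workers
ws_threaded = addprocs(2; enable_threaded_blas = true)

@everywhere using LinearAlgebra

# Query the BLAS thread count on a given set of workers (needs Julia 1.6+)
blas_threads(ws) = [remotecall_fetch(BLAS.get_num_threads, w) for w in ws]

println("default workers:  ", blas_threads(ws_default))
println("threaded workers: ", blas_threads(ws_threaded))
```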

rdeits · June 24, 2020, 2:05pm

You can control the number of threads used by BLAS at run-time with the BLAS.set_num_threads function from LinearAlgebra:

julia> using LinearAlgebra

julia> LinearAlgebra.BLAS.set_num_threads(1)
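Combined with @everywhere, the same call can be applied to the master and every worker at once. A minimal sketch, assuming four illustrative local workers and Julia 1.6 or later for the verification step:

```julia
using Distributed
addprocs(4)                         # illustrative worker count
@everywhere using LinearAlgebra

# Restrict BLAS to a single thread on the master and on every current worker,
# so that pmap's one-task-per-worker parallelism is the only source of concurrency.
@everywhere BLAS.set_num_threads(1)

# Verify on each worker (BLAS.get_num_threads needs Julia 1.6+)
println([remotecall_fetch(BLAS.get_num_threads, w) for w in workers()])
```

@everywhere only reaches workers that exist when it runs, so workers added later need the call repeated.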

Joris_Pinkse · June 25, 2020, 3:55pm

Thanks! I didn’t know that was possible.
