Multithreading a loop with ARPACK eigs()

dehond · July 14, 2020, 10:28am

I’m attempting to multithread a calculation in which I loop over a function that calls on ARPACK’s eigs(). However, whenever I insert the Threads.@threads macro in front of the loop I either get an error or Julia crashes altogether. The following MWE reproduces this behavior on my system:

using Arpack
Threads.@threads for i in 1:10
    λ, ϕ = eigs(rand(500, 500), maxiter = 1000)
end

If it makes things any clearer, I get the following stacktrace:

ERROR: TaskFailedException:
ARPACKException: unspecified ARPACK error: -9999
Stacktrace:
 [1] aupd_wrapper(::Type{T} where T, ::Arpack.var"#matvecA!#24"{Array{Float64,2}}, ::Arpack.var"#18#25", ::Arpack.var"#19#26", ::Int64, ::Bool, ::Bool, ::String, ::Int64, ::Int64, ::String, ::Float64, ::Int64, ::Int64, ::Array{Float64,1}) at C:\Users\Julius\.julia\packages\Arpack\o35I5\src\libarpack.jl:76
 [2] _eigs(::Array{Float64,2}, ::LinearAlgebra.UniformScaling{Bool}; nev::Int64, ncv::Int64, which::Symbol, tol::Float64, maxiter::Int64, sigma::Nothing, v0::Array{Float64,1}, ritzvec::Bool) at C:\Users\Julius\.julia\packages\Arpack\o35I5\src\Arpack.jl:181
 [3] #eigs#10 at C:\Users\Julius\.julia\packages\Arpack\o35I5\src\Arpack.jl:46 [inlined]
 [4] #eigs#9 at C:\Users\Julius\.julia\packages\Arpack\o35I5\src\Arpack.jl:45 [inlined]
 [5] macro expansion at .\REPL[2]:2 [inlined]
 [6] (::var"#2#threadsfor_fun#3"{UnitRange{Int64}})(::Bool) at .\threadingconstructs.jl:61
 [7] (::var"#2#threadsfor_fun#3"{UnitRange{Int64}})() at .\threadingconstructs.jl:28
Stacktrace:
 [1] wait(::Task) at .\task.jl:267
 [2] top-level scope at .\threadingconstructs.jl:69

The ARPACKException code changes from run to run; I’ve also seen 1 and 3. I get the sense that ARPACK is not suitable for multithreading like this, but I don’t really understand why, and I’m curious whether there’s a workaround.

This is using Julia 1.4.2 and Arpack 0.4.0, running on a Windows system.

platawiec · July 14, 2020, 11:59am

My understanding is that Julia’s threading system doesn’t necessarily play nicely with external libraries such as ARPACK, which isn’t so surprising given how much threads have to orchestrate.

Is using a different parallel approach an option for you? I would recommend trying the above with Distributed instead. That should work fine.

samuelpowell · July 14, 2020, 12:28pm

The underlying implementation is not thread safe, so undefined behaviour is to be expected. I highly recommend @stabbles’ ArnoldiMethod.jl as an alternative.
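
A minimal sketch of how the threaded loop might look with ArnoldiMethod.jl, assuming its `partialschur`/`partialeigen` interface and that, being pure Julia, it keeps no shared state between calls. The random sparse test matrix, `nev = 6`, and the tolerance are illustrative assumptions, not values from the original problem.

```julia
using ArnoldiMethod, SparseArrays

n_problems = 10
results = Vector{Vector{ComplexF64}}(undef, n_problems)

Threads.@threads for i in 1:n_problems
    # Illustrative input (assumption): a random sparse 500×500 matrix
    A = sprand(500, 500, 0.05)
    # Partial Schur decomposition; by default it targets the eigenvalues of largest magnitude
    decomp, history = partialschur(A; nev = 6, tol = 1e-8)
    # Convert the Schur form into eigenvalues and eigenvectors
    λ, ϕ = partialeigen(decomp)
    results[i] = λ   # each iteration writes only to its own slot
end
```

Julia has to be started with more than one thread (for example by setting `JULIA_NUM_THREADS`) for the loop to actually run in parallel.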

dehond · July 14, 2020, 3:31pm

Thank you for your suggestions. I’ve tried to set it up using Distributed. However, if I run Julia using julia -p auto (and check that nworkers() returns 4), and then execute the following:

using Distributed
@everywhere using Arpack  # eigs has to be loaded on every worker
@time @sync @distributed for i in 1:10
    λ, ϕ = eigs(rand(500, 500), maxiter = 1000)
end

I don’t get any speedup whatsoever compared to running it without the @distributed macro. I verified that Distributed does work by doing a dummy test where I evaluate sum(rand(Bool, 1000000)) up to a thousand times. There I get a factor of two improvement compared to serial execution.
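
A sanity check of that kind could look roughly like the sketch below. This is an assumed reconstruction, not the exact code that was run; the helper names `serial_count` and `parallel_count` are hypothetical, and the worker count of 4 simply mirrors the setup described above.

```julia
using Distributed
nworkers() == 1 && addprocs(4)   # assumption: 4 workers; unnecessary if started with `julia -p`

# Serial version: count the `true` entries of 1000 random Bool vectors
function serial_count()
    total = 0
    for _ in 1:1000
        total += sum(rand(Bool, 1_000_000))
    end
    return total
end

# Distributed version: the same work, split across workers and reduced with +
function parallel_count()
    return @distributed (+) for _ in 1:1000
        sum(rand(Bool, 1_000_000))
    end
end

serial_count(); parallel_count()   # warm-up so compilation is not timed
@time serial_count()
@time parallel_count()
```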

Bruno_Amorim · July 15, 2020, 12:07am

eigs is itself multithreaded, so on a multicore machine all the processors will be used even without Julia’s Threads or Distributed.

Using Distributed, it is possible that in each Julia process BLAS is spawning its own threads, which can compete for resources with the BLAS threads from the other processes. You can try setting
@everywhere BLAS.set_num_threads(1)
(after @everywhere using LinearAlgebra) so that each Julia process has only one BLAS thread. Even then, it is not guaranteed that you will see an improvement: it will depend on the size of the matrices and how many you are diagonalizing. In some cases, it might be better not to use Distributed and to let eigs use all the cores of your machine.
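
A minimal sketch of that setup, assuming four local workers and the 500×500 dense test matrices from the MWE. The helper `top_eigenvalues` is a hypothetical name introduced for illustration; `pmap` is used here in place of `@distributed` so that the results are collected.

```julia
using Distributed
nworkers() == 1 && addprocs(4)   # assumption: 4 local workers; unnecessary with `julia -p 4`

@everywhere using LinearAlgebra, Arpack
@everywhere BLAS.set_num_threads(1)   # one BLAS thread per Julia process

# Hypothetical helper: eigenvalues returned by eigs for one random matrix
@everywhere top_eigenvalues(i) = first(eigs(rand(500, 500), maxiter = 1000))

# Each worker handles a share of the ten problems
results = pmap(top_eigenvalues, 1:10)
```

Whether this beats a plain serial loop with multithreaded BLAS depends on the matrix size, so both variants are worth timing.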

samuelpowell · July 15, 2020, 7:49am

To be more precise, it is the ARPACK callback routines which are multithreaded (using the normal Julia infrastructure, incl. broadcast and BLAS) - the underlying library is not. As you point out, the consequences of this are problem size dependent. Profiling is required, but roughly speaking:

For large matrices, the optimum approach will be just to call eigs on each matrix in turn, allowing BLAS to saturate the processor. This could be combined with a distributed approach over multiple machines.
For small matrices, where initialisation and ARPACK itself constitute proportionally more work, then one can either take the distributed approach discussed (limit BLAS to single core, distribute over multiple processes and loop over a subset), or use a fully multithreaded library such as that I linked to previously. I have found the latter approach to be very effective when looking for a small number of eigenvalues (which I assume is the intent of the OP given the call to eigs).
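
As a rough starting point for that profiling, the sketch below times a serial loop of eigs calls for several BLAS thread counts. The matrix size and the thread counts are assumptions chosen to match the MWE; they are not measurements from this thread.

```julia
using LinearAlgebra, Arpack

A = rand(500, 500)
eigs(A, maxiter = 1000)            # warm-up so compilation is not timed

timings = Dict{Int,Float64}()
for n in (1, 2, 4)                 # BLAS thread counts to try (assumption: a 4-core machine)
    BLAS.set_num_threads(n)
    # Time ten serial eigs calls with the current BLAS thread count
    timings[n] = @elapsed for _ in 1:10
        eigs(A, maxiter = 1000)
    end
    println("BLAS threads = ", n, ": ", round(timings[n], digits = 3), " s")
end
```

If the timings barely change with the thread count, the matrices are probably in the small regime, where distributing over processes or switching library is more likely to pay off.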

If the sample code is truly representative of the problem (e.g., dense matrices of size 500 x 500), and a GPU is available, it might also be worth profiling a full dense factorisation using the batched routines provided by cuSOLVER. I am unsure whether CUDA.jl provides a simple interface, but the routines are wrapped and can be called with a little work.

dehond · July 15, 2020, 9:30am

Thank you for all the suggestions.

@samuelpowell my matrix is actually quite large, 213444x213444 elements, albeit sparse. With your and @Bruno_Amorim’s remarks I think it makes sense that there was no improvement in performance. I have tried the ArnoldiMethod.jl package you mentioned, and this actually improved the performance by quite a bit, so thank you for the recommendation!
