Thanks for the detailed explanation of how to implement the shift-invert operation (which I needed!). My code for implementing this is in the file rerun_KrylovKit
show below:
using KrylovKit: eigsolve
using LinearAlgebra
PQ = P*Q
function rerun_KrylovKit(n=1, PQ=PQ)
Threads.nthreads() == n || error("Restart Julia with -t$n")
BLAS.set_num_threads(n)
println("KrylovKit $n Threads:")
@time begin
LUfactors = lu(PQ - ev*I)
fun(x) = LUfactors \ x
D2, Exyvecs, info = eigsolve(fun, rand(eltype(PQ), size(PQ,2)), NMODES, :LM; tol=1e-14)
end
info.converged ≥ NMODES || warning("$(info.converged) converged modes, but $(NMODES) requested")
D2 = inv.(D2[1:NMODES]) .+ ev
@show D2
print(info)
end
Running this on single-threaded Julia produces the following output:
julia> rerun_KrylovKit(1)
KrylovKit 1 Threads:
2.402129 seconds (92.02 k allocations: 423.060 MiB, 0.77% gc time, 3.03% compilation time)
D2 = ComplexF64[-11.41757523508308 - 0.0im, -11.34764322859855 - 0.0im, -11.175322773264579 - 0.0im]
ConvergenceInfo: 5 converged values after 2 iterations and 42 applications of the linear map;
norms of residuals are given by (4.651212075017882e-28, 1.1311489902029288e-25, 1.1303118924642023e-20, 6.12279637828794e-19, 3.090439836645266e-16).
The results are much improved over my initial try, with convergence achieved and execution time competitive with Arpack. I also tried running it with 8 threads and with MKLSparse:
using MKL, MKLSparse
julia> using MKL, MKLSparse
julia> rerun_KrylovKit(8)
KrylovKit 8 Threads:
2.364362 seconds (1.49 k allocations: 418.421 MiB, 0.80% gc time)
D2 = ComplexF64[-11.41757523508308 - 0.0im, -11.34764322859856 - 0.0im, -11.175322773264574 - 0.0im]
ConvergenceInfo: 5 converged values after 2 iterations and 42 applications of the linear map;
norms of residuals are given by (4.367435621113895e-28, 1.0519282546009974e-25, 1.0801634314205764e-20, 5.526478130311956e-19, 2.822258996060205e-16).
Almost no difference in run time–the difference is within the variation seen for multiple runs.
I went through a similar exercise with ArnoldiMethod
, using the LinearMap
wrapper shown in the package’s documentation. The contents of rerun_ArnoldiMethod.jl
are shown below:
using ArnoldiMethod, LinearAlgebra, LinearMaps
PQ = P*Q
# Factorizes A and builds a linear map that applies inv(A) to a vector.
function construct_linear_map(A)
F = factorize(A - ev*I)
LinearMap{eltype(A)}((y, x) -> ldiv!(y, F, x), size(A,1), ismutating=true)
end
function rerun_ArnoldiMethod(n=1, PQ=PQ)
Threads.nthreads() == n || error("Restart Julia with -t$n")
BLAS.set_num_threads(n)
println("ArnoldiMethod $n Threads:")
@time begin
decomp, history = partialschur(construct_linear_map(PQ), nev=NMODES, tol=1e-14, restarts=300, which=LM())
D2, Exy = partialeigen(decomp)
end
println(history)
D2 = inv.(D2) .+ ev
@show D2
end
Running this on a single thread produces the following output:
julia> include("rerun_ArnoldiMethod.jl")
rerun_ArnoldiMethod (generic function with 3 methods)
julia> rerun_ArnoldiMethod(1)
ArnoldiMethod 1 Threads:
2.216459 seconds (1.69 M allocations: 447.162 MiB, 0.94% gc time, 10.12% compilation time)
Converged: 3 of 3 eigenvalues in 38 matrix-vector products
D2 = [-11.175322773264579, -11.347643228598555, -11.41757523508308]
3-element Vector{Float64}:
-11.175322773264579
-11.347643228598555
-11.41757523508308
and with 8 threads (restart of Julia with 8 threads not shown):
julia> using MKL, MKLSparse
julia> rerun_ArnoldiMethod(8)
ArnoldiMethod 8 Threads:
1.997144 seconds (240 allocations: 367.015 MiB, 0.76% gc time)
Converged: 3 of 3 eigenvalues in 38 matrix-vector products
D2 = [-11.17532277326458, -11.347643228598553, -11.41757523508308]
3-element Vector{Float64}:
-11.17532277326458
-11.347643228598553
-11.41757523508308
so the 8-threaded run with MKLSparse
is about 10% faster than the single-threaded run.
Here is an updated summary of the timings with and without multi-threading. “With” also includes use of MKL
and MKLSparse
for the Julia results.
Package |
Function |
1 Thread Time (s) |
8 Threads Time (s) |
Matlab |
eigs |
1.0 |
1.0 |
Arpack |
eigs |
2.6 |
2.8 |
KrylovKit |
eigsolve |
2.4 |
2.4 |
ArnoldiMethod |
partialschur, partialeigen |
2.2 |
2.0 |
I only reported the times to the nearest tenth of a second because they vary by almost that much from run to run (due to random starting vectors?). Only the ArnoldiMethod
routines show even a modest reduction in execution time with threading and use of MKLSparse
. Arpack
actually got slightly worse. So far, the shortest time I can get from Julia is using ArnoldiMethod
which takes about twice as long as eigs
in Matlab.