Thanks for the detailed explanation of how to implement the shift-invert operation (which I needed!). My code for implementing this is in the file `rerun_KrylovKit`

show below:

```
using KrylovKit: eigsolve
using LinearAlgebra
PQ = P*Q
function rerun_KrylovKit(n=1, PQ=PQ)
Threads.nthreads() == n || error("Restart Julia with -t$n")
BLAS.set_num_threads(n)
println("KrylovKit $n Threads:")
@time begin
LUfactors = lu(PQ - ev*I)
fun(x) = LUfactors \ x
D2, Exyvecs, info = eigsolve(fun, rand(eltype(PQ), size(PQ,2)), NMODES, :LM; tol=1e-14)
end
info.converged â‰Ą NMODES || warning("$(info.converged) converged modes, but $(NMODES) requested")
D2 = inv.(D2[1:NMODES]) .+ ev
@show D2
print(info)
end
```

Running this on single-threaded Julia produces the following output:

```
julia> rerun_KrylovKit(1)
KrylovKit 1 Threads:
2.402129 seconds (92.02 k allocations: 423.060 MiB, 0.77% gc time, 3.03% compilation time)
D2 = ComplexF64[-11.41757523508308 - 0.0im, -11.34764322859855 - 0.0im, -11.175322773264579 - 0.0im]
ConvergenceInfo: 5 converged values after 2 iterations and 42 applications of the linear map;
norms of residuals are given by (4.651212075017882e-28, 1.1311489902029288e-25, 1.1303118924642023e-20, 6.12279637828794e-19, 3.090439836645266e-16).
```

The results are much improved over my initial try, with convergence achieved and execution time competitive with Arpack. I also tried running it with 8 threads and with MKLSparse:

```
using MKL, MKLSparse
julia> using MKL, MKLSparse
julia> rerun_KrylovKit(8)
KrylovKit 8 Threads:
2.364362 seconds (1.49 k allocations: 418.421 MiB, 0.80% gc time)
D2 = ComplexF64[-11.41757523508308 - 0.0im, -11.34764322859856 - 0.0im, -11.175322773264574 - 0.0im]
ConvergenceInfo: 5 converged values after 2 iterations and 42 applications of the linear map;
norms of residuals are given by (4.367435621113895e-28, 1.0519282546009974e-25, 1.0801634314205764e-20, 5.526478130311956e-19, 2.822258996060205e-16).
```

Almost no difference in run timeâ€“the difference is within the variation seen for multiple runs.

I went through a similar exercise with `ArnoldiMethod`

, using the `LinearMap`

wrapper shown in the packageâ€™s documentation. The contents of `rerun_ArnoldiMethod.jl`

are shown below:

```
using ArnoldiMethod, LinearAlgebra, LinearMaps
PQ = P*Q
# Factorizes A and builds a linear map that applies inv(A) to a vector.
function construct_linear_map(A)
F = factorize(A - ev*I)
LinearMap{eltype(A)}((y, x) -> ldiv!(y, F, x), size(A,1), ismutating=true)
end
function rerun_ArnoldiMethod(n=1, PQ=PQ)
Threads.nthreads() == n || error("Restart Julia with -t$n")
BLAS.set_num_threads(n)
println("ArnoldiMethod $n Threads:")
@time begin
decomp, history = partialschur(construct_linear_map(PQ), nev=NMODES, tol=1e-14, restarts=300, which=LM())
D2, Exy = partialeigen(decomp)
end
println(history)
D2 = inv.(D2) .+ ev
@show D2
end
```

Running this on a single thread produces the following output:

```
julia> include("rerun_ArnoldiMethod.jl")
rerun_ArnoldiMethod (generic function with 3 methods)
julia> rerun_ArnoldiMethod(1)
ArnoldiMethod 1 Threads:
2.216459 seconds (1.69 M allocations: 447.162 MiB, 0.94% gc time, 10.12% compilation time)
Converged: 3 of 3 eigenvalues in 38 matrix-vector products
D2 = [-11.175322773264579, -11.347643228598555, -11.41757523508308]
3-element Vector{Float64}:
-11.175322773264579
-11.347643228598555
-11.41757523508308
```

and with 8 threads (restart of Julia with 8 threads not shown):

```
julia> using MKL, MKLSparse
julia> rerun_ArnoldiMethod(8)
ArnoldiMethod 8 Threads:
1.997144 seconds (240 allocations: 367.015 MiB, 0.76% gc time)
Converged: 3 of 3 eigenvalues in 38 matrix-vector products
D2 = [-11.17532277326458, -11.347643228598553, -11.41757523508308]
3-element Vector{Float64}:
-11.17532277326458
-11.347643228598553
-11.41757523508308
```

so the 8-threaded run with `MKLSparse`

is about 10% faster than the single-threaded run.

Here is an updated summary of the timings with and without multi-threading. â€śWithâ€ť also includes use of `MKL`

and `MKLSparse`

for the Julia results.

Package |
Function |
1 Thread Time (s) |
8 Threads Time (s) |

Matlab |
eigs |
1.0 |
1.0 |

Arpack |
eigs |
2.6 |
2.8 |

KrylovKit |
eigsolve |
2.4 |
2.4 |

ArnoldiMethod |
partialschur, partialeigen |
2.2 |
2.0 |

I only reported the times to the nearest tenth of a second because they vary by almost that much from run to run (due to random starting vectors?). Only the `ArnoldiMethod`

routines show even a modest reduction in execution time with threading and use of `MKLSparse`

. `Arpack`

actually got slightly worse. So far, the shortest time I can get from Julia is using `ArnoldiMethod`

which takes about twice as long as `eigs`

in Matlab.