Matrix-Free Eigensolver in BifurcationKit.jl

I have a large-scale continuation problem for which I’m considering using matrix-free methods in BifurcationKit.jl. To evaluate stability and detect bifurcations, I implemented a matrix-free, shift-and-invert eigensolver in the same fashion as the Swift-Hohenberg 3D tutorial, but I notice that it’s considerably slower than a matrix-based one. For instance, if I run the following lines of code in the tutorial,

using Arpack
@time for i = 1:5 eigSH3d(dx -> L1 * dx, 3) end
@time for i = 1:5 Arpack.eigs(L1; nev = 3, sigma = 0.1) end

the second option is nearly twice as fast. Furthermore, since the eigenvalue computation appears to be the slowest step in computing the bifurcation diagram, the matrix-based approach ends up faster overall than the matrix-free one. I tested this by running the following lines of code.

#Jacobian
function J_sh(u, p)
	@unpack l, ν, L1 = p
	return -L1  .+ spdiagm(0 => l .+ 2 .* ν .* u .- 3 .* u.^2)
end

prob = BK.BifurcationProblem(F_sh, AF(vec(sol0)), par, (@lens _.l),
	J = J_sh,
	recordFromSolution = (x, p) -> (n2 = norm(x), n8 = norm(x, 8)),
	issymmetric = true)

#Newton options with new eigsolver
optnew = NewtonPar(verbose = true, tol = 1e-8, maxIter = 20)
	@set! optnew.eigsolver = EigArpack(0.1, :LM, tol = 1e-12)
	sol_hexa = @time newton(prob, optnew)
	println("--> norm(sol) = ", norm(sol_hexa.u, Inf64))

#Continuation parameters
optcont = ContinuationPar(dsmin = 0.0001, dsmax = 0.005, ds = -0.001,
	pMax = 0.15, pMin = -0.1,
	newtonOptions = setproperties(optnew; tol = 1e-9, maxIter = 15),
	maxSteps = 146, detectBifurcation = 3, nev = 15, nInversion = 4,
	plotEveryStep = 1)

#Continuation problem
br = @time continuation(
	reMake(prob, u0 = 0prob.u0), PALC(tangent = Bordered(), bls = BorderingBLS(solver = optnew.linsolver, checkPrecision = false)), optcont;
	plot = false, verbosity = 1,
	normC = norminf,
	event = BK.FoldDetectEvent,
	# finaliseSolution = (z, tau, step, contResult; k...) -> begin
	# if length(contResult) == 1
	# 	pretty_table(contResult.branch)
	# else
	# 	pretty_table(contResult.branch, overwrite = true,)
	# end
	# true
# end
	)

As far as I can tell, the main source of slowness is the so-called “inner loop” required to shift and invert the Jacobian. My understanding is that Arpack.jl factorizes the shifted Jacobian once and then solves the system directly with \ in each Arnoldi iteration, whereas the matrix-free version solves the system with a preconditioned iterative linear solver in each iteration, which scales worse as the number of iterations grows. Here is some code that demonstrates this:

# Matrix-based procedure of Arpack.eigs - computes appropriate factorization
# and passes \ to inner Arnoldi iteration
t1 = time()
P = factorize(L1)
u0 = vec(sol0)
for i = 1:20 P \ u0 end
t2 = time()
dt_M = t2 - t1

# Matrix-free procedure - uses the preconditioned GMRES solver `ls` from the
# tutorial in the inner Arnoldi iteration
t3 = time()
for i = 1:20 ls(dx -> L1 * dx, u0) end
t4 = time()
dt_MF = t4 - t3
print("Matrix-free: "*string(dt_MF)*" s")
print("Matrix-based: "*string(dt_M)*" s")
Matrix-free: 1.9449999332427979 s
Matrix-based: 1.0419998168945312 s

Thus, I imagine that the way to optimize this is to either find some way to speed up the “inner” solution (i.e. fine-tune the linear solver somehow) or to find some alternative to the shift-invert method for computing interior eigenvalues (i.e. one that bypasses the need for an inner loop).

Has anyone else dabbled with this sort of thing? Your help would be greatly appreciated :grin:.

You are comparing completely different software packages. A factor of 2 could easily just be due to differences in algorithm, e.g. they could have different restart strategies, or even just different convergence tolerances. It would probably be better to call KrylovKit.eigsolve directly for your comparison rather than switching to Arpack.
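For example, something along these lines (just a sketch, reusing L1, sol0, and the GMRES solver ls from the tutorial; the shift, tolerance, and krylovdim are placeholder values you would want to match to what Arpack is actually being given):

using KrylovKit

σ  = 0.1            # same shift as passed to Arpack.eigs via `sigma`
x0 = vec(sol0)      # starting vector

# shift-and-invert action (L1 - σI)⁻¹ x, using the tutorial's GMRES solver `ls`
sinv = x -> ls(dx -> L1 * dx .- σ .* dx, x)[1]

vals, vecs, info = KrylovKit.eigsolve(sinv, x0, 3, :LM;
                                      tol = 1e-10, krylovdim = 30, ishermitian = true)
λ = 1 ./ vals .+ σ  # map back to the eigenvalues of L1 closest to σ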

Isn’t the default :LM, i.e. largest-magnitude eigenvalues?

Oh, I see that passing a shift parameter to Arpack.eigs switches it to finding the closest eigenvalues to shift by shift-and-invert, in which case it indeed performs a matrix factorization once at the beginning. And I see that eigSH3d is actually passing a GMRES solve to KrylovKit.eigsolve. In that case, I’m surprised that the difference is only a factor of 2.

Depends a lot on your matrix. How big, how sparse, etcetera? Is it Hermitian? Where in the spectrum are you looking for eigenvalues?


Thanks for your reply!

In the tutorial, the Jacobian is a 10648 × 10648 sparse matrix (246,136 stored entries, i.e. about 0.2% of the entries are nonzero) that is Hermitian and penta-diagonal. Because the objective is to identify bifurcations, I need to compute eigenvalues with real part close to 0. The problem I’m actually considering is different (non-Hermitian, though of a similar size and density), so I understand that not everything that applies to the tutorial will apply to my specific case. I can provide more details if it helps.

I am not sure I understand what your problem is. If you want the fastest eigensolver for your problem, it depends on a lot of factors, and from my experience you can only find out by trying. Also, there are not many matrix-free eigensolvers for non-Hermitian linear maps in Julia, e.g. KrylovKit.


Sure - my understanding of the philosophy of your tutorial is that, for the large problem under consideration, matrix-free methods provide a significant speed boost because they avoid the need for memory-intensive matrix factorization and direct solvers (indeed, setting detectBifurcation = 0 in optcont results in a significant speed-up). The downside I’m encountering is that, if one desires to evaluate stability and detect bifurcations using these methods, one needs a matrix-free eigensolver as well (one that uses shift-invert to robustly compute interior eigenvalues). However, I’m finding that the increased cost of these solvers outweighs the speed-up in pseudo-arclength continuation afforded by matrix-free methods, rendering these methods unviable (at least with my admittedly limited experience in constructing such a solver). Thus, I’m wondering if there’s a way I can optimize my eigensolver so that this is no longer the case.

I acknowledge that this could be a pipe dream, in which case I will just use matrix-based methods, but I want to cover all my bases and ask the community for their insights before I pursue these. I also acknowledge that this is a highly problem-specific issue, which is why I also welcome any general pieces of advice or methods I can explore for certain specific problems. Even understanding how to better optimize the tutorial would be hugely informative to me.

I agree that it is misleading in the tutorial and I will change it. For large n, matrix-free is the only option; for small n, direct methods are often faster. You might be using a dimension somewhere in the middle.


Sure, but isn’t this just avoiding the scalability problem? For instance, if I set Nx = Ny = Nz = 33 in the tutorial, the following lines of code

using Arpack
@time eigSH3d(dx -> L1 * dx, 3)
@time Arpack.eigs(L1; nev = 3, sigma = 0.1)

produce the following output

241.701766 seconds (19.56 M allocations: 11.060 GiB, 0.28% gc time, 2.04% compilation time)
63.166971 seconds (5.44 k allocations: 1.135 GiB, 0.15% gc time)

As such, the eigenvalue computation becomes even more of a bottleneck, in both time and memory, and it scales even worse for large N. To me, this completely invalidates the point of using matrix-free methods for problems in which you need to detect bifurcations, since that requires an efficient eigensolver for interior eigenvalues.

I’d like to think that this isn’t always the case - for instance, this paper bypasses the need for the memory-intensive inner loop of shift-and-invert by using spectral folding with a LOBPCG eigensolver, which I imagine would work well here because the Jacobian is Hermitian. I’d like to get some wisdom from the community on this issue before I revert to matrix-based methods.

Maybe the inner linsolve should reuse the Krylov subspace, like in Krylov.jl. Right now it is written in an out-of-place manner.
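Something like this rough sketch of the idea, assuming Krylov.jl's in-place GmresSolver / gmres! API (the exact workspace and constructor names may differ between versions):

using Krylov

solver = GmresSolver(L1, u0)   # GMRES workspace allocated once (default restart memory)

for i = 1:20
    gmres!(solver, L1, u0; rtol = 1e-10)  # each solve reuses the same workspace
    x = solver.x                          # solution is stored in the workspace
end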

This is a tiny sparse matrix by the standards of numerical linear algebra. I would definitely use a direct sparse method in such a case, with a precomputed factorization, rather than a matrix-free method (unless you happen to have an extremely good preconditioner for your problem). Iterative linear solvers are much more of a headache to get working well, and are seldom worth it for relatively small sparse matrices.

Note that if you do the factorization yourself (rather than relying on Arpack internals), you can use the lu! function to re-use the symbolic factorization and storage if you change the matrix entries but keep the same sparsity pattern.
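For concreteness, a minimal sketch using the J_sh Jacobian defined earlier in this thread (the particular states I plug in here are just placeholders):

using SparseArrays, LinearAlgebra

J0 = J_sh(vec(sol0), par)    # sparse Jacobian at the initial guess
F  = lu(J0)                  # symbolic + numerical factorization, done once

# later, when the entries change but the sparsity pattern does not:
J1 = J_sh(sol_hexa.u, par)
lu!(F, J1)                   # refresh the numerical factors in place
x  = F \ vec(sol0)           # subsequent solves are cheap triangular solves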


You can use spectral fold (A - \mu I)^2 instead of shift-and-invert (A - \mu I)^{-1} with any Krylov method, whether it is LOBPCG or Arnoldi/Lanczos, to convert interior eigenvalues to extremal ones. However, in my experience you often pay a big price in convergence rates when you square the matrix like this.

(Aside: we’ve also found spectral folding to be useful for localization proofs, since it gives you a way to get eigenvalue bounds in spectral gaps.)
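To make the idea concrete, here is a rough sketch with the tutorial's L1 standing in for the Jacobian and a Lanczos solve via KrylovKit.eigsolve standing in for LOBPCG (μ and krylovdim are placeholder choices):

using KrylovKit

μ = 0.0                          # fold around the target (here the origin)
folded = x -> begin
    y = L1 * x .- μ .* x         # (A - μI) x
    L1 * y .- μ .* y             # (A - μI)² x
end

# the folded operator is Hermitian positive semidefinite, so the eigenvalues of
# A nearest μ become its smallest eigenvalues (no inner linear solves needed)
vals, vecs, info = KrylovKit.eigsolve(folded, vec(sol0), 3, :SR;
                                      ishermitian = true, krylovdim = 50)

# vals ≈ (λ - μ)²; the fold discards the sign of λ - μ, which must be recovered
# separately, e.g. from the Rayleigh quotient v' * (L1 * v)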

That’s great advice - thank you! I suppose if my Jacobian were more dense then this might be worth the headache, but it seems to be worthwhile to use sparse linear algebra in this case.

Pre-factorizing and using lu! is a great idea - I’ll try that out.

EDIT: Can confirm that this is faster:

using LinearAlgebra, LinearMaps, Arpack

f_sym = lu(L1)                      # initial factorization, reused in place below

struct SH3dEigMB{Tf, Tσ} <: BK.AbstractEigenSolver
    σ::Tσ                           # shift
    f::Tf                           # pre-computed LU factorization (storage reused via lu!)
end

function (sheig::SH3dEigMB)(J, nev::Int; verbosity = 0, kwargs...)
    σ = sheig.σ
    f = sheig.f
    nv = 30
    N = size(J, 1)
    # reuse the symbolic factorization and storage for the shifted Jacobian
    fnew = lu!(f, J - UniformScaling(σ))
    Jmap = LinearMap{Float64}(dx -> fnew \ dx, N, N; issymmetric = true)
    vals, vecs, info = Arpack.eigs(Jmap; nev = nev, ncv = nv + nev)
    vals2 = 1 ./ vals .+ σ          # map eigenvalues of (J - σI)⁻¹ back to those of J
    Ind = sortperm(vals2, by = real, rev = true)
    return vals2[Ind], vecs[:, Ind], true, info
end

eig = SH3dEigMB(0.0, f_sym)
 
# JJ = a sparse Jacobian with the same sparsity pattern as L1 (e.g. J_sh at some state)
@time for i = 1:5 eig(JJ, 3) end
@time for i = 1:5 eigs(JJ; nev = 3, sigma = 0.0, ncv = 33) end
7.483051 seconds (3.01 k allocations: 1.200 GiB, 0.41% gc time)
17.722156 seconds (4.69 k allocations: 823.890 MiB)

I like this idea, but I’m having trouble finding any documentation to guide my implementation of this strategy - could you please elaborate or direct me to any useful docs?