Transformations in ArnoldiMethod.jl

I have a question about the documentation of “transformations” (i.e., applying the inverse of a matrix to a vector without explicitly forming the inverse) in ArnoldiMethod.jl, although I think the topic is of broader interest than just this package.

From the documentation:

using ArnoldiMethod, LinearAlgebra, LinearMaps

# Define a matrix whose eigenvalues you want
A = rand(100,100)

# Factorizes A and builds a linear map that applies inv(A) to a vector.
function construct_linear_map(A)
    F = factorize(A)
    LinearMap{eltype(A)}((y, x) -> ldiv!(y, F, x), size(A,1), ismutating=true)
end

# Target the largest eigenvalues of the inverted problem
decomp, = partialschur(construct_linear_map(A), nev=4, tol=1e-5, restarts=100, which=LM())
λs_inv, X = partialeigen(decomp)
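As an aside, since partialschur here operates on inv(A), the returned λs_inv are eigenvalues of the inverse, and the eigenvalues of A itself come from inverting them. A quick sanity check along these lines (my own addition, not part of the documented example):

λs = 1 ./ λs_inv                      # eigenvalues of A are the reciprocals of those of inv(A)
@show norm(A * X - X * Diagonal(λs))  # residual should be small: columns of X are eigenvectors of A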

However, I would replace the definition of construct_linear_map and the call to partialschur with the somewhat simpler

F = factorize(A)
linearmap = LinearMap{eltype(A)}((y, x) -> ldiv!(y, F, x), size(A,1), ismutating=true)
decomp, = partialschur(linearmap, nev=4, tol=1e-5, restarts=100, which=LM())

The example for a generalized eigenvalue problem is more complicated:

using ArnoldiMethod, LinearAlgebra, LinearMaps

# Define the matrices of the generalized eigenvalue problem
A, B = rand(100,100), rand(100,100)

struct ShiftAndInvert{TA,TB,TT}
    A_lu::TA   # factorization of A, reused for every solve
    B::TB
    temp::TT   # preallocated buffer for the product B*x
end

function (M::ShiftAndInvert)(y, x)
    # Computes y = inv(A) * B * x without ever forming inv(A)
    mul!(M.temp, M.B, x)
    ldiv!(y, M.A_lu, M.temp)
end

function construct_linear_map(A,B)
    a = ShiftAndInvert(factorize(A),B,Vector{eltype(A)}(undef, size(A,1)))
    LinearMap{eltype(A)}(a, size(A,1), ismutating=true)
end

# Target the largest eigenvalues of the inverted problem
decomp, = partialschur(construct_linear_map(A, B), nev=4, tol=1e-5, restarts=100, which=LM())
λs_inv, X = partialeigen(decomp)
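Here the map applies inv(A)*B, so inv(A)*B*x = λ_inv*x is equivalent to B*x = λ_inv*A*x, and the generalized eigenvalues of A*x = λ*B*x are λ = 1 ./ λs_inv. A residual check I would add (my own sketch, not from the docs):

@show norm(B * X - A * X * Diagonal(λs_inv))  # B*X ≈ A*X*Diagonal(λs_inv) should hold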

I would simplify this code by replacing the definitions of ShiftAndInvert and construct_linear_map and the call to partialschur with

F = factorize(A)
linearmap = LinearMap{eltype(A)}((y, x) -> ldiv!(y, F, B * x), size(A,1), ismutating=true)
decomp, = partialschur(linearmap, nev=4, tol=1e-5, restarts=100, which=LM())

When I use my approach in my own work (full disclosure: not on these eigenvalue problems specifically), it seems to work fine (e.g., @code_warntype doesn’t report any type instabilities). So why use what appears to me to be more complicated code?

Welcome to the Discourse :slight_smile:

In the first example, I don’t understand why you say that your version is simpler. To me, the code does exactly the same thing, except that defining a function to construct the LinearMap has two advantages:

  1. You don’t need to recall all the things you put into the LinearMap every time you want to find some eigenvalues of a matrix.
  2. Your solution could be a bit slower if it lives in global scope, since you repeatedly access the non-constant global F, and that is slow. Even inside a larger function you could still run into performance issues from captured variables if you reassign F later, which is much less likely when everything is wrapped up like in the example code (see the sketch below).
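To make the captured-variable point concrete, here is a minimal sketch (boxed_example is a made-up name, not from any package): reassigning a variable after a closure has captured it forces Julia to box it, so the closure becomes type-unstable.

using LinearAlgebra

function boxed_example(A)
    F = factorize(A)
    apply = x -> F \ x    # the closure captures F
    F = factorize(A + I)  # reassignment after capture: F is now stored in a Core.Box
    return apply          # calls to apply are no longer type-stable
end

Running @code_warntype boxed_example(rand(4,4)) shows the Core.Box; the let-block version further down avoids this because each captured variable is assigned exactly once.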

Your second solution has the same “maybe-problems” with repeated access to F and B, and on top of that it allocates on every application of the LinearMap, because B*x creates a new vector each time. The example avoids that by preallocating a buffer and writing the result of the multiplication into it with mul!.
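You can see the allocation difference directly; a small sketch using BenchmarkTools (assuming the package is installed):

using BenchmarkTools, LinearAlgebra

B = rand(100, 100); x = rand(100); temp = similar(x)
@btime $B * $x              # allocates a fresh vector on every call
@btime mul!($temp, $B, $x)  # 0 allocations: result written into the buffer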

You are of course right that there are different ways of achieving the same thing, and what is “easiest” or “more complicated” might not be the same for everyone. To me, both versions read quite clearly. The example code just ties the logic up into a type and a neat construction function for the LinearMap. E.g., you can get the same thing without a struct:

linearmap = let F = factorize(A), 
    B = B, # to avoid performance issues from captured variables
    temp = Vector{eltype(A)}(undef, size(A,1))
    LinearMap{eltype(A)}((y, x) -> ldiv!(y, F, mul!(temp, B, x)), size(A,1), ismutating=true)
end
decomp, = partialschur(linearmap, nev=4, tol=1e-5, restarts=100, which=LM())

Thanks for the welcome!

All your points are good. For what it’s worth, I thought there might be a global-scope/captured-variable issue, but in my own code all of this is buried inside a function and @code_warntype seemed happy. What I didn’t realize, though, is that B*x allocates; thanks for pointing that out.
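For anyone else reading along: that also explains why @code_warntype stayed quiet, since it reports type instabilities, not allocations. A quick way to check (calling twice, so compilation doesn’t count):

x = rand(100)
B * x             # warm-up call to exclude compilation
@allocated B * x  # nonzero: a new vector is allocated each time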