[ANN] FastLapackInterface.jl v1.0.0: Non-allocating LAPACK factorizations

The DynareJulia team and I are excited to announce the first major release of FastLapackInterface.jl. The package facilitates pre-allocating the workspaces used by some of the most important LAPACK factorization functions, which eliminates all major sources of allocations inside the LAPACK calls. For now the package targets the QR, Schur and LU factorizations, but more functionality may be added in the future.

We have designed the package to be as consistent as possible with the LAPACK functionality in Julia's LinearAlgebra standard library, making it highly transparent. The QRWs workspace, for example, can be used with LinearAlgebra.QR as follows:

julia> A = [1.2 2.3
            6.2 3.3]
2×2 Matrix{Float64}:
 1.2  2.3
 6.2  3.3

julia> ws = QRWs(A)
QRWs{Float64}
work: 64-element Vector{Float64}
τ: 2-element Vector{Float64}

julia> t = QR(LAPACK.geqrf!(A, ws)...)
QR{Float64, Matrix{Float64}, Vector{Float64}}
Q factor:
2×2 QRPackedQ{Float64, Matrix{Float64}, Vector{Float64}}:
 -0.190022  -0.98178
 -0.98178    0.190022
R factor:
2×2 Matrix{Float64}:
 -6.31506  -3.67692
  0.0      -1.63102

julia> Matrix(t)
2×2 Matrix{Float64}:
 1.2  2.3
 6.2  3.3
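
To make the benefit concrete, the same workspace can be reused across repeated factorizations of equally sized matrices, so no workspace is allocated inside the hot loop. A minimal sketch using the calls from the example above (the matrix data and the loop length are placeholders):

using FastLapackInterface, LinearAlgebra

As = [rand(2, 2) for _ in 1:1000]    # placeholder matrices, all of the same size
ws = QRWs(first(As))                 # one workspace, allocated up front
for A in As
    F = QR(LAPACK.geqrf!(A, ws)...)  # overwrites A with its QR factors, reusing ws
    # ... use F, e.g. with ldiv!
end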

The package is fully tested and documented, and includes a suite of benchmarks comparing it with the standard Julia LAPACK functions.

All the best,
Louis


Thanks a lot for this package! It would be very useful in my applications where I need LAPACK operations for many small matrices.

I have two questions:

  1. Do you have a plan to integrate more functions? Specifically, I use Hermitian diagonalization (syev / heev) a lot.

  2. If I understood the code correctly, currently the non-allocating function throws if the workspace is too small. Would it be possible to have an option to just resize the workspace and run?

  1. Yes! In my own work I also mostly run diagonalizations, and I have implemented in the past something similar to a Workspace for it. We are totally open to adding more functionality, and this would be the first one we’d start with.
  2. You are correct. Supporting resizing is indeed not a bad idea. If you want, could you open an issue so we can discuss the best way to do this? I’m not sure that doing it “under the hood” is the best idea, since that might lead to unexpected behavior for users, i.e. too much magic. Especially since for some of the factorizations the work buffer sizes are not trivially determinable and instead require a LAPACK workspace query themselves (although perhaps the size computation LAPACK performs could be replicated manually in Julia). I think simply having a resize! function that takes a new size or matrix would be a good idea.

LinearAlgebra functions often involve two LAPACK calls: one to determine the required workspace size, and a second to actually perform the calculation once the workspace has been resized. Having the resizing step built in might not be too surprising.

On another note, I guess pre-allocating a workspace will not be thread-safe?

The whole point of the package is to not have allocations occur during the LAPACK calls.

Why not? Using the same workspace from multiple threads of course won’t be thread-safe, indeed.

I see. My (erroneous) impression was that the package allows one to pre-allocate output matrices, and isn’t fastidious about allocations in resizing the workspace. Thanks for clearing this up!

Yes, I was referring to using the same workspace from multiple threads, which seems unsafe. I guess the solution is to allocate a separate workspace for each thread, and have each task fetch a workspace from a pool.

No problem!

Exactly!
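
For illustration, a per-task pool like the one described above can be as simple as a Channel holding one workspace per thread: each task borrows a workspace, uses it, and puts it back. A sketch, not part of the package; the matrix size, task count, and the work done per task are placeholders:

using FastLapackInterface, LinearAlgebra

n = 50
nws = Threads.nthreads()
pool = Channel{QRWs{Float64}}(nws)
foreach(_ -> put!(pool, QRWs(rand(n, n))), 1:nws)  # one workspace per thread, allocated up front

tasks = map(1:20) do _
    Threads.@spawn begin
        ws = take!(pool)                  # borrow a workspace from the pool
        A = rand(n, n)                    # placeholder work
        F = QR(LAPACK.geqrf!(A, ws)...)   # reuses ws; no workspace allocation here
        put!(pool, ws)                    # hand it back for other tasks
        F
    end
end
results = fetch.(tasks)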

Hi All,

We’re happy to announce an update to FastLapackInterface, now including Eigen decompositions!

We added 3 new Workspaces:

  • EigenWs
  • HermitianEigenWs
  • GeneralizedEigenWs
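
As a rough sketch of how the Hermitian case could look, assuming (as for the other workspaces) that the ws-accepting method mirrors the argument order of the corresponding LinearAlgebra.LAPACK call, here syevr!; please check the documentation for the exact signature:

using FastLapackInterface, LinearAlgebra

A = rand(10, 10); A = (A + A') / 2  # a real symmetric test matrix
ws = HermitianEigenWs(A)            # pre-allocated workspace for syev/heev-type calls
# argument order assumed to mirror LinearAlgebra.LAPACK.syevr!(jobz, range, uplo, A, vl, vu, il, iu, abstol)
vals, vecs = LAPACK.syevr!(ws, 'V', 'A', 'U', A, 0.0, 0.0, 0, 0, 1e-6)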

We also came up with a more uniform API that constructs the correct Workspace for a given target LAPACK function, for example:

ws = Workspace(LAPACK.getrf!, A; kwargs...)
factorize!(ws, A)

will automatically create an LUWs and call getrf! through factorize!.
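
For example, a workspace built this way can be created once and reused across many factorizations of equally sized matrices (the sizes and data below are placeholders):

using FastLapackInterface, LinearAlgebra

As = [rand(50, 50) for _ in 1:100]
ws = Workspace(LAPACK.getrf!, first(As))  # the LU workspace is created once
for A in As
    factorize!(ws, A)                     # LU-factorizes A in place, reusing ws
end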

Let us know what you think!

Cheers


Hi All,

We have another update to announce, FastLapackInterface version 1.2.0.

Cholesky- and BunchKaufman-related Workspaces were added, and each Workspace can now be resized using resize!. Automatic resizing during calls to the LAPACK functions or factorize!/decompose! is turned on by default, but can be controlled with the resize keyword argument.

For example,

A = rand(3,3)
ws = Workspace(LAPACK.getrf!, A)
B = rand(4,4)
factorize!(ws, B)

will automatically resize ws appropriately. If instead factorize!(ws, B, resize=false) were used, an error would be thrown.
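
For completeness, the resizing can also be done explicitly up front, which keeps the subsequent calls themselves free of resizing allocations. A sketch, assuming resize! accepts the new target matrix as discussed earlier in the thread:

using FastLapackInterface, LinearAlgebra

A = rand(3, 3)
ws = Workspace(LAPACK.getrf!, A)
B = rand(4, 4)
resize!(ws, B)                     # grow the workspace to fit B (exact signature assumed)
factorize!(ws, B; resize = false)  # now succeeds even with automatic resizing disabled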

If you have any feedback let us know!

Cheers,
Louis


Is there an example of how to use it in the context of solving a linear system?

I have the following code in a loop (mC is different in each iteration):

ldiv!(bunchkaufman!(mC), vX);

What should I do to make this non-allocating using FastLapackInterface.jl?

I haven’t used the package, but from glancing at the manual, it seems that you first pre-allocate a workspace with ws = BunchKaufmanWs(A), initializing it with a matrix of the right size (assuming all your matrices are the same size), and then you do ldiv!(factorize!(ws, mC), vX).

There is also a lower-level interface documented here (with an example).


Unfortunately, the factorize!() method is missing for BunchKaufman. We will add it soon.

Here is an example showing what is working and what isn’t yet:

using FastLapackInterface
using LinearAlgebra

# NOT WORKING (yet): relies on the factorize! method for BunchKaufman, which is still missing
function loop_1!(vXs, mCs, ws)
    for mC in mCs    
        # factorization
        F = factorize!(ws, mC)    
        # solving linear systems
        for vX in vXs
            ldiv!(F, vX)
        end 
    end
end

# WORKING: uses the lower-level sytrf! interface and wraps the result manually
function loop_2!(vXs, mCs, ws)
    for mC in mCs    
        # factorization via the lower-level interface
        A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC)
        # wrap the raw LAPACK output in LinearAlgebra's BunchKaufman type so ldiv! can be used
        F = BunchKaufman(mC, ipiv, 'U', true, false, BLAS.BlasInt(0))
        # solving linear systems
        for vX in vXs
            ldiv!(F, vX)
        end 
    end
end

mCs = []
vXs = []
n = 10
for i = 1:5
    x = randn(n, n)
    mC = (x + x')/2 
    push!(mCs, mC)
    push!(vXs, randn(n))
end

# create workspace
ws = BunchKaufmanWs(mCs[1])

#= NOT WORKING !
mCs_1 = copy(mCs)
vXs_1 = copy(vXs)
loop_1!(vXs_1, mCs_1, ws)
mCs_1 = copy(mCs)
vXs_1 = copy(vXs)
@time loop_1!(vXs_1, mCs_1, ws)
=#

mCs_2 = copy(mCs)
vXs_2 = copy(vXs)
loop_2!(vXs_2, mCs_2, ws)
mCs_2 = copy(mCs)
vXs_2 = copy(vXs)
@time loop_2!(vXs_2, mCs_2, ws)

I opened an issue here: Add factorize!() method for BunchKaufman · Issue #38 · DynareJulia/FastLapackInterface.jl · GitHub


Let me understand this:

ws = BunchKaufmanWs(mC); #<! Creates a workspace
A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC); #<! Applies the decomposition
F = BunchKaufman(mC, ipiv, 'U', true, false, BLAS.BlasInt(0)); #<! Creates the data structure Julia knows

I was aware of the first 2 steps:

  1. Allocating the workspace.
  2. Applying the decomposition.

I was missing the 3rd step, where you build the data structure as defined in LinearAlgebra.

Am I right about the analysis?

Yes, you are.
