The DynareJulia team and I are excited to announce the first major release of FastLapackInterface.jl. The package facilitates pre-allocating the workspaces used by some of the most important LAPACK factorization functions, which eliminates all major sources of allocations inside the LAPACK calls. For now the package targets the QR, Schur, and LU factorizations, but more functionality may be added in the future.
We have designed the package to be as consistent as possible with the LAPACK functionality in Julia's LinearAlgebra standard library, making it highly transparent. The QRWs workspace, for example, can be used together with LinearAlgebra.QR along the following lines (a minimal sketch in the spirit of the package README):
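```julia
using FastLapackInterface, LinearAlgebra

A = [1.2 2.3
     6.2 3.3]
ws = QRWs(A)                      # pre-allocate the workspace once
F  = QR(LAPACK.geqrf!(ws, A)...)  # in-place factorization, wrapped as a LinearAlgebra.QR
```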
Thanks a lot for this package! It would be very useful in my applications where I need LAPACK operations for many small matrices.
I have two questions:
Do you have a plan to integrate more functions? Specifically, I use Hermitian diagonalization (syev / heev) a lot.
If I understood the code correctly, the non-allocating functions currently throw if the workspace is too small. Would it be possible to have an option that simply resizes the workspace and runs?
Yes! In my own work I also mostly run diagonalizations, and in the past I implemented something similar to a Workspace for them. We are completely open to adding more functionality, and this would be the first addition we'd start with.
You are correct, and supporting resizing is indeed not a bad idea. If you want, could you open an issue on it so we can discuss the best way to do this? I'm not sure that doing it “under the hood” is the best idea, since that might lead to unexpected behavior for users, i.e. too much magic. This holds especially because for some of the factorizations the work buffers are not trivially determinable but require a LAPACK call themselves (although it may be possible to look at what LAPACK does to figure out the size and do it manually in Julia). I think a resize! function that takes in a new size or matrix would be a good idea.
LinearAlgebra functions often involve two LAPACK calls: one to determine the workspace size, and a second to actually perform the calculation after resizing the workspace. Having the resizing step built in might not be too surprising.
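As a rough illustration of that two-call pattern for dgeqrf (this sketch leans on the non-exported internals @blasfunc, BlasInt, and libblastrampoline, available in Julia 1.7 and later, and only mimics what LAPACK.geqrf! does under the hood):

```julia
using LinearAlgebra
using LinearAlgebra: libblastrampoline
using LinearAlgebra.BLAS: @blasfunc, BlasInt

function geqrf_twocall!(A::Matrix{Float64})
    m, n = size(A)
    τ    = Vector{Float64}(undef, min(m, n))
    work = Vector{Float64}(undef, 1)
    info = Ref{BlasInt}()
    call!(lwork) = ccall((@blasfunc(dgeqrf_), libblastrampoline), Cvoid,
                         (Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt},
                          Ptr{Float64}, Ptr{Float64}, Ref{BlasInt}, Ptr{BlasInt}),
                         m, n, A, max(1, m), τ, work, lwork, info)
    call!(BlasInt(-1))              # 1st call: workspace-size query only
    resize!(work, Int(work[1]))     # grow the work buffer to the reported size
    call!(BlasInt(length(work)))    # 2nd call: the actual factorization
    return A, τ
end

Aq, τ = geqrf_twocall!(rand(6, 4))
```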
On another note, I guess pre-allocating a workspace will not be thread-safe?
I see. My (erroneous) impression was that the package allows one to pre-allocate output matrices, and isn’t fastidious about allocations in resizing the workspace. Thanks for clearing this up!
Yes, I was referring to using the same workspace from multiple threads, which seems unsafe. I guess the solution is to allocate a separate workspace for each thread, and have each task fetch a workspace from a pool.
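A minimal sketch of that pool pattern (the sizes and the use of QRWs/geqrf! here are illustrative; a Channel serves as the pool so no two tasks ever hold the same workspace):

```julia
using FastLapackInterface, LinearAlgebra
using Base.Threads: nthreads, @spawn

n    = 32
ws0  = QRWs(zeros(n, n))
pool = Channel{typeof(ws0)}(nthreads())
put!(pool, ws0)
for _ in 2:nthreads()
    put!(pool, QRWs(zeros(n, n)))
end

As = [rand(n, n) for _ in 1:100]
@sync for A in As
    @spawn begin
        ws = take!(pool)          # check a workspace out of the pool
        try
            LAPACK.geqrf!(ws, A)  # safe: this task owns ws exclusively
        finally
            put!(pool, ws)        # return it even if the call throws
        end
    end
end
```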
We have another update to announce: FastLapackInterface version 1.2.0.
Workspaces for the Cholesky and BunchKaufman factorizations were added, and each Workspace can now be resized using resize!. Automatic resizing when calling the LAPACK functions or factorize!/decompose! is turned on by default, but can be controlled with the resize keyword argument. For example,
```julia
using FastLapackInterface, LinearAlgebra

A = rand(3, 3)
ws = Workspace(LAPACK.getrf!, A)  # optional keyword arguments omitted
B = rand(4, 4)
factorize!(ws, B)
```
will automatically resize ws appropriately. If factorize!(ws, B, resize=false) were used instead, an error would be thrown.
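The resizing can also be done by hand; something along these lines should work, assuming resize! accepts the new target matrix as described in the release notes above:

```julia
using FastLapackInterface, LinearAlgebra

A  = rand(3, 3)
ws = Workspace(LAPACK.getrf!, A)
B  = rand(4, 4)

resize!(ws, B)                     # grow the workspace explicitly…
factorize!(ws, B; resize = false)  # …so no automatic resizing is needed
```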
I haven’t used the package, but from glancing at the manual, it seems that you first pre-allocate a workspace with ws = BunchKaufmanWs(A), initializing it with a matrix of the right size (assuming all your matrices are the same size), and then you do ldiv!(factorize!(ws, mC), vX).
Unfortunately, the factorize!() method is missing for BunchKaufman. We will add it soon.
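In the meantime, a stop-gap helper can wrap the sytrf! call by hand; bk_factorize! below is an illustrative name, not a package function, and it follows the same recipe as the working loop in the next example:

```julia
using FastLapackInterface, LinearAlgebra

# Hypothetical helper: factorize into a LinearAlgebra.BunchKaufman using a
# pre-allocated BunchKaufmanWs (overwrites A in place).
function bk_factorize!(ws::BunchKaufmanWs, A::AbstractMatrix; uplo::Char = 'U')
    _, ipiv, info = LAPACK.sytrf!(ws, uplo, A)
    return BunchKaufman(A, ipiv, uplo, true, false, BLAS.BlasInt(info))
end
```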
Here is an example showing what is working and what isn’t yet:
```julia
using FastLapackInterface
using LinearAlgebra

# NOT WORKING: factorize! is not yet implemented for BunchKaufmanWs
function loop_1!(vXs, mCs, ws)
    for mC in mCs
        # factorization
        F = factorize!(ws, mC)
        # solving linear systems
        for vX in vXs
            ldiv!(F, vX)
        end
    end
end

# WORKING: call LAPACK.sytrf! directly and wrap the result in a BunchKaufman
function loop_2!(vXs, mCs, ws)
    for mC in mCs
        # factorization (overwrites mC in place)
        A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC)
        F = BunchKaufman(mC, ipiv, 'U', true, false, BLAS.BlasInt(0))
        # solving linear systems
        for vX in vXs
            ldiv!(F, vX)
        end
    end
end

mCs = []
vXs = []
n = 10
for i = 1:5
    x = randn(n, n)
    mC = (x + x') / 2   # symmetric test matrix
    push!(mCs, mC)
    push!(vXs, randn(n))
end

# create workspace
ws = BunchKaufmanWs(mCs[1])

#= NOT WORKING !
mCs_1 = deepcopy(mCs)
vXs_1 = deepcopy(vXs)
loop_1!(vXs_1, mCs_1, ws)
mCs_1 = deepcopy(mCs)
vXs_1 = deepcopy(vXs)
@time loop_1!(vXs_1, mCs_1, ws)
=#

mCs_2 = deepcopy(mCs)   # deepcopy, not copy: a shallow copy would share the
vXs_2 = deepcopy(vXs)   # matrices, which sytrf! and ldiv! overwrite in place
loop_2!(vXs_2, mCs_2, ws)
mCs_2 = deepcopy(mCs)
vXs_2 = deepcopy(vXs)
@time loop_2!(vXs_2, mCs_2, ws)
```
```julia
ws = BunchKaufmanWs(mC);                      #<! Creates a workspace
A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC);   #<! Applies the decomposition
F = BunchKaufman(mC, ipiv, 'U', true, false, BLAS.BlasInt(0));  #<! Creates the data structure Julia knows
```
I was aware of the first two steps:
1. Allocating the workspace.
2. Applying the decomposition.

I was missing the third step, where you build the data structure as defined in LinearAlgebra.
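For completeness, here is a small end-to-end sketch of the three steps followed by a solve (the solve via ldiv! is the part the snippet above stops short of):

```julia
using FastLapackInterface, LinearAlgebra

n  = 10
x  = randn(n, n)
mC = (x + x') / 2                           # symmetric test matrix
vX = randn(n)

ws = BunchKaufmanWs(mC)                     # 1. allocate the workspace
_, ipiv, info = LAPACK.sytrf!(ws, 'U', mC)  # 2. apply the decomposition (in place)
F  = BunchKaufman(mC, ipiv, 'U', true, false, BLAS.BlasInt(0))  # 3. wrap for Julia
ldiv!(F, vX)                                # vX now holds the solution in place
```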