Pure-Julia Sparse Cholesky

I implemented the Cholesky decomposition of sparse positive-definite matrices in CliqueTrees.jl. The implementation is pure Julia, except for some calls to BLAS and LAPACK kernels. Even the column-ordering algorithm is written in Julia!

julia> using BenchmarkTools, LinearAlgebra, MatrixMarket, SuiteSparseMatrixCollection

julia> import CliqueTrees, LDLFactorizations, QDLDL

julia> ssmc = ssmc_db(); name = "bcsstk36"; matrix = mmread(joinpath(fetch_ssmc(ssmc[ssmc.name .== name, :]; format="MM")[1], "$(name).mtx"));

julia> @btime cholesky(matrix);
  57.618 ms (55 allocations: 120.63 MiB)

julia> @btime CliqueTrees.cholesky(matrix);
  67.753 ms (94 allocations: 146.47 MiB)

julia> @btime QDLDL.qdldl(matrix);
  209.905 ms (89 allocations: 96.04 MiB)

julia> @btime LDLFactorizations.ldlt(matrix);
  274.736 ms (56 allocations: 188.30 MiB)

Here is the performance without BLAS or LAPACK.

julia> @btime CliqueTrees.cholesky(matrix);
  97.974 ms (94 allocations: 137.08 MiB)

CliqueTrees.jl is faster than LDLFactorizations.jl and QDLDL.jl because it is performing a supernodal decomposition, whereas they are performing nodal decompositions. A supernodal decomposition partitions the input matrix into dense sub-matrices, calling BLAS Level 3 routines on the blocks.
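To sketch the idea (a simplified toy, not the actual CliqueTrees.jl kernel): one right-looking blocked step factors the dense diagonal block, solves for the panel below it, and downdates the trailing block, all with dense LAPACK and BLAS Level 3 calls.

```julia
using LinearAlgebra

# Toy illustration of one supernodal (blocked) Cholesky step on a
# 2x2 block partition [A11 A21'; A21 A22]. Hypothetical sketch,
# not CliqueTrees.jl's code.
function blocked_cholesky_step!(A11, A21, A22)
    L11 = cholesky!(Symmetric(A11, :L)).L      # dense LAPACK potrf
    L21 = A21 / L11'                           # dense triangular solve (trsm)
    BLAS.syrk!('L', 'N', -1.0, L21, 1.0, A22)  # trailing update (BLAS 3)
    return L11, L21, A22
end
```

After this step, the lower triangle of the trailing block holds the Schur complement, which the algorithm continues on; the point is that all the flops land in dense Level 3 kernels.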

There seems to be some interest in this sort of thing from the community. For example, this post by @PetrKryslUCSD. I am less familiar with LDLT decompositions and quasi-definite matrices, but I could try implementing that next.

The numerical part of the algorithm is contained in this file.


I think most of the interest is in GitHub - NextLinearAlgebra/NextLA.jl (especially if you can adapt it to GPU too) and in LinearSolve.jl for this sort of thing. From a quick look at your file, nothing should stop you from writing most of it as kernels (KernelAbstractions.jl), unless there is recursion of course


Hello @yolhan_mannes!

Thank you for the advice. The algorithm is entirely non-recursive. I have never used KernelAbstractions.jl before, but I will look into it!

It's ok, you can also just do a PR on NextLA.jl (try adapting to their style first) and add tests, and people may help afterwards with the GPU adaptation.

Not sure they want anything to do with sparse though; LinearSolve.jl might be a better fit then


Nice! Does this work for non-native types? Being able to do factorizations for fancy types is really nice


Yes! I have a pure-Julia fallback for each BLAS and LAPACK kernel. See here.
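As a rough sketch of what such a fallback looks like (hypothetical code, not the actual CliqueTrees.jl kernel): an unblocked potrf written against generic arithmetic runs for any element type supporting sqrt and division, e.g. BigFloat.

```julia
# Hypothetical generic fallback for the potrf kernel: an unblocked
# lower Cholesky that only needs +, -, *, /, and sqrt, so it works
# for non-native element types such as BigFloat.
function generic_potrf!(A::AbstractMatrix)
    n = size(A, 1)
    for j in 1:n
        # finalize the diagonal entry of column j
        for k in 1:j-1
            A[j, j] -= A[j, k]^2
        end
        A[j, j] = sqrt(A[j, j])
        # finalize the subdiagonal entries of column j
        for i in j+1:n
            for k in 1:j-1
                A[i, j] -= A[i, k] * A[j, k]
            end
            A[i, j] /= A[j, j]
        end
    end
    return A    # lower triangle now holds L; upper triangle untouched
end
```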


I think most of them are implemented in NextLA.jl already so you won’t have to add them to the PR


This is really nice.
I wonder if it could become the default cholesky().

Do you have any plans for the Incomplete Cholesky Factorization?
There are some useful variants: threshold-based, pattern-based, and based on the number of nonzero elements. I guess the pattern-based one makes the most sense graph-wise.

The LDLT variant (for the incomplete factorization as well) would be greatly appreciated.

@yolhan_mannes, the NextLA.jl package seems to be for dense linear algebra. This algorithm is for the sparse case.


Hi @RoyiAvital!

Do you have any plans for the Incomplete Cholesky Factorization?

This is also something that I do not know too much about, but if there is interest then I can try! It looks like incomplete factorizations are used as preconditioners?

Ah, I found this: GitHub - RoyiAvital/IncompleteCholeskyDecomposition: Implementation of the Incomplete Cholesky Decomposition with Thresholding
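Yes, that seems to be their main use: given an incomplete factor L, the preconditioner M = LL' is applied with two triangular solves inside an iterative method. A minimal preconditioned conjugate gradient sketch (`pcg` and `prec` are my own names, not from any package; `prec` is any function applying M⁻¹):

```julia
using LinearAlgebra

# Minimal preconditioned conjugate gradient for SPD A.
# `prec(r)` applies the inverse of the preconditioner, e.g.
# r -> L' \ (L \ r) for an incomplete Cholesky factor L.
function pcg(A, b, prec; tol=1e-8, maxiter=1000)
    x = zero(b)
    r = b - A * x
    z = prec(r)
    p = copy(z)
    rz = dot(r, z)
    for _ in 1:maxiter
        Ap = A * p
        α = rz / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        norm(r) < tol && break
        z = prec(r)
        rz_new = dot(r, z)
        β = rz_new / rz
        rz = rz_new
        p .= z .+ β .* p
    end
    return x
end
```

A good incomplete factor makes M⁻¹A close to the identity, so the iteration converges in far fewer steps than plain CG.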


Looks really intriguing. I wonder if you have compared performance-wise with PetrKryslUCSD/Sparspak.jl: Direct solution of large sparse systems of linear algebraic equations in pure Julia?

LU is working, LDLT is implemented, but not tested (yet).


Indeed.
When solving very large scale sparse systems, computing the full Cholesky decomposition (or other decompositions) might generate an infeasibly large factor (too many nonzero elements).

Hence the need for Incomplete Cholesky Factorization, Incomplete LDL Factorization and Incomplete LU Factorization.

Specifically the forms which can limit the number of elements: the pattern-based and number-of-nonzeros-based variants.

My code is just a port of some Python code I found; it is basically the threshold-based method, which might be nice, yet it cannot guarantee the number of elements in the output.
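For concreteness, the pattern-based variant can be sketched in a few lines (a dense toy illustration with hypothetical names, not code from any of the packages above): perform the usual Cholesky updates, but only keep entries whose positions lie in a prescribed pattern, dropping all fill outside it.

```julia
# Toy pattern-based incomplete Cholesky: compute only the entries
# allowed by `pattern`; fill outside the pattern is dropped. For
# aggressive patterns this needs A sufficiently diagonally dominant,
# or sqrt may fail. Hypothetical sketch, not a package API.
function ic0(A::AbstractMatrix, pattern::AbstractMatrix{Bool})
    n = size(A, 1)
    L = zeros(n, n)
    for j in 1:n
        s = A[j, j] - sum(L[j, k]^2 for k in 1:j-1; init=0.0)
        L[j, j] = sqrt(s)
        for i in j+1:n
            pattern[i, j] || continue    # drop fill outside the pattern
            t = A[i, j] - sum(L[i, k] * L[j, k] for k in 1:j-1; init=0.0)
            L[i, j] = t / L[j, j]
        end
    end
    return L
end
```

With the full lower-triangular pattern this reduces to the exact factorization; with the sparsity pattern of A it is the classic IC(0) preconditioner.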


Hello @PetrKryslUCSD! Am I doing this right?

import Sparspak

n = size(matrix, 2)
problem = Sparspak.Problem.Problem(n, n)
@time Sparspak.Problem.insparse!(problem, matrix)
@time Sparspak.Problem.infullrhs!(problem, 1:n)
@time solver = Sparspak.SpkSparseSpdBase._SparseSpdBase(problem)
@time Sparspak.SpkSparseSpdBase._factor!(solver)

The output is

  0.197469 seconds (45 allocations: 137.235 MiB, 46.58% gc time)
  0.000082 seconds (1 allocation: 32 bytes)
  0.004005 seconds (34 allocations: 9.860 MiB)
UndefVarError: `_ldltfactor!` not defined in `Sparspak.SpkSparseSpdBase`
Suggestion: check for spelling errors or missing imports.

If this is just a bug then I can dev the repository and fix it.

Sorry, looks like a bug. When I said “not tested” I wasn’t kidding…

solver = Sparspak.SpkSparseSpdBase._SparseSpdBase(problem)

This is probably not the right way. I will have a look.


Probably this is right.

@time solver = Sparspak.SpkSparseSpdSolver.SparseSpdSolver(problem)
@time Sparspak.SpkSparseSpdSolver.solve!(solver)

But it also errors.

I got the factorization part to run without erroring. See here.

import Sparspak

function spk_ldl(matrix)
    n = size(matrix, 2)
    problem = Sparspak.Problem.Problem(n, n)
    Sparspak.Problem.insparse!(problem, matrix)
    Sparspak.Problem.infullrhs!(problem, 1:n)
    solver = Sparspak.SpkSparseSpdSolver.SparseSpdSolver(problem)
    Sparspak.SpkSparseSpdSolver.solve!(solver)
    return solver
end

@btime spk_ldl(matrix);

The output is

  2.000 s (76606554 allocations: 1.58 GiB)

Very cool.


Just released patch 1.9.1. We now use less memory than SuiteSparse.

julia> @btime cholesky(matrix)
  56.641 ms (55 allocations: 120.63 MiB)

julia> @btime CliqueTrees.cholesky(matrix)
  66.369 ms (95 allocations: 109.40 MiB)

At the moment I don't understand what is going on. The LDLT factorization routine has serious bugs at this point, and does not compile. So I am at a loss to explain why your code could run.

I fixed the bugs, I think. See the most recent commit: GitHub - samuelsonric/Sparspak.jl: Direct solution of large sparse systems of linear algebraic equations in pure Julia


Alas, not quite. There is a bug in the factorization (division by zero).