Pure-Julia Sparse Cholesky

I implemented the Cholesky decomposition of sparse positive-definite matrices in CliqueTrees.jl. The implementation is pure Julia, except for some calls to BLAS and LAPACK kernels. Even the column-ordering algorithm is written in Julia!

julia> using BenchmarkTools, LinearAlgebra, MatrixMarket, SuiteSparseMatrixCollection

julia> import CliqueTrees, LDLFactorizations, QDLDL

julia> ssmc = ssmc_db(); name = "bcsstk36"; matrix = mmread(joinpath(fetch_ssmc(ssmc[ssmc.name .== name, :]; format="MM")[1], "$(name).mtx"));

julia> @btime cholesky(matrix);
  57.618 ms (55 allocations: 120.63 MiB)

julia> @btime CliqueTrees.cholesky(matrix);
  67.753 ms (94 allocations: 146.47 MiB)

julia> @btime QDLDL.qdldl(matrix);
  209.905 ms (89 allocations: 96.04 MiB)

julia> @btime LDLFactorizations.ldlt(matrix);
  274.736 ms (56 allocations: 188.30 MiB)

Here is the performance without BLAS or LAPACK.

julia> @btime CliqueTrees.cholesky(matrix);
  97.974 ms (94 allocations: 137.08 MiB)

CliqueTrees.jl is faster than LDLFactorizations.jl and QDLDL.jl because it is performing a supernodal decomposition, whereas they are performing nodal decompositions. A supernodal decomposition partitions the input matrix into dense sub-matrices, calling BLAS Level 3 routines on the blocks.
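To sketch the idea (a simplified toy, not the actual CliqueTrees.jl kernel): one right-looking blocked step factors the dense diagonal block, solves for the panel below it, and downdates the trailing block, all with dense LAPACK and BLAS Level 3 calls.

```julia
using LinearAlgebra

# Toy illustration of one supernodal (blocked) Cholesky step on a
# 2x2 block partition [A11 A21'; A21 A22]. Hypothetical sketch,
# not CliqueTrees.jl's code.
function blocked_cholesky_step!(A11, A21, A22)
    L11 = cholesky!(Symmetric(A11, :L)).L      # dense LAPACK potrf
    L21 = A21 / L11'                           # dense triangular solve (trsm)
    BLAS.syrk!('L', 'N', -1.0, L21, 1.0, A22)  # trailing update (BLAS 3)
    return L11, L21, A22
end
```

After this step, the lower triangle of the trailing block holds the Schur complement, which the algorithm continues on; the point is that all the flops land in dense Level 3 kernels.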

There seems to be some interest in this sort of thing from the community. For example, this post by @PetrKryslUCSD. I am less familiar with LDLT decompositions and quasi-definite matrices, but I could try implementing that next.

The numerical part of the algorithm is contained in this file.


I think most of the interest is in GitHub - NextLinearAlgebra/NextLA.jl (especially if you can adapt it to GPU too) and in LinearSolve.jl for this sort of thing. From a quick look at your file, nothing should stop you from writing most of it as kernels (KernelAbstractions.jl), unless there is recursion of course


Hello @yolhan_mannes!

Thank you for the advice. The algorithm is entirely non-recursive. I have never used KernelAbstractions.jl before, but I will look into it!

It's ok, you can also just do a PR on NextLA.jl (try adapting to their style first) and add tests, and people may help afterwards with the GPU adaptation.

Not sure they want anything to do with sparse though; LinearSolve.jl might be a better fit then


Nice! Does this work for non-native types? Being able to do factorizations for fancy types is really nice


Yes! I have a pure-Julia fallback for each BLAS and LAPACK kernel. See here.
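As a rough sketch of what such a fallback looks like (hypothetical code, not the actual CliqueTrees.jl kernel): an unblocked potrf written against generic arithmetic runs for any element type supporting sqrt and division, e.g. BigFloat.

```julia
# Hypothetical generic fallback for the potrf kernel: an unblocked
# lower Cholesky that only needs +, -, *, /, and sqrt, so it works
# for non-native element types such as BigFloat.
function generic_potrf!(A::AbstractMatrix)
    n = size(A, 1)
    for j in 1:n
        # finalize the diagonal entry of column j
        for k in 1:j-1
            A[j, j] -= A[j, k]^2
        end
        A[j, j] = sqrt(A[j, j])
        # finalize the subdiagonal entries of column j
        for i in j+1:n
            for k in 1:j-1
                A[i, j] -= A[i, k] * A[j, k]
            end
            A[i, j] /= A[j, j]
        end
    end
    return A    # lower triangle now holds L; upper triangle untouched
end
```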


I think most of them are implemented in NextLA.jl already so you won’t have to add them to the PR


This is really nice.
I wonder if it could become the default cholesky().

Do you have any plans for the Incomplete Cholesky Factorization?
There are some useful variants: threshold-based, pattern-based, and based on the number of nonzero elements. I guess the pattern-based one makes the most sense graph-wise.

The LDLT variant (for the incomplete factorization as well) would be greatly appreciated.

@yolhan_mannes, the NextLA.jl package seems to be for dense linear algebra. This algorithm is for the sparse case.


Hi @RoyiAvital!

Do you have any plans for the Incomplete Cholesky Factorization?

This is also something that I do not know too much about, but if there is interest then I can try! It looks like incomplete factorizations are used as preconditioners?

Ah, I found this: GitHub - RoyiAvital/IncompleteCholeskyDecomposition: Implementation of the Incomplete Cholesky Decomposition with Thresholding
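Yes, that seems to be their main use: given an incomplete factor L, the preconditioner M = LL' is applied with two triangular solves inside an iterative method. A minimal preconditioned conjugate gradient sketch (`pcg` and `prec` are my own names, not from any package; `prec` is any function applying M⁻¹):

```julia
using LinearAlgebra

# Minimal preconditioned conjugate gradient for SPD A.
# `prec(r)` applies the inverse of the preconditioner, e.g.
# r -> L' \ (L \ r) for an incomplete Cholesky factor L.
function pcg(A, b, prec; tol=1e-8, maxiter=1000)
    x = zero(b)
    r = b - A * x
    z = prec(r)
    p = copy(z)
    rz = dot(r, z)
    for _ in 1:maxiter
        Ap = A * p
        α = rz / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        norm(r) < tol && break
        z = prec(r)
        rz_new = dot(r, z)
        β = rz_new / rz
        rz = rz_new
        p .= z .+ β .* p
    end
    return x
end
```

A good incomplete factor makes M⁻¹A close to the identity, so the iteration converges in far fewer steps than plain CG.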


Looks really intriguing. I wonder if you have compared performance-wise with PetrKryslUCSD/Sparspak.jl: Direct solution of large sparse systems of linear algebraic equations in pure Julia?

LU is working, LDLT is implemented, but not tested (yet).


Indeed.
When solving very large scale sparse systems, computing the full Cholesky decomposition (or other decompositions) might generate an infeasibly large factor (too many nonzero elements).

Hence the need for Incomplete Cholesky Factorization, Incomplete LDL Factorization and Incomplete LU Factorization.

Specifically the forms which can limit the number of elements: the pattern-based and number-of-nonzeros-based variants.

My code is just a port of some Python code I found; it is basically the threshold-based method, which might be nice, yet it cannot guarantee the number of elements in the output.
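For concreteness, the pattern-based variant can be sketched in a few lines (a dense toy illustration with hypothetical names, not code from any of the packages above): perform the usual Cholesky updates, but only keep entries whose positions lie in a prescribed pattern, dropping all fill outside it.

```julia
# Toy pattern-based incomplete Cholesky: compute only the entries
# allowed by `pattern`; fill outside the pattern is dropped. For
# aggressive patterns this needs A sufficiently diagonally dominant,
# or sqrt may fail. Hypothetical sketch, not a package API.
function ic0(A::AbstractMatrix, pattern::AbstractMatrix{Bool})
    n = size(A, 1)
    L = zeros(n, n)
    for j in 1:n
        s = A[j, j] - sum(L[j, k]^2 for k in 1:j-1; init=0.0)
        L[j, j] = sqrt(s)
        for i in j+1:n
            pattern[i, j] || continue    # drop fill outside the pattern
            t = A[i, j] - sum(L[i, k] * L[j, k] for k in 1:j-1; init=0.0)
            L[i, j] = t / L[j, j]
        end
    end
    return L
end
```

With the full lower-triangular pattern this reduces to the exact factorization; with the sparsity pattern of A it is the classic IC(0) preconditioner.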


Hello @PetrKryslUCSD! Am I doing this right?

import Sparspak

n = size(matrix, 2)
problem = Sparspak.Problem.Problem(n, n)
@time Sparspak.Problem.insparse!(problem, matrix)
@time Sparspak.Problem.infullrhs!(problem, 1:n)
@time solver = Sparspak.SpkSparseSpdBase._SparseSpdBase(problem)
@time Sparspak.SpkSparseSpdBase._factor!(solver)

The output is

  0.197469 seconds (45 allocations: 137.235 MiB, 46.58% gc time)
  0.000082 seconds (1 allocation: 32 bytes)
  0.004005 seconds (34 allocations: 9.860 MiB)
UndefVarError: `_ldltfactor!` not defined in `Sparspak.SpkSparseSpdBase`
Suggestion: check for spelling errors or missing imports.

If this is just a bug then I can dev the repository and fix it.

Sorry, looks like a bug. When I said “not tested” I wasn’t kidding…

solver = Sparspak.SpkSparseSpdBase._SparseSpdBase(problem)

This is probably not the right way. I will have a look.


Probably this is right.

@time solver = Sparspak.SpkSparseSpdSolver.SparseSpdSolver(problem)
@time Sparspak.SpkSparseSpdSolver.solve!(solver)

But it also errors.

I got the factorization part to run without erroring. See here.

import Sparspak

function spk_ldl(matrix)
    n = size(matrix, 2)
    problem = Sparspak.Problem.Problem(n, n)
    Sparspak.Problem.insparse!(problem, matrix)
    Sparspak.Problem.infullrhs!(problem, 1:n)
    solver = Sparspak.SpkSparseSpdSolver.SparseSpdSolver(problem)
    Sparspak.SpkSparseSpdSolver.solve!(solver)
    return solver
end

@btime spk_ldl(matrix);

The output is

  2.000 s (76606554 allocations: 1.58 GiB)

Very cool.


Just released patch 1.9.1. We now use less memory than SuiteSparse.

julia> @btime cholesky(matrix)
  56.641 ms (55 allocations: 120.63 MiB)

julia> @btime CliqueTrees.cholesky(matrix)
  66.369 ms (95 allocations: 109.40 MiB)

At the moment I don't understand what is going on. The LDLT factorization routine has serious bugs at this point, and does not compile. So I am at a loss to explain why your code could run.

I fixed the bugs, I think. See the most recent commit: GitHub - samuelsonric/Sparspak.jl: Direct solution of large sparse systems of linear algebraic equations in pure Julia


Alas, not quite. There is a bug in the factorization (division by zero).