From toy to production code with sparse matrices and sparse-direct factorizations

Dear all,
I would like to port a toy code that assembles one sparse matrix and solves one linear system of equations into a production code that assembles many sparse matrices and solves many systems of equations iteratively. Before proceeding, I would like to ask what the good (and bad) practices are regarding (1) sparse matrix assembly and (2) direct solves.

(1) If the sparse matrix structure does not change over time, does it make sense to assemble the sparse matrix (using COO triplets) at each iteration? Or is there a way to assemble it once and for all, allocating memory only once at the start, and then only update the coefficients at each iteration?

(2) As the matrix is positive definite, I would like to use a solution procedure based on Cholesky factorisation. Also, as the sparsity pattern does not change, it would be handy to run the symbolic factorisation once and for all (at the start) and then only run the numerical factorisation in each subsequent iteration, reusing the initially computed symbolic factorisation. Would anyone know how to do this using Julia's CHOLMOD API? Are there more efficient options?

Thanks for any hints!
Cheers

You may have a look at ExtendableSparse.jl. (Disclaimer: I am the author…) It has assembly via a linked-list structure, handling of pattern updates, Cholesky support, and an updateindex! method which allows for faster writing into a given pattern. You can completely avoid the COO thing.
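Roughly, the intended workflow looks like this (a minimal sketch; please double-check the exact names, in particular updateindex! and flush!, against the current docs):

using ExtendableSparse

n = 1000
A = ExtendableSparseMatrix(n, n)        # empty matrix, new entries go into a linked list

# First assembly: writing entries builds the pattern on the fly, no COO triplets needed
for i in 1:n
    updateindex!(A, +, 2.0, i, i)       # A[i,i] += 2.0, but cheaper than getindex/setindex!
    i > 1 && updateindex!(A, +, -1.0, i, i - 1)
    i < n && updateindex!(A, +, -1.0, i, i + 1)
end
flush!(A)                               # merge the linked-list entries into the CSC storage

# Subsequent iterations: the pattern already exists, so updateindex!
# writes directly into the CSC part without new allocations
for i in 1:n
    updateindex!(A, +, 0.5, i, i)
end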

Thanks for the answer. It indeed looks like a very interesting package. One thing I don't fully understand is whether solvers like Cholesky (the CHOLMOD interface) can be used, or whether only other solvers can be called (I did not find any mention of Cholesky in the docs)?
If another Cholesky implementation should be used, would it allow separating the symbolic and numeric factorisation steps and reusing the Cholesky factors?

Try Sparspak.jl (https://github.com/PetrKryslUCSD/Sparspak.jl): direct solution of large sparse systems of linear algebraic equations in pure Julia. The original Fortran code was set up to do precisely what you want to do. I haven't used the package in this way yet, but if you decide to bite the bullet, I can help you make it work.
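Untested, but something along these lines should work with the sparspaklu / sparspaklu! convenience interface (treat the exact calls as a sketch):

using SparseArrays, LinearAlgebra
using Sparspak

n = 1000
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
b = rand(n)

lu = sparspaklu(A)        # ordering + symbolic + numeric factorization
x  = lu \ b               # or ldiv!, depending on the version

# Later iterations: coefficients change, sparsity pattern stays the same
nonzeros(A) .*= 2.0
sparspaklu!(lu, A)        # numeric refactorization, reusing the symbolic work
x  = lu \ b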

You can do this with the cholesky! function from the SparseArrays stdlib.
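For example (minimal sketch):

using SparseArrays, LinearAlgebra

n = 1000
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
b = rand(n)

F = cholesky(A)       # symbolic analysis + numeric factorization (CHOLMOD)
x = F \ b

# Coefficients change, sparsity pattern stays the same:
nonzeros(A) .*= 2.0
cholesky!(F, A)       # numeric refactorization only, reusing F's symbolic analysis
x = F \ b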

You can also do this with other sparse-direct solvers, e.g. PARDISO provides Cholesky solvers and lets you re-use the analysis phase IIRC.

Yes, you can use the nonzeros(A) function to get a view of the nonzero coefficients of the array (in column-major order), which you can mutate in place.
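A tiny illustration (note that the stored values appear in CSC, i.e. column-by-column, order):

using SparseArrays

A = sparse([1, 2, 2, 3], [1, 1, 2, 3], [10.0, 20.0, 30.0, 40.0])
nz = nonzeros(A)   # stored values, column by column: A[1,1], A[2,1], A[2,2], A[3,3]

nz[1] = 11.0       # overwrites A[1,1] in place, no allocation
nz .*= 2.0         # rescales every stored entry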

In fact, ExtendableSparse has a Sparspak backend 🙂.

As I mentioned, ExtendableSparse uses a linked-list-based API for matrix build-up; Sparspak.jl proper has a similar interface, and its implementation seems to be close to the historical origin of the linked-list idea. So it also allows you to avoid the COO intermediate.

As for the general picture, there are a number of further considerations to take into account:

Other sparse direct solvers have already been mentioned; in particular, Pardiso is multithreaded.

Depending on the origin of the problem, it may also be worth looking at preconditioned conjugate gradients. If the matrix comes from a discretization of a PDE in 3D, PCG has better complexity scaling than a Cholesky factorization.
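For instance with IterativeSolvers.jl and a simple Jacobi preconditioner (just a sketch; in practice one would use a stronger preconditioner):

using SparseArrays, LinearAlgebra
using IterativeSolvers

n = 10_000
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
b = rand(n)

P = Diagonal(Vector(diag(A)))   # Jacobi preconditioner; anything supporting ldiv! works
x = cg(A, b; Pl = P)            # preconditioned conjugate gradients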

One may also mention the efforts around LinearSolve.jl, which provides a common interface to many of these solvers.
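For example (a sketch; the available algorithm names depend on the LinearSolve version):

using LinearSolve, SparseArrays

n = 1000
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
b = rand(n)

prob = LinearProblem(A, b)
sol  = solve(prob, CHOLMODFactorization())   # sparse Cholesky backend
x    = sol.u                                 # Krylov or Pardiso methods can be swapped in here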

Thank you all for these very insightful answers. So I would like to summarise all the information; please correct me if I'm wrong…

  1. Sparse matrix assembly and update: SparseArrays and ExtendableSparse both allow for matrix pre-allocation, assembly, and coefficient updates, and they reach comparable performance (cf. the ExtendableSparse docs). The benefit of ExtendableSparse.jl over SparseArrays is that it does not require the construction/storage of COO arrays (triplets).

  2. Sparse direct factorization of a symmetric positive definite matrix: If one wants detailed control of the procedure (e.g. distinct symbolic and numeric factorisation steps), one may be better off trying Sparspak or Pardiso rather than the current Cholesky implementation. If this is right, then I have a couple of additional questions:
    EDIT: cholesky! already handles this functionality.

  • @PetrKryslUCSD and @j-fu: Does Sparspak allow for Cholesky, or just LU? If it does Cholesky, how does it compare to CHOLMOD?
  • @stevengj: Will the Cholesky implementation allow controlling the symbolic and numeric steps independently in the future? Or is such control already possible?

Cholesky has not been converted from Fortran yet. It should not be a big deal (one to two days). LU is on par with SuiteSparse's lu, or a tad slower.

As I wrote above, re-using the symbolic factorization is supported with the stdlib Cholesky implementation via the cholesky! function.

As for 2: with SuiteSparse, you can apply cholesky to a sparse matrix, and this calls back to CHOLMOD; cholesky! allows you to reuse the existing pattern.
I think this is a good implementation of Cholesky factorization, although not multithreaded AFAIK (as opposed to Pardiso). Here is the corresponding code I am using.

I think the best way to decide is to try out the different variants and do some benchmarks fitting to your particular situation.

Sorry, I got confused. That is pretty clear now, thanks!

Thanks, yes, that's right. I will definitely enjoy the luxury of having 3 different Cholesky factorization implementations available (CHOLMOD, Pardiso, and the upcoming Sparspak). I will test them for my specific case (preconditioning of KSP solvers for 2D FDM/FEM analysis, as discussed here).

Would it be possible to port (part of) your package to modular arithmetic with machine-sized prime numbers, for number-theory purposes? (See e.g. the LinBox C++ library.) There is not yet a Julia library in this niche area. Since your code can handle generic numeric types, I imagine the only change needed for modular arithmetic is that pivots should not be chosen based on the magnitude of a number, which is not defined in finite fields.

P.S. Of course, BLAS routines, if used, would only work for floating-point types.

Well, with PR #8 ("Allow for generic floating point types including ForwardDiff.Dual", https://github.com/PetrKryslUCSD/Sparspak.jl/pull/8) the code works with general floating-point numbers (ForwardDiff.Dual, MultiFloats, etc.), which means that in this case custom BLAS-like routines are called (essentially stripped from Netlib).

I believe we could do something about pivoting.

… nerd sniped. I tried AbstractAlgebra.jl, but this is hard to make work, as the order of the field is not encoded in the type and e.g. one(T) cannot be implemented.

With GaloisFields.jl it seems to be closer; however, @PetrKryslUCSD already seems to have identified the main problem: pivoting. With the RowNonZero() strategy adapted to Sparspak, I was able to get a solution with a couple of small changes to the code.

Preparing a PR…

EDIT: "Prepare for finite fields", PR #12 (https://github.com/PetrKryslUCSD/Sparspak.jl/pull/12)

@greatpet: it would be good to have some test data with verified correct solutions from other projects.

Out of curiosity: where do sparse matrices over finite fields occur?

EDIT: Guess we should discuss over at the PR as this becomes off-topic here…

Some uses of dense & sparse finite-field linear algebra include cryptography and integer factorization. See e.g. the Introduction sections of the papers:
(1) Dense Linear Algebra over Word-Size Prime Fields: the FFLAS and FFPACK Packages
(2) Solving Linear Equations Over GF(2): Block Lanczos Algorithm

Other libraries have tested computing the rank of an integer matrix modulo a prime number. See the Sparse Integer Matrices Collection by J.G. Dumas. Unfortunately, this may not be a good test here since it’s necessary to detect exact cancellations of rows in Gaussian elimination to compute a reduced rank. Floating-point sparse linear algebra code usually deals with non-singular square matrices, so it’s most straightforward to port Petr’s package to cover square finite-field matrices, which is still useful… Sorry for hijacking the thread further. I’ll comment on Github if I have more to say.

As a follow-up, I've made a basic script that assembles and factorizes the discrete operator for a 1D variable-coefficient Poisson problem. In this first iteration, I've used the SparseArrays and LinearAlgebra functions nonzeros and cholesky!, as suggested by @stevengj.

  1. For assembly, I managed to break it into 2 steps: (a) an initial assembly which allocates all necessary data, and (b) an update of the matrix entries through the vector returned by nonzeros. This is pretty cool: the update results in 0 allocations. The only catch is that the update has to follow the CSC storage order, since the non-zero structure is extracted from a matrix already assembled in CSC format. Here I exploit the fact that the matrix is symmetric, and I'm not sure how I could assemble directly in CSC order in more complicated cases (I'm used to assembling row by row, equation by equation).
  2. For factorization, I also have 2 steps. The initial factorisation is carried out by cholesky. The update of the factors is done with cholesky!; however, I do not understand why this step allocates memory. Would anyone have an idea? Am I using it in the wrong way?

In the next iteration, I will try the combination of ExtendableSparse and Sparspak, which may ease the matrix element update step(?).

using SparseArrays, LinearAlgebra

# Insert one COO triplet (i, j, v) at position inz and return the advanced counter
function NonZeroCOO!(I,J,V, i,j,v, inz)
    I[inz] = i
    J[inz] = j
    V[inz] = v
    return inz + 1
end

function CreateSparseK(nnod, Num, bct)
    # Construct initial sparse matrix from triplets
    I   = ones(Int,3*nnod)
    J   = ones(Int,3*nnod)
    V   = zeros(3*nnod)
    inz = 1
    for inode in 1:nnod # Loop on equations
        iC = Num[inode] 
        if bct[inode] == 1 # BC Dirichlet
            inz = NonZeroCOO!(I,J,V, iC,iC,inz, inz) # placeholder nonzero value (inz) so sparse() keeps the entry
        else
            iS = Num[inode-1] 
            iN = Num[inode+1] 
            if bct[inode-1] == 0 # Symmetrise by removing unnecessary Dirichlet connection
                inz = NonZeroCOO!(I,J,V, iS,iC,inz, inz) # transpose insertion for CSC, brrr...
            end
            inz = NonZeroCOO!(I,J,V, iC,iC,inz, inz)
            if bct[inode+1] == 0 # Symmetrise by removing unnecessary Dirichlet connection
                inz = NonZeroCOO!(I,J,V, iN,iC,inz, inz) # transpose insertion for CSC, brrr...
            end
        end
    end    
    # Final assembly
    K = sparse(I, J, V)
    return K, nonzeros(K)   
end

function UpdateSparseK!(K_nz, dy, η, nnod, bct)
    # Update non-zeros, K_nz, in CSC format
    inz = 1
    for inode in 1:nnod  # Loop on equations
        if bct[inode] == 1
            K_nz[inz] = 1.0; inz+=1
        else
            eS = η[inode-1]
            eN = η[inode]
            cC = (eN / dy + eS / dy) / dy
            cS = -eS / dy^2
            cN = -eN / dy^2
            if bct[inode-1] == 0 # Symmetrise by removing unnecessary Dirichlet connection
                K_nz[inz] = cS; inz+=1
            end
            K_nz[inz] = cC; inz+=1
            if bct[inode+1] == 0 # Symmetrise by removing unnecessary Dirichlet connection
                K_nz[inz] = cN; inz+=1
            end
        end
    end
    return nothing    
end

function Main_1D_Poisson()
    L        = 1.0              # domain size
    ncy      = 1000             # number of cells, ncy+1=number of nodes
    Δy       = L/ncy            # spacing
    Num      = 1:1:(ncy+1)      # dof numbering
    η        = ones(ncy)        # coefficient
    bct      = zeros(Int,ncy+1) # array that identifies dof type (here Dirichlet:1, else: 0)
    bct[1]   = 1                # identify BC nodes as Dirichlets
    bct[end] = 1                # identify BC nodes as Dirichlets 
    println("Create initial sparse matrix") 
    @time Kuu, K_nz = CreateSparseK(ncy+1, Num, bct)
    println("Update non-zeros. Achtung, it is CSC: so not trivial!")
    @time UpdateSparseK!(K_nz, Δy, η, ncy+1, bct)
    println("Initial Cholesky factorisation")
    @time Kc  = cholesky(Kuu; check = false)
    println("Update Cholesky factors reusing old ones: why does this allocate?")
    @time cholesky!(Kc,Kuu; check = false)
end 

Main_1D_Poisson()