The best linear solver for sparse bandedblockbandedmatricies

RangeFu · May 24, 2020, 3:31am

Hi Community,

I’m working on a program that requires solving A\B repeatedly, where A is a sparse matrix that looks like this (it’s from finite-difference of an 2d PDE).

I was wondering is there any method that can utilize its “banded-block-banded” structure as well as its sparsity to achieve higher speed? I tried to solve it as a bandedblockbandedmatricies but it’s ten times slower than lu from umfpack. I guess it’s because it’s too sparse.

I’m not super satisfied with the current method because I’m converting this program from Fortran, which also uses umfpack but runs 5 times faster than my Julia version so I was wondering any chance there is some black magic that can help.

Oscar_Smith · May 24, 2020, 3:59am

Have you considered just using DifferentialEquations directly? It should be able to do this automatically.

ettersi · May 24, 2020, 4:57am

If you store A as a SparseMatrixCSC (have a look at the SparseArrays package), then \ will use umfpack internally and hence almost surely be as fast as Fortran

ChrisRackauckas · May 24, 2020, 4:58am

It has a specialized QR and I think LU for these matrix types. And if you’re solving a time-dependent PDE, you can tell DifferentialEquations.jl that your matrix is a BlockBandedMatrix type (or any other matrix type that exists in the ecosystem) to specialize the internal linear solver.

RangeFu · May 24, 2020, 5:05am

yeah that’s the problem. I did it in this way so it should be the same, but somehow my program is slower. Though the slowdown is also possibly caused by poor scaling of mutltithreading of my code. Unfortunately I don’t know too much about Fortran so cannot compare it unit by unit.

ettersi · May 24, 2020, 5:08am

Try comparing just the time of solving the linear system in Julia to the overall runtime of your Fortran code. Julia should be at least as good as Fortran in this test. If so, the problem is with how you assemble the sparse matrix.

Also, there is no way to exploit the banded-block-banded structure of the matrix when solving a linear system other than to use a generic sparse linear system solver. LU factorisation of your banded-block-banded matrix leads to fill-in, which destroys the special structure.

RangeFu · May 24, 2020, 5:11am

Thanks for the suggestions. I tested BandedBlockBandedMatricies, and at this sparsity level it seems it’s slower than sparse LU. Probably I didn’t use it in the right way:

using BlockBandedMatrices, BandedMatrices, SparseArrays, BenchmarkTools, LinearAlgebra
n = 40
A = BandedBlockBandedMatrix(Zeros(n^2,n^2), Fill(n,n),Fill(n,n), (1,1), (1,1))
for K = 1:n
       view(A,Block(K,K))[band(0)] .= -4; view(A,Block(K,K))[band(1)] .= 1; view(A,Block(K,K))[band(-1)] .= 1;
end

for K = 1:n-1
       view(A,Block(K,K+1))[band(0)] .= 1; view(A,Block(K+1,K))[band(0)] .= 1;
end
RHS = ones(n*n)/n/n

@btime $A \ $RHS
Asparse = sparse(A)
@btime $Asparse \ $RHS

Output:

14.092 ms (32918 allocations: 8.76 MiB)                                                                                            │·····················································
2.877 ms (59 allocations: 2.25 MiB)

RangeFu · May 24, 2020, 5:14am

I tried to use it once, but due to the special structure of our problem (the PDE problem is an optimal control, and the whole problem is a DAE-BVP), it’s still faster to go with our hand-written method, unfortunately.

ChrisRackauckas · May 24, 2020, 5:15am

Did you make sure that your number of BLAS threads matches your CPU cores?

Oscar_Smith · May 24, 2020, 5:15am

If you still have the code around, it might be worth trying again. There have been a lot of recent speed improvements.

ChrisRackauckas · May 24, 2020, 5:16am

We don’t have a DAE-BVP solver, though my guess is that this person didn’t specialize the linear solver or anything like that in a single shooting setup so they probably are just comparing slow methods to slow methods

ettersi · May 24, 2020, 5:18am

DifferentialEquations.jl can solve 2D PDEs?

ChrisRackauckas · May 24, 2020, 5:19am

Yes. The first demo of that was 2016. There’s a 2D PDE with automated sparsity handling and specialization of the linear sovlers in the second tutorial: https://diffeq.sciml.ai/latest/tutorials/advanced_ode_example/#Automatic-Sparsity-Detection-1 .

ettersi · May 24, 2020, 5:36am

As far as I can tell, this example requires the user to replace the PDE with a large system of coupled ODEs and then suggests to use DifferentialEquations.jl to solve these ODEs. I would say this demonstrates that DifferentialEquations.jl can handle 2D PDEs, but it’s not exactly the same as “solving” such PDEs. Solving would require that I can specify the PDE and a discretisation method, and then DifferentialEquations.jl would write out the scheme for me, just like it does for ODEs.

ChrisRackauckas · May 24, 2020, 5:40am

And it’s still a little early, but there’s DiffEqOperators.jl for helping with auto-discretizations, along with the other 5 libraries on this topic, like NeuralNetDiffEq.jl for other cases, etc.

FEniCS.jl gives you a discretization you give to DifferentialEquations:

github.com

SciML/FEniCS.jl/blob/master/examples/heat_equation.jl

"""
Solve the heat equation ∂ₜu = Δu on [0,1] using OrdinaryDiffEq for time stepping.
Boundary conditions are such that an analytic solution is given by:
    sin(2π*x) * exp(-(2*π)^2*t)

This example demonstrates how to solve time dependent PDE's combining `fenics` and `DiffEq`.

"""
module HeatEquationExample

using Test
using OrdinaryDiffEq
using FEniCS

mesh = UnitIntervalMesh(50)
V = FunctionSpace(mesh, "P", 1)
u_str = "sin(2*pi*x[0]) * exp(-t*pow(2*pi, 2))"

tspan = (0.0, 0.1)
u = interpolate(Expression(u_str, degree = 2, pi = Float64(pi), t = tspan[1]), V)

This file has been truncated. show original

Etc.

So that’s finite difference, mesh-free neural net methods, and FEM. Then we point to ApproxFun for pseudospectral.

ChrisRackauckas · May 24, 2020, 5:47am

Also BTW, these pieces are being put together to build an auto-discretizer method where PDEs are declared symbolically via ModelingToolkit and then are solved to the end. Here’s a demo of it:

github.com/SciML/DifferentialEquations.jl

Automated PDE solving fit for DifferentialEquations.jl

opened 07:36AM - 17 Jun 19 UTC

closed 05:53AM - 02 Sep 20 UTC

ChrisRackauckas

Building off of the story in https://github.com/JuliaDiffEq/DifferentialEquation…s.jl/issues/260 , DiffEqOperators getting auto finite difference discretizations likely done this summer. Thus the next step is automating the PDE discretization from a high level description. Since ModelingToolkit.jl is well-developed, it makes sense to utilize that to give the high level description. Thus I put a prototype together: ```julia using ModelingToolkit, DiffEqOperators, DiffEqBase, LinearAlgebra # Define some variables @parameters t x @variables u(..) @derivatives Dt'~t @derivatives Dxx''~x eq = Dt(u(t,x)) ~ Dxx(u(t,x)) bcs = [u(0,x) ~ - x * (x-1) * sin(x), u(t,0) ~ 0, u(t,1) ~ 0] abstract type AbstractDomain{T,N} end struct IntervalDomain{T} <: AbstractDomain{T,1} lower::T upper::T end struct VarDomainPairing variables domain::AbstractDomain end Base.:∈(variable::ModelingToolkit.Operation,domain::AbstractDomain) = VarDomainPairing(variable,domain) Base.:∈(variables::NTuple{N,ModelingToolkit.Operation},domain::AbstractDomain) where N = VarDomainPairing(variables,domain) domains = [t ∈ IntervalDomain(0.0,1.0), x ∈ IntervalDomain(0.0,1.0)] ######################################### # Note: Variables can be multidimensional struct ProductDomain{D,T,N} <: AbstractDomain{T,N} domains::D end ⊗(args::AbstractDomain{T}...) where T = ProductDomain{typeof(args),T,length(args)}(args) @parameters z z ∈ (IntervalDomain(0.0,1.0) ⊗ IntervalDomain(0.0,1.0)) @register Base.getindex(x,i) Base.getindex(x::Operation,i::Int64) = Operation(getindex,[x,i]) z[1]^2 + z[2]^2 ~ 1 z[1]^2 + z[2]^2 < 1 struct CircleDomain <: AbstractDomain{Float64,2} polar::Bool CircleDomain(polar=false) = new(polar) end z ∈ CircleDomain() @parameters y (x,y) ∈ CircleDomain() @parameters r θ (r,θ) ∈ CircleDomain(true) # Use constrained equations for more detailed BCs struct ConstrainedEquation constraints eq end ConstrainedEquation([x ~ 0,y < 1/2], u(t,x,y) ~ x + y^2) ######################################### struct PDESystem <: ModelingToolkit.AbstractSystem eq bcs domain indvars depvars end pdesys = PDESystem(eq,bcs,domains,[t,x],[u]) abstract type AbstractDiscretization end struct MOLFiniteDifference{T} <: AbstractDiscretization dxs::T order::Int end MOLFiniteDifference(args...;order=2) = MOLFiniteDifference(args,order) discretization = MOLFiniteDifference(0.1) struct PDEProblem{P,E,S} <: DiffEqBase.DEProblem prob::P extrapolation::E space::S end function Base.show(io::IO, A::PDEProblem) println(io,summary(A.prob)) println(io) end Base.summary(prob::PDEProblem) = string(DiffEqBase.TYPE_COLOR, nameof(typeof(prob)), DiffEqBase.NO_COLOR) function discretize(pdesys, discretization::MOLFiniteDifference) tdomain = pdesys.domain[1].domain domain = pdesys.domain[2].domain @assert domain isa IntervalDomain len = domain.upper - domain.lower dx = discretization.dxs[1] interior = domain.lower+dx:dx:domain.upper-dx X = domain.lower:dx:domain.upper L = CenteredDifference(2,2,dx,Int(len/dx)-2) Q = DiffEqOperators.DirichletBC([0.0,0.0],[1.0,1.0]) function f(du,u,p,t) mul!(du,L,Array(Q*u)) end u0 = @. - interior * (interior - 1) * sin(interior) PDEProblem(ODEProblem(f,u0,(tdomain.lower,tdomain.upper),nothing),Q,X) end prob = discretize(pdesys,discretization) function DiffEqBase.solve(prob::PDEProblem,alg::DiffEqBase.DEAlgorithm,args...; kwargs...) solve(prob.prob,alg,args...;kwargs...) end using OrdinaryDiffEq sol = solve(prob,Tsit5(),saveat=0.1) using Plots plot(prob.space,Array(prob.extrapolation*sol[1])) plot!(prob.space,Array(prob.extrapolation*sol[2])) plot!(prob.space,Array(prob.extrapolation*sol[3])) plot!(prob.space,Array(prob.extrapolation*sol[4])) ``` ### Description of the code The equation and boundary conditions are just given by ModelingToolkit.jl operations. The ConstrainedEquation allows for specifying equations only on specific parts. Then each independent variable needs a domain. Domains can be multi-dimensional, and variables can be multi-dimensional, so this lets us smartly define variables in domains (and in manifolds!) and equations using them. The totality of the PDE is in the PDESystem, just a ModelingToolkit.AbstractSystem object. The workflow for solving a PDE is thus: 1. Define a PDESystem. 2. Discretize to a PDEProblem with an AbstractDiscretization. This step will kick out things like a PDEProblem that wraps an ODEProblem, or SDEProblem, or even LinearProblem and NonlinearProblem. 3. Solve the PDEProblem. So (1) is all ModelingToolkit.jl. (2) is just a `discretize` function and some problem types. We will need to add LinearProblem and NonlinearProblem with some solve dispatches for this. After (3), we will want to wrap the solution so it can get a nicer plot recipe. Thus the steps are: - [x] Add domains and VarDomainPairing to ModelingToolkit - [x] Add ConstrainedEquation to ModelingToolkit - [x] Implement PDESystem in ModelingToolkit - [x] Implement LinearProblem and NonlinearProblem in DiffEqBase - [ ] Implement some basic DEAlgorithms for it. Solve dispatches on DEAlgorithms that just do `\`, GMRES (`LinSolveGMRES()`), `nlsolve` - [x] Implement PDEProblem, PDETimeseriesSolution, PDENoTimeseriesSolution - [x] Create `discretize` stub function in DiffEqBase - [x] Implement a smart `discretize` function in DiffEqOperators.jl for automatic finite difference discretizations. This is where a ton of work will have to go, since it will need to parse the equation and find out what `CenteredDifference` and `UpwindDifference` operators to create. This work will continue after this closes.

But there’s really no reason to wait for that interface because you can always grab the components. That said, getting the first fully compatible solvers (via physics-informed neural networks) is one of the chosen summer projects.

RangeFu · May 24, 2020, 5:49am

@ChrisRackauckas For the BLAS part, no I didn’t, but this part of code is multithreded in a for-loop anyways so I’m not sure it’ll help. Worth trying though, will do tomorrow.

For the DifferentialEquations part, actually we had this discussion months ago on slack. It’s an HJB in an Econ model, and we use some weird implicit-eplicit method to avoid solving the Jacobian. Computing jacobian itself even with sparse diff is slower than our current method.

We would be really happy to see our problem totally outsourced to DifferentialEquations. Hand-coding time step algorithms is painful after all. When I wrote the code I always maintain the interface compatible with the package so one day probably we could use it. But at least for now the general purpose package cannot utilize the special structure of our problem so we have to do it in the crude way.

ChrisRackauckas · May 24, 2020, 5:58am

using LinearAlgebra
BLAS.set_num_threads(4)

is required on many computers due to a bug in the default setting: BLAS threads should default to physical not logical core count? · Issue #33409 · JuliaLang/julia · GitHub . You can get a pretty nice speedup on many computers just by fixing that.

I thought Jesse told you that computation was sped up ~3000x 2 weeks ago: reduce time complexity of matrix2graph by ghislainb · Pull Request #107 · JuliaDiff/SparseDiffTools.jl · GitHub . I was trying to find who you were on the Slack to see if we can test your problem out again. Your problem was also in the same range where matrix coloring was taking around 700 seconds IIRC and that was slowing the computation down, but now that should be sub-second.

RangeFu · May 24, 2020, 6:22am

This looks exciting. Will try. Thanks.
Though our issue is computing jacobian even with colored matrix is slow. Optimal control really messed up a lot.

jlperla · May 24, 2020, 3:43pm

Insofar as the dense operations are not comparable with fortran, the other thing is to try GitHub - JuliaLinearAlgebra/MKL.jl: Intel MKL linear algebra backend for Julia which is very easy to use when the installation is working (which it seems to be right now). To go back to BLAS you can just reinstall the julia binary (and no need to do anything else). Sometimes MKL is way faster than BLAS, and sometimes they are similar (with the tweak that chris said)

For the linear algebra stuff, I see two approaches:

Maybe there are specialized banded-block-banded solvers or algorithms out there. I think LAPACK/MKL has DGBTRS but I don’t know if is hooked up to the blockbanded yet? But calling out to underlying fortran libraries is even enough. And I am not sure if there are algorithms that can exploit it for banded block banded. @dlfivefifty any thoughts, especially if MKL is linked in?
If the matrices are getting large enough, iterative methods start to become comparable here. These should be easy to try out
- For this, you should be able to define your matrix-vector product matrix-free. See 20. Krylov Methods and Matrix Conditioning — Quantitative Economics with Julia for some notes in progess on this sort of stuff.
- I would look to the PDE literature for preconditioners for this sort of thing. The forward-backward structure is always weird with HJBE, but once it is in matrices I think the sparsity patterns are typical of PDE discretizations. Maybe this is what multigrid preconditioners are for?
- Since you are solving this problem repeatedly, there might be all sorts of preconditioners that are worth trying. Even if calculating them is expensive, it could speed things up dramatically later. Don’t know, but worth investigating.

But with all of that said, I am not confident that you are going to beat just using a sparse-direct solver for problems of this size. Especially if you are calculating the LU decomposition and then reusing it each time you solve the problem. But by trying out these alternative methods it will probably be time well spent for when the problems get larger.

Yes, this stuff is best discussed on slack (as are most advanced topics around differential equations). Sorry I didn’t follow up with an email.

I think there are a few things here:

Can the existing diffeq implement the IMEX structure used if your fortran? Chris seemed to think so, but I am of the “believe it when I see it” when it comes to optimal control problems.
Can the existing diffeq implement better algorithms than the one you are using without significant work? This I am very skeptical of, because control problems are just weird.
Can we get better algorithms than your current one built into diffeq, or at the very least a more high-performance implementation of the IMEX for larger problems? I think this is highly likely. Chris and I discussed this and he thinks more state of the art approaches from control might be useful. But this stuff can’t happen before the fall.

Topic		Replies	Views
New blog post on banded matrices and ordinary differential equations using BandedMatrices.jl Numerics announcement	31	2741	February 9, 2019
Best way to construct 2-D Laplacian banded matrix: BandedMatrices, sparse, or BlockBandedMatrices? General Usage package	22	4351	July 2, 2021
DifferentialEquations.jl FEM example General Usage question	31	4375	September 4, 2019
Sparsity in DifferentialEquations: how to exploit? Modelling & Simulations sparse , differentialequation	8	761	October 15, 2021
Specifying ode solver options to speed up compute time New to Julia question	42	1943	June 2, 2021

The best linear solver for sparse bandedblockbandedmatricies

Related topics