Best way to construct 2-D Laplacian banded matrix: BandedMatrices, sparse, or BlockBandedMatrices?

Hi!

I am trying to construct the matrix associated with the 2-D Laplacian operator. To do this, I need a special kind of banded matrix. It’s best illustrated at this excellent page, and I reproduce their example matrix for a 4x4 system:

image

So basically, I need a nxn block banded matrix, whose individual blocks are size nxn. I tried constructing it using BandedMatrices.jl but this was not very fruitful – I would be left with extra entries at, for example, location (row=5,column=4) which should be empty.

I think this is exactly the use case for which BlockBandedMatrices was developed, but I’m not sure how to use its syntax. But I’m open to using any method for doing this efficiently! :grinning:

Note: In the given example, all the off-diagonal bands are -1, but they could be different from each other. So in total I have 5 distinct numbers with which to populate these bands.

4 Likes

If you are happy with sparse that’s the most straightforward, and BlockBandedMatrices.jl won’t be faster unless you use MKL (and even then there’s not much in it). But if you want to use BlockBandedMatrices.jl the easiest way is via Kron, see the finite difference example:

julia> n = 4
4

julia> D² = BandedMatrix(0 => Fill(-2,n), 1 => Fill(1,n-1), -1 => Fill(1,n-1))
4×4 BandedMatrix{Int64,Array{Int64,2},Base.OneTo{Int64}}:
 -2   1   ⋅   ⋅
  1  -2   1   ⋅
  ⋅   1  -2   1
  ⋅   ⋅   1  -2

julia> D_xx = BandedBlockBandedMatrix(Kron(D², Eye(n)))
4×4-blocked 16×16 BandedBlockBandedMatrix{Float64,BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2},BlockArrays.BlockSizes{2,Tuple{Array{Int64,1},Array{Int64,1}}}}}:
 -2.0    ⋅     ⋅     ⋅   │   1.0    ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅   -2.0    ⋅     ⋅   │    ⋅    1.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅   -2.0    ⋅   │    ⋅     ⋅    1.0    ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅   -2.0  │    ⋅     ⋅     ⋅    1.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
  1.0    ⋅     ⋅     ⋅   │  -2.0    ⋅     ⋅     ⋅   │   1.0    ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅    1.0    ⋅     ⋅   │    ⋅   -2.0    ⋅     ⋅   │    ⋅    1.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅    1.0    ⋅   │    ⋅     ⋅   -2.0    ⋅   │    ⋅     ⋅    1.0    ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅    1.0  │    ⋅     ⋅     ⋅   -2.0  │    ⋅     ⋅     ⋅    1.0  │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │   1.0    ⋅     ⋅     ⋅   │  -2.0    ⋅     ⋅     ⋅   │   1.0    ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅    1.0    ⋅     ⋅   │    ⋅   -2.0    ⋅     ⋅   │    ⋅    1.0    ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    1.0    ⋅   │    ⋅     ⋅   -2.0    ⋅   │    ⋅     ⋅    1.0    ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅    1.0  │    ⋅     ⋅     ⋅   -2.0  │    ⋅     ⋅     ⋅    1.0
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │   1.0    ⋅     ⋅     ⋅   │  -2.0    ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅    1.0    ⋅     ⋅   │    ⋅   -2.0    ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    1.0    ⋅   │    ⋅     ⋅   -2.0    ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅    1.0  │    ⋅     ⋅     ⋅   -2.0

julia> D_yy = BandedBlockBandedMatrix(Kron(Eye(n), D²))
4×4-blocked 16×16 BandedBlockBandedMatrix{Float64,BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2},BlockArrays.BlockSizes{2,Tuple{Array{Int64,1},Array{Int64,1}}}}}:
 -2.0   1.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
  1.0  -2.0   1.0    ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅    1.0  -2.0   1.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅    1.0  -2.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │  -2.0   1.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │   1.0  -2.0   1.0    ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅    1.0  -2.0   1.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    1.0  -2.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │  -2.0   1.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │   1.0  -2.0   1.0    ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅    1.0  -2.0   1.0  │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    1.0  -2.0  │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │  -2.0   1.0    ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │   1.0  -2.0   1.0    ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅    1.0  -2.0   1.0
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    1.0  -2.0

julia> Δ = D_xx + D_yy
4×4-blocked 16×16 BandedBlockBandedMatrix{Float64,BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2},BlockArrays.BlockSizes{2,Tuple{Array{Int64,1},Array{Int64,1}}}}}:
 -4.0   1.0    ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
  1.0  -4.0   1.0    ⋅   │   0.0   1.0   0.0    ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅    1.0  -4.0   1.0  │    ⋅    0.0   1.0   0.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅    1.0  -4.0  │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
  1.0   0.0    ⋅     ⋅   │  -4.0   1.0    ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
  0.0   1.0   0.0    ⋅   │   1.0  -4.0   1.0    ⋅   │   0.0   1.0   0.0    ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅    0.0   1.0   0.0  │    ⋅    1.0  -4.0   1.0  │    ⋅    0.0   1.0   0.0  │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅    0.0   1.0  │    ⋅     ⋅    1.0  -4.0  │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │  -4.0   1.0    ⋅     ⋅   │   1.0   0.0    ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │   0.0   1.0   0.0    ⋅   │   1.0  -4.0   1.0    ⋅   │   0.0   1.0   0.0    ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅    0.0   1.0   0.0  │    ⋅    1.0  -4.0   1.0  │    ⋅    0.0   1.0   0.0
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅    1.0  -4.0  │    ⋅     ⋅    0.0   1.0
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │  -4.0   1.0    ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │   0.0   1.0   0.0    ⋅   │   1.0  -4.0   1.0    ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅    0.0   1.0   0.0  │    ⋅    1.0  -4.0   1.0
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅    1.0  -4.0

Note BlockBandedMatrices.jl has other advantages over sparse apart from speed: it is possible to have it on the GPU, parallelised, and works with general Julia types. Also the QR factorization is a lot faster than sparse.

3 Likes

If you prefer more declaritive you can construct it using Block:

julia> Δ = BandedBlockBandedMatrix(Zeros(n^2,n^2), (Fill(n,n),Fill(n,n)), (1,1), (1,1))
4×4-blocked 16×16 BandedBlockBandedMatrix{Float64,BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2},BlockArrays.BlockSizes{2,Tuple{Array{Int64,1},Array{Int64,1}}}}}:
 0.0  0.0   ⋅    ⋅   │  0.0  0.0   ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅ 
 0.0  0.0  0.0   ⋅   │  0.0  0.0  0.0   ⋅   │   ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅ 
  ⋅   0.0  0.0  0.0  │   ⋅   0.0  0.0  0.0  │   ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅   0.0  0.0  │   ⋅    ⋅   0.0  0.0  │   ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅ 
 ────────────────────┼──────────────────────┼──────────────────────┼────────────────────
 0.0  0.0   ⋅    ⋅   │  0.0  0.0   ⋅    ⋅   │  0.0  0.0   ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅ 
 0.0  0.0  0.0   ⋅   │  0.0  0.0  0.0   ⋅   │  0.0  0.0  0.0   ⋅   │   ⋅    ⋅    ⋅    ⋅ 
  ⋅   0.0  0.0  0.0  │   ⋅   0.0  0.0  0.0  │   ⋅   0.0  0.0  0.0  │   ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅   0.0  0.0  │   ⋅    ⋅   0.0  0.0  │   ⋅    ⋅   0.0  0.0  │   ⋅    ⋅    ⋅    ⋅ 
 ────────────────────┼──────────────────────┼──────────────────────┼────────────────────
  ⋅    ⋅    ⋅    ⋅   │  0.0  0.0   ⋅    ⋅   │  0.0  0.0   ⋅    ⋅   │  0.0  0.0   ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅   │  0.0  0.0  0.0   ⋅   │  0.0  0.0  0.0   ⋅   │  0.0  0.0  0.0   ⋅ 
  ⋅    ⋅    ⋅    ⋅   │   ⋅   0.0  0.0  0.0  │   ⋅   0.0  0.0  0.0  │   ⋅   0.0  0.0  0.0
  ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅   0.0  0.0  │   ⋅    ⋅   0.0  0.0  │   ⋅    ⋅   0.0  0.0
 ────────────────────┼──────────────────────┼──────────────────────┼────────────────────
  ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅   │  0.0  0.0   ⋅    ⋅   │  0.0  0.0   ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅   │  0.0  0.0  0.0   ⋅   │  0.0  0.0  0.0   ⋅ 
  ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅   │   ⋅   0.0  0.0  0.0  │   ⋅   0.0  0.0  0.0
  ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅    ⋅    ⋅   │   ⋅    ⋅   0.0  0.0  │   ⋅    ⋅   0.0  0.0

julia> for K = 1:n
       view(Δ,Block(K,K))[band(0)] .= -4; view(Δ,Block(K,K))[band(1)] .= 1; view(Δ,Block(K,K))[band(-1)] .= 1;
       end

julia> for K = 1:n-1
       view(Δ,Block(K,K+1))[band(0)] .= 1; view(Δ,Block(K+1,K))[band(0)] .= 1;
       end

julia> Δ
4×4-blocked 16×16 BandedBlockBandedMatrix{Float64,BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2},BlockArrays.BlockSizes{2,Tuple{Array{Int64,1},Array{Int64,1}}}}}:
 -4.0   1.0    ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
  1.0  -4.0   1.0    ⋅   │   0.0   1.0   0.0    ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅    1.0  -4.0   1.0  │    ⋅    0.0   1.0   0.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅    1.0  -4.0  │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
  1.0   0.0    ⋅     ⋅   │  -4.0   1.0    ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅ 
  0.0   1.0   0.0    ⋅   │   1.0  -4.0   1.0    ⋅   │   0.0   1.0   0.0    ⋅   │    ⋅     ⋅     ⋅     ⋅ 
   ⋅    0.0   1.0   0.0  │    ⋅    1.0  -4.0   1.0  │    ⋅    0.0   1.0   0.0  │    ⋅     ⋅     ⋅     ⋅ 
   ⋅     ⋅    0.0   1.0  │    ⋅     ⋅    1.0  -4.0  │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅     ⋅     ⋅ 
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │  -4.0   1.0    ⋅     ⋅   │   1.0   0.0    ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │   0.0   1.0   0.0    ⋅   │   1.0  -4.0   1.0    ⋅   │   0.0   1.0   0.0    ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅    0.0   1.0   0.0  │    ⋅    1.0  -4.0   1.0  │    ⋅    0.0   1.0   0.0
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅    1.0  -4.0  │    ⋅     ⋅    0.0   1.0
 ────────────────────────┼──────────────────────────┼──────────────────────────┼────────────────────────
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │   1.0   0.0    ⋅     ⋅   │  -4.0   1.0    ⋅     ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │   0.0   1.0   0.0    ⋅   │   1.0  -4.0   1.0    ⋅ 
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅    0.0   1.0   0.0  │    ⋅    1.0  -4.0   1.0
   ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅     ⋅     ⋅   │    ⋅     ⋅    0.0   1.0  │    ⋅     ⋅    1.0  -4.0
4 Likes

FWIW, DiffEqOperators.jl isn’t done yet, but it’s getting close. I hope that in a few months this should be:

using DiffEqOperators
N = 64
dx = 0.1
Dxx = CenteredDifference(2,2,dx,N)
Dyy = CenteredDifference{2}(2,2,dx,N)
L = Dxx + Dyy
using BlockBandedMatrices
BlockBandedMatrix(L)

Right now you can make the banded matrices and manually add them, but this is on our minds right now. :man_shrugging:

10 Likes

Thanks! kron is a good idea.

One small thing – the type that I get is:
1×1-blocked 16×16 BandedBlockBandedMatrix{Float64,BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2},BlockArrays.BlockSizes{2,Tuple{Array{Int64,1},Array{Int64,1}}}}}

not the type that your code seems to generate.

Wow! this would be fantastic!

Also, how do I use sparse to do this? The documentation on sparse in the official docs is even more sparse…

I’m horrible at reading types, but it looks the same to me? If not, it may be because I’m on a development version of the relevant packages.

Also, how do I use sparse to do this?

The fast way of building sparse matrices is to specify the indices/values as vectors. It also supports kron, and so the following is probably the easiest equivalent:

julia> Is = [1:n; 1:n-1; 2:n]; Js = [1:n; 2:n; 1:n-1]; Vs = [fill(-2,n); fill(1, 2n-2)];

julia> D² = sparse(Is, Js, Vs)
4×4 SparseMatrixCSC{Int64,Int64} with 10 stored entries:
  [1, 1]  =  -2
  [2, 1]  =  1
  [1, 2]  =  1
  [2, 2]  =  -2
  [3, 2]  =  1
  [2, 3]  =  1
  [3, 3]  =  -2
  [4, 3]  =  1
  [3, 4]  =  1
  [4, 4]  =  -2

julia> D_xx = kron(D², sparse(I,n,n));

julia> D_yy = kron(sparse(I,n,n), D²);

julia> Δ = D_xx + D_yy
16×16 SparseMatrixCSC{Int64,Int64} with 64 stored entries:
  [1 ,  1]  =  -4
  [2 ,  1]  =  1
  [5 ,  1]  =  1
  [1 ,  2]  =  1
  [2 ,  2]  =  -4
  [3 ,  2]  =  1
  [6 ,  2]  =  1
  [2 ,  3]  =  1
  [3 ,  3]  =  -4
  [4 ,  3]  =  1
  [7 ,  3]  =  1
  [3 ,  4]  =  1
  [4 ,  4]  =  -4
  [8 ,  4]  =  1
  [1 ,  5]  =  1
  [5 ,  5]  =  -4
  [6 ,  5]  =  1
  [9 ,  5]  =  1
  [2 ,  6]  =  1
  [5 ,  6]  =  1
  [6 ,  6]  =  -4
  [7 ,  6]  =  1
  [10,  6]  =  1
  [3 ,  7]  =  1
  [6 ,  7]  =  1
  [7 ,  7]  =  -4
  [8 ,  7]  =  1
  ⋮
  [9 , 10]  =  1
  [10, 10]  =  -4
  [11, 10]  =  1
  [14, 10]  =  1
  [7 , 11]  =  1
  [10, 11]  =  1
  [11, 11]  =  -4
  [12, 11]  =  1
  [15, 11]  =  1
  [8 , 12]  =  1
  [11, 12]  =  1
  [12, 12]  =  -4
  [16, 12]  =  1
  [9 , 13]  =  1
  [13, 13]  =  -4
  [14, 13]  =  1
  [10, 14]  =  1
  [13, 14]  =  1
  [14, 14]  =  -4
  [15, 14]  =  1
  [11, 15]  =  1
  [14, 15]  =  1
  [15, 15]  =  -4
  [16, 15]  =  1
  [12, 16]  =  1
  [15, 16]  =  1
  [16, 16]  =  -4

julia> Matrix(Δ) # Convert to matrix for readable display
16×16 Array{Int64,2}:
 -4   1   0   0   1   0   0   0   0   0   0   0   0   0   0   0
  1  -4   1   0   0   1   0   0   0   0   0   0   0   0   0   0
  0   1  -4   1   0   0   1   0   0   0   0   0   0   0   0   0
  0   0   1  -4   0   0   0   1   0   0   0   0   0   0   0   0
  1   0   0   0  -4   1   0   0   1   0   0   0   0   0   0   0
  0   1   0   0   1  -4   1   0   0   1   0   0   0   0   0   0
  0   0   1   0   0   1  -4   1   0   0   1   0   0   0   0   0
  0   0   0   1   0   0   1  -4   0   0   0   1   0   0   0   0
  0   0   0   0   1   0   0   0  -4   1   0   0   1   0   0   0
  0   0   0   0   0   1   0   0   1  -4   1   0   0   1   0   0
  0   0   0   0   0   0   1   0   0   1  -4   1   0   0   1   0
  0   0   0   0   0   0   0   1   0   0   1  -4   0   0   0   1
  0   0   0   0   0   0   0   0   1   0   0   0  -4   1   0   0
  0   0   0   0   0   0   0   0   0   1   0   0   1  -4   1   0
  0   0   0   0   0   0   0   0   0   0   1   0   0   1  -4   1
  0   0   0   0   0   0   0   0   0   0   0   1   0   0   1  -4

BandedBlockBandedMatrix is probably faster for building the operators since sparse needs to do sorting as part of construction.

1 Like

I dont understand this statement. Do you mean that BlockBandedMatrices is on par with sparse for the linsolve or just for the construction of Dxx?

julia> @time D_xx \ rhs;
  0.262197 seconds (110.03 k allocations: 79.279 MiB, 5.77% gc time)
julia> Ds = sparse(D_xx);
julia> @time Ds \ rhs;
  0.006213 seconds (65 allocations: 5.398 MiB)

Oh I was referring to Matrix * vector. There’s no such thing as a BandedBlockBandedMatrix solver so it falls back to BlockBandedMatrix QR.

But those timings are a performance regression: it’s hitting a generic fallback which is really slow. Once that’s fixed it should be comparable to sparse for matrices less than size (1_000_000,1_000_000) or so.

I fixed the bug. Also D_xx is too sparse so sparse gets O(N) complexity for N unknowns. You need to try with Δ to see a fair comparison. On OpenBLAS they are comparable (here n = 100):

julia> @time F = qr(Δ);
  0.215902 seconds (91.61 k allocations: 73.517 MiB, 7.04% gc time)

julia> @time Fs = qr(Δs);
  0.143997 seconds (10.14 k allocations: 75.373 MiB, 8.67% gc time)

julia> @time F \ rhs;
  0.027145 seconds (107.03 k allocations: 12.533 MiB, 26.28% gc time)

julia> @time Fs \ rhs;
  0.025400 seconds (52.60 k allocations: 17.218 MiB, 59.16% gc time)

I believe with MKL BlockBandedMatrices.jl would be 5x faster than `sparse.

The more significant issue is memory usage of sparse depends on the entries, and breaks down for singularly perturbed problems. I’ve been meaning to write a blog post on this for a while.

1 Like

When you say this, do you mean that we just need to compile MKL in (e.g. using MKL.jl ) or are there other tricks required (e.g. do we need GitHub - JuliaSparse/MKLSparse.jl: Make available to Julia the sparse functionality in MKL and other stuff)?

Boy would I be happy if it was easier to replace OpenBLAS. i.e. saying “yeah but matlab/python is only faster because it is using MKL instead of OpenBLAS” doesn’t really cut it when Julia is advertised as being high performance.

1 Like

Just compile with MKL. Here most the time is in LAPACK.omqr! (or something, basically applying a dense orthogonal matrix)

Though my claims of speed up are more applicable to banded matrix BLAS, where this is all dense BLAS (on the blocks, not the whole matrix, since it treats the blocks as dense).

1 Like

This is interesting!

Do you think CUDA could speed this up any further?

My use case (implicit FVM) is to do (I+C)\rhs where C is a convection operator with bands similar to this post. The issue for me is that C changes at each iteration and just building the sparse matrix is 1/3 of the time of each step (and then 2 linsolves for C1 and C2).

I know I have been waiting for the 2d version of your blog post :wink:

1 Like

Not an expert on CUDA but I believe it will help with build time, not with solve time. Though if you only have matrix * vector and triangular solves, you might be able to go full lazy and have no build time at all…

1 Like

Hello Chris, is this (or something like this) possible now?

Hello again. It seems that this package has changed considerably, and I was wondering if you or someone else on this forum could help me figure out how to translate my code from a year ago so that it works with the latest version of BlockBandedMatrices.jl.

Firstly, I constructed an empty matrix like this (assume nx = ny, Δx = Δy)

nx, ny = size(ψ)
Δx = Δy = 1/(nx-1); # length of 1.0 in both directions
Ae = Aw = An = As = 1
Ap = -4
A = BandedBlockBandedMatrix(Zeros(nx*ny,nx*ny), (Fill(nx,ny),Fill(nx,ny)), (1,1), (1,1))

I believe that the new way of doing this, is by using something like this:

nx, ny = size(ψ)
Δx = Δy = 1/(nx-1); # length of 1.0 in both directions
rows = cols = 1:nx
l = u = λ = μ = 1
Ae = Aw = An = As = 1
Ap = -4
A = BandedBlockBandedMatrix(Zeros(sum(rows),sum(cols)), rows, cols, (l,u), (λ,μ))

This works, I think.

Now, I used to modify this matrix with code of the following type

#### Fill up the main diagonal block
for K in 2:nx-1
    view(A,Block(K,K))[band(0)] .= Ap.*Δt./(2*Re*Δx^2);
    view(A,Block(K,K))[band(-1)] .= Ae.*Δt./(2*Re*Δx^2);
    view(A,Block(K,K))[band(1)] .= Aw.*Δt./(2*Re*Δx^2);
end
#### Fill up off-diagonal blocks
for K = 2:ny-1
    view(A,Block(K,K+1))[band(0)] .= As.*Δt./(2*Re*Δx^2);
end
for K = 1:ny-2
    view(A,Block(K+1,K))[band(0)] .= An.*Δt./(2*Re*Δx^2);
end

but it seems like view(A,Block(K,K))[band(0)] is no longer an acceptable syntax. Is there a way to fill in the ‘bands’ of each ‘block’ in this way?

Later, I modified this matrix for boundary conditions using something like this:

for K in [1,nx]
    view(A,Block(K,K))[band(0)] .= 1
end
for K in 1:nx-2
# Here we re-define coefficients to enforce Dirichlet boundary conditions on the east
   rownum = K*ny+1
   ind1 = K*nx+1
   ind2 = K*nx+2
   ind3 = (K-1)*nx+1
   ind4 = (K+1)*nx+1
   setindex!(A,1,rownum,ind1)
   setindex!(A,0,rownum,ind2) 
   setindex!(A,0,rownum,ind3)
   setindex!(A,0,rownum,ind4)
end

Would these methods still work?

If there is any documentation that will help me translate my code from a year-old version of this package to the current version, I’d love to take a look at it and try to figure this out on my own. Grateful for any help/direction you can offer me!

That should work. The right place to ask this is as a github issue including the error messages

EDIT: just needs using BandedMatrices

1 Like

Just adding one more of those as sparse Matrix using LinearIndices which perhaps translates more nicely into 3d if someone needs that.

"""
Graph Laplacian of a `m×n` lattice.
"""
function gridlaplacian(T, m, n)
    S = sparse(T(0.0)I, n*m, n*m)
    linear = LinearIndices((1:m, 1:n))
    for i in 1:m
        for j in 1:n
            for (i2, j2) in ((i + 1, j), (i, j + 1))
                if i2 <= m && j2 <= n
                    S[linear[i, j], linear[i2, j2]] -= 1
                    S[linear[i2, j2], linear[i, j]] -= 1

                    S[linear[i, j], linear[i, j]] += 1
                    S[linear[i2, j2], linear[i2, j2]] += 1
                end
            end
        end
    end
    S
end

It should work: https://github.com/SciML/DiffEqOperators.jl/blob/master/src/derivative_operators/concretization.jl#L415-L431

1 Like