I am trying to construct the matrix associated with the 2-D Laplacian operator. To do this, I need a special kind of banded matrix. It’s best illustrated at this excellent page, and I reproduce their example matrix for a 4×4 system:

So basically, I need an n×n block banded matrix whose individual blocks are themselves n×n. I tried constructing it using BandedMatrices.jl, but this was not very fruitful: I would be left with extra entries at, for example, location (row=5, column=4), which should be empty.

I think this is exactly the use case for which BlockBandedMatrices.jl was developed, but I’m not sure how to use its syntax. That said, I’m open to any method for doing this efficiently!

Note: in the given example, all the off-diagonal bands are -1, but they could differ from each other, so in total I have 5 distinct numbers with which to populate these bands.

If you are happy with sparse, that’s the most straightforward, and BlockBandedMatrices.jl won’t be faster unless you use MKL (and even then there’s not much in it). But if you want to use BlockBandedMatrices.jl, the easiest way is via Kron; see the finite-difference example:
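A sketch along the lines of the package’s finite-difference example (the exact constructor names may differ between versions of these packages):

```julia
using BlockBandedMatrices, BandedMatrices, FillArrays, LazyArrays

n = 4  # 4×4 grid, so the 2-D Laplacian is 16×16

# 1-D second-difference matrix
D² = BandedMatrix(0 => Fill(-2, n), 1 => Fill(1, n - 1), -1 => Fill(1, n - 1))

# Lazy Kronecker products give the block-banded 2-D operator directly
D_xx = BandedBlockBandedMatrix(Kron(D², Eye(n)))
D_yy = BandedBlockBandedMatrix(Kron(Eye(n), D²))
Δ = D_xx + D_yy  # the 2-D Laplacian as a BandedBlockBandedMatrix
```

Because the Kronecker structure respects block boundaries, entries such as (5,4) stay exactly zero rather than spilling over from a band.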

Note that BlockBandedMatrices.jl has other advantages over sparse apart from speed: it can live on the GPU, be parallelised, and work with general Julia types. The QR factorization is also a lot faster than sparse.

One small thing – the type that I get is: 1×1-blocked 16×16 BandedBlockBandedMatrix{Float64,BlockArrays.PseudoBlockArray{Float64,2,Array{Float64,2},BlockArrays.BlockSizes{2,Tuple{Array{Int64,1},Array{Int64,1}}}}}

I’m horrible at reading types, but it looks the same to me? If not, it may be because I’m on a development version of the relevant packages.

Also, how do I use sparse to do this?

The fast way of building sparse matrices is to specify the indices/values as vectors. sparse also supports kron, so the following is probably the easiest equivalent:
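For instance, a sketch using the five distinct band values mentioned above (the coefficient names ap/ae/aw/an/as are illustrative):

```julia
using SparseArrays, LinearAlgebra

n = 4
ap, ae, aw, an, as = -4.0, 1.0, 1.0, 1.0, 1.0  # the five band values

# 1-D pieces: T holds the diagonal plus the east/west bands,
# S holds the north/south coupling between blocks
T  = spdiagm(0 => fill(ap, n), -1 => fill(ae, n - 1), 1 => fill(aw, n - 1))
S  = spdiagm(-1 => fill(an, n - 1), 1 => fill(as, n - 1))
Id = sparse(1.0I, n, n)

A = kron(Id, T) + kron(S, Id)  # 16×16, with clean block boundaries
```

As with the lazy Kron version, kron never introduces spurious entries like (5,4) at block boundaries, which is what went wrong with a plain banded construction.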

Oh, I was referring to matrix * vector. There’s no such thing as a BandedBlockBandedMatrix solver, so it falls back to the BlockBandedMatrix QR.

But those timings reflect a performance regression: it’s hitting a generic fallback which is really slow. Once that’s fixed, it should be comparable to sparse for matrices up to size (1_000_000, 1_000_000) or so.

I fixed the bug. Also, D_xx is too sparse, so sparse gets O(N) complexity for N unknowns. You need to try with Δ for a fair comparison. On OpenBLAS they are comparable (here n = 100):

julia> @time F = qr(Δ);
  0.215902 seconds (91.61 k allocations: 73.517 MiB, 7.04% gc time)

julia> @time Fs = qr(Δs);
  0.143997 seconds (10.14 k allocations: 75.373 MiB, 8.67% gc time)

julia> @time F \ rhs;
  0.027145 seconds (107.03 k allocations: 12.533 MiB, 26.28% gc time)

julia> @time Fs \ rhs;
  0.025400 seconds (52.60 k allocations: 17.218 MiB, 59.16% gc time)

I believe with MKL, BlockBandedMatrices.jl would be 5x faster than sparse.

The more significant issue is that the memory usage of sparse factorizations depends on the entries, and it breaks down for singularly perturbed problems. I’ve been meaning to write a blog post on this for a while.

When you say this, do you mean that we just need to compile MKL in (e.g. using MKL.jl), or are there other tricks required (e.g. do we need https://github.com/JuliaSparse/MKLSparse.jl and other packages)?

Boy, would I be happy if it were easier to replace OpenBLAS. Saying “yeah, but Matlab/Python is only faster because it is using MKL instead of OpenBLAS” doesn’t really cut it when Julia is advertised as being high performance.

Just compile with MKL. Here most of the time is in LAPACK.ormqr! (or something like that; basically applying a dense orthogonal matrix).

Though my claims of speed-up apply more to banded-matrix BLAS, whereas this is all dense BLAS (on the blocks, not the whole matrix, since it treats the blocks as dense).

Do you think CUDA could speed this up any further?

My use case (implicit FVM) is to do (I+C)\rhs, where C is a convection operator with bands similar to this post. The issue for me is that C changes at each iteration, and just building the sparse matrix takes 1/3 of the time of each step (and then 2 linsolves, for C1 and C2).

I know – I have been waiting for the 2-D version of your blog post.

I’m not an expert on CUDA, but I believe it will help with build time, not with solve time. Though if you only need matrix * vector and triangular solves, you might be able to go fully lazy and have no build time at all…
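To sketch that lazy idea (purely illustrative; a 5-point stencil applied matrix-free, so there is nothing to build each iteration):

```julia
# Apply a 5-point stencil to a vector x holding an n×n grid in
# column-major order, without ever assembling the matrix.
# ap is the diagonal coefficient; ae/aw/an/as are the neighbour coefficients.
function stencil_mul!(y, x, n, ap, ae, aw, an, as)
    X = reshape(x, n, n)
    Y = reshape(y, n, n)
    for j in 1:n, i in 1:n
        v = ap * X[i, j]
        i > 1 && (v += ae * X[i - 1, j])  # east neighbour
        i < n && (v += aw * X[i + 1, j])  # west neighbour
        j > 1 && (v += an * X[i, j - 1])  # north neighbour
        j < n && (v += as * X[i, j + 1])  # south neighbour
        Y[i, j] = v
    end
    return y
end

# Usage: y = C*x with C never formed
x = randn(16)
y = stencil_mul!(similar(x), x, 4, -4.0, 1.0, 1.0, 1.0, 1.0)
```

For an iterative (Krylov) solver this matrix-vector product is all you need, which is what makes the “no build time” route attractive when C changes every step.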

Hello again. It seems that this package has changed considerably, and I was wondering if you or someone else on this forum could help me figure out how to translate my code from a year ago so that it works with the latest version of BlockBandedMatrices.jl.

Firstly, I constructed an empty matrix like this (assume nx = ny, Δx = Δy)

nx, ny = size(ψ)
Δx = Δy = 1/(nx-1); # length of 1.0 in both directions
Ae = Aw = An = As = 1
Ap = -4
A = BandedBlockBandedMatrix(Zeros(nx*ny,nx*ny), (Fill(nx,ny),Fill(nx,ny)), (1,1), (1,1))

I believe that the new way of doing this is by using something like this:

nx, ny = size(ψ)
Δx = Δy = 1/(nx-1); # length of 1.0 in both directions
rows = cols = Fill(nx, ny)  # ny blocks, each of size nx
l = u = λ = μ = 1
Ae = Aw = An = As = 1
Ap = -4
A = BandedBlockBandedMatrix(Zeros(sum(rows),sum(cols)), rows, cols, (l,u), (λ,μ))

This works, I think.

Now, I used to modify this matrix with code of the following type

#### Fill up the main diagonal blocks
for K in 2:nx-1
    view(A, Block(K,K))[band(0)]  .= Ap*Δt/(2*Re*Δx^2)
    view(A, Block(K,K))[band(-1)] .= Ae*Δt/(2*Re*Δx^2)
    view(A, Block(K,K))[band(1)]  .= Aw*Δt/(2*Re*Δx^2)
end
#### Fill up the off-diagonal blocks
for K in 2:ny-1
    view(A, Block(K,K+1))[band(0)] .= As*Δt/(2*Re*Δx^2)
end
for K in 1:ny-2
    view(A, Block(K+1,K))[band(0)] .= An*Δt/(2*Re*Δx^2)
end

but it seems that view(A, Block(K,K))[band(0)] is no longer accepted syntax. Is there a way to fill in the ‘bands’ of each ‘block’ like this?

Later, I modified this matrix for boundary conditions using something like this:

for K in [1, nx]
    view(A, Block(K,K))[band(0)] .= 1
end
for K in 1:nx-2
    # Re-define coefficients to enforce Dirichlet boundary conditions on the east
    rownum = K*ny + 1
    ind1 = K*nx + 1
    ind2 = K*nx + 2
    ind3 = (K-1)*nx + 1
    ind4 = (K+1)*nx + 1
    A[rownum, ind1] = 1
    A[rownum, ind2] = 0
    A[rownum, ind3] = 0
    A[rownum, ind4] = 0
end

Would these methods still work?

If there is any documentation that will help me translate my code from a year-old version of this package to the current version, I’d love to take a look at it and try to figure this out on my own. Grateful for any help/direction you can offer me!

Just adding one more of these, as a sparse matrix built using LinearIndices, which perhaps translates more nicely to 3-D if someone needs that.

"""
Graph Laplacian of a `m×n` lattice.
"""
function gridlaplacian(T, m, n)
S = sparse(T(0.0)I, n*m, n*m)
linear = LinearIndices((1:m, 1:n))
for i in 1:m
for j in 1:n
for (i2, j2) in ((i + 1, j), (i, j + 1))
if i2 <= m && j2 <= n
S[linear[i, j], linear[i2, j2]] -= 1
S[linear[i2, j2], linear[i, j]] -= 1
S[linear[i, j], linear[i, j]] += 1
S[linear[i2, j2], linear[i2, j2]] += 1
end
end
end
end
S
end