[ANN] `HMatrices.jl` - A package for assembling and factoring hierarchical matrices

Hi,

I recently registered HMatrices.jl, a package for working with hierarchical matrices that may be of interest to others. More specifically, it implements the ℋ-matrix format, with a strong focus on the tools needed to solve boundary integral equations. It provides both an accelerated matrix-vector product and a hierarchical LU factorisation. There is also a variety of clustering strategies to choose from, and the flexibility to define your own kernel, admissibility condition, etc.

Most of the code has been written with performance in mind, so I hope it is not too slow :sweat_smile:. Currently, there is support for shared memory parallelism for the assembling part, the matrix-vector product, and the factorisation, as well as very experimental (partial) support for distributed memory parallelism. There are also some half-finished docs that should cover the most common use cases.
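
To give a rough idea of the workflow, here is a minimal sketch (the point cloud, kernel, and tolerance are just placeholders for illustration; see the docs for the details and exact API):

using StaticArrays, LinearAlgebra
import HMatrices

# purely illustrative: random points in 3D and a simple exponential kernel
pts = [@SVector rand(3) for _ in 1:5_000]
kernel(x, y) = exp(-norm(x - y))

# lazy kernel matrix, then its hierarchical (ℋ-matrix) approximation
K = HMatrices.KernelMatrix(kernel, pts, pts)
H = HMatrices.assemble_hmat(K; atol = 1e-6)  # pass threads=false to disable multithreading

# accelerated matrix-vector product
x = rand(length(pts))
y = H * x

# a hierarchical LU factorisation is also available; see the docs for the exact call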

Feel free to reach out if you have any suggestions, requests, or general feedback regarding the package.

Thanks,
Luiz M. Faria

p.s. If you are interested in hierarchical matrices, you may also want to check out the following packages:

17 Likes

It is awesome :heart_eyes:!

1 Like

Wow. This is incredible.

using StaticArrays, NearestNeighbors, LinearAlgebra, BenchmarkTools
import HMatrices, KernelMatrices

# 10_000 random points in 3D, run through a KDTree (its data field holds the reordered points)
const _pts = [@SVector rand(3) for _ in 1:10_000];
const pts_tre = KDTree(_pts).data

# simple exponential kernel
@inline kernel(ptx, pty) = exp(-norm(ptx - pty))

# the same kernel matrix in each package's (lazy) format
const hm_km = HMatrices.KernelMatrix(kernel, pts_tre, pts_tre)
const km_km = KernelMatrices.KernelMatrix(pts_tre, pts_tre, kernel)

# assemble each approximation with a tolerance of 1e-8
function assemble_hm()
  HMatrices.assemble_hmat(hm_km; atol=1e-8)
end

function assemble_km()
  KernelMatrices.HODLR.KernelHODLR(km_km, 1e-8, 0, KernelMatrices.HODLR.LogLevel(8),
                                   nystrom=false, plel=false)
end


@btime assemble_hm() # 1.879 s  (767555 allocations: 1.27 GiB)
@btime assemble_km() # 15.278 s (34838589 allocations: 71.17 GiB)

Congrats on putting something like this together. I’ve also been waiting for somebody to write a better ACA than what I have in KM for a long time now, and this really does not disappoint.

What is your roadmap for features? When I was writing KM I was really laser-focused on positive definite matrices, and I needed a log-determinant, so the approximations really had to be SPD (which, for me, came at the cost of approximation accuracy/compression efficiency). Do you plan to implement anything that guarantees positive-definiteness, maybe like the Schur compensation method of Xia and Gu or something? I know that method is specifically for HSS matrices, but I would guess that some part of the idea could still be relevant.

3 Likes

Thanks for the feedback! Keep in mind that HMatrices uses threads by default, so the comparison above may not be totally fair to your code. You can turn multithreading off in HMatrices by passing the keyword argument threads=false to assemble_hmat.
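
Concretely, reusing the hm_km object from your script, something like:

# same call as in your assemble_hm(), but single-threaded
HMatrices.assemble_hmat(hm_km; atol=1e-8, threads=false)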

Also, did you check the accuracy of the approximations? I tried adding the following to your script above, but the error for KernelMatrix was significantly larger than the prescribed tolerance:

x   = rand(10_000)
ref = hm_km*x # reference: "exact" product with the lazy kernel matrix

@show norm(assemble_hm()*x - ref,Inf)
@show norm(assemble_km()*x - ref,Inf)

Perhaps I have done something wrong?

As for the roadmap, here are a few things I have in mind:

  • Proper support for distributed assembly and the matrix-vector product; things are currently at a very experimental stage.
  • Improve the (shared memory) parallelisation of the hierarchical LU
  • Other compression techniques (i.e. other than ACA)

Unfortunately, SPD matrices do not appear too often in my research, so unless someone is willing to help out, I am unlikely to focus on that for the time being.

The main problem I have with hierarchical matrices is that they have a large memory footprint (I am talking about the method, not any specific package). So I may also focus on implementing an FMM-like method… but that is probably another package :slight_smile:

1 Like

From the @info output I get, it looks like I'm using HMatrices with just one thread, so I think your code really is just that much faster. With regard to error, I see an inf-norm of 1e-5 with KernelMatrices and 4e-8 with HMatrices.jl, so if that's also what you're seeing then that confirms it. Don't hold me to this, but I would guess the explanation is that our ACAs would give back basically the same thing if given the same matrix blocks, but the ℋ-format you use is just better than the HODLR one, which I picked for reasons specific to my applications (like getting the logdet).
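
(For anyone re-running this: I just checked the thread count with plain Julia, nothing specific to either package, roughly like so.)

# number of threads the current Julia session was started with
Threads.nthreads()

# to benchmark the multithreaded path, start Julia with more threads, e.g.
#   julia -t 4
# or set the JULIA_NUM_THREADS environment variable before launching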

I like your plan for the package a lot! I don't blame you for not planning stuff for specific SPD applications. Honestly, I have sort of moved away from H-matrix stuff for SPD matrices (specifically in my field of spatio-temporal statistics/Gaussian processes) because it just seems like the wrong tool when you absolutely need the approximation to be SPD (and I also need it to be differentiable). It's a painful requirement to enforce.

In any case, I will definitely keep an eye on your package, if for no other reason than intrinsic interest! It really is impressive.

Yes, that is what I see, thanks for confirming!