KernelMatrices.jl - A software package for working with hierarchical matrices

Hello everyone,

I would like to announce a new software package for working with and compressing kernel matrices. The package KernelMatrices.jl provides a nicely overloaded KernelMatrix struct, an implementation of the Adaptive Cross Approximation and Nystrom approximation, and a complete interface for compressing kernel matrices in the HODLR format, including factorization, solve, logdeterminant, and some other tools specific to Gaussian Processes. For maximum likelihood estimation, it provides a method to approximate MLEs with quasilinear scaling that will work with very little extra user code.

I announce this mainly to see if other people would be interested in using this software, as that will to some degree dictate how much work I put into it, both in improving the guts (like cleaning up some of the overloading that got mangled when I didn’t really update the code to 1.0 in a completely conscientious way, or thinking harder about how to use the type system, or working harder to make the code more parallelizable) and in implementing other factorizations and matrix compression structures (like HSS, H^2, skeletonization, peeling assembly methods, and so on).

I would also love to have contributions and work on this code with other people, and so if anybody out there is interested in contributing or extending the package please feel free to reach out to me at or submit a pull request or file an issue or anything at all.

I also would be very interested to hear feedback and comments about the quality of the package’s source code in general. I am always looking to improve and write better code.

Thank you for reading,
Chris Geoga (


That looks very nice! I’m not very familiar with the subject, sorry if that’s a stupid question: the package looks geared towards statistical applications, does it also work for PDEs (if I want to get quick matvecs with a green’s function) or are the techniques substantially different?

1 Like

Hi Antoine—thank you very much! I am not very familiar with methods for numerical PDEs, so I am hesitant to confidently say that this would also work for applications in that field. But I know that many innovations in hierarchical matrices came out of that field, and I can confidently say that you could very easily compress your matrix to a prescribed accuracy and then perform a matvec with this package.

Looks interesting! Note there is also HierarchicalMatrices.jl which is more focussed on (integral reformulations of) PDEs, though it is not under active development (for the moment).

1 Like

Hello Sheehan—thank you for the reference! I have seen this package, and I think it is incredibly elegant. If one just wanted a fast matvec, that package does look like a very reasonable choice. To my knowledge it implements a weakly admissible structure which is quite similar to the HODLR structure that I have implemented.

I think the primary use case for our software instead of this one would be when one wants direct solves or determinants (via a symmetric factorization). I don’t know how many people outside of statistics care about log-determinants, but I imagine (hope!) that there are plenty of other people who would be interested in direct solves instead of iterative ones.

I also would love to implement some algorithms that assemble matrices using only matvecs (like the peeling algorithm of Lin et al. 2011), which would provide more novel functionality. I’m hoping to gauge interest in how many people use julia for applications where they have fast black-box matvecs where matrix entries can’t easily be computed. The package LowRankApprox.jl seems like it would provide great guts for the functionality, so a lot of the pieces are around already…

You are right, @MikaelSlevinsky never got around to implementing solves it looks like. He did in the predecessor:

Oh wow. That is very nice code. Aside from a decent amount of functionality specific to maximum likelihood estimation in statistics, I suppose our package is just a different flavor than HierarchicalMatrices.jl then (rather than providing meaningfully different functionality). There are some small other goodies in it, like an implementation of the ACA, but nothing that requires serious R&D. I hope to also think about H^2/HSS/etc matrices, peeling, and some other matrix compression methods, which would be somewhat serious R&D I think. But none of those are presently implemented here.

The primary reason that I wrote new software rather than contributing to existing software is because, for the statistical applications we’re working on, we want the first and second derivatives of the compressed matrices. So I needed a low-rank approximation that is continuous to kernel parameters, and I needed to implement the first and second derivatives. After some looking around, and since weakly admissible matrices and factorizations are the relatively easy part to implement, I decided it would be easier to just write our own. But for numerical PDEs hierarchicalmatrices.jl may well be a better choice, as I am not aware of applications where people need to get the derivatives of the compressed kernel matrix (or for the compressed matrix to be differentiable in the first place).


Count a +1 from me :slight_smile: A typical use case would be to approximate the inverse of a differential operator (eg -Delta on some n-dimensional domain). Inverses can be very expensive to compute, but matvecs can be done.


The README examples don’t seem to work.

julia> HK_a = HODLR.KernelHODLR(K, tol, lvl, rnk, nystrom=false, plel=pll)
ERROR: MethodError: no method matching KernelMatrices.HODLR.KernelHODLR(::KernelMatrices.KernelMatrix{Float64,SArray{Tuple{2},Float64,1,2}}, ::Float64, ::KernelMatrices.HODLR.LogLevel, ::Int64; nystrom=false, plel=false)
Closest candidates are:
  KernelMatrices.HODLR.KernelHODLR(::KernelMatrices.KernelMatrix{T<:Number,A} where A, ::Float64, ::Int64, ::KernelMatrices.HODLR.HierLevel; nystrom, plel) where T<:Number at /home/chris/.julia/packages/KernelMatrices/GuD8m/src/HODLR/src/constructor.jl:9
 [1] top-level scope at REPL[39]:1

reordering the arguments, does:

julia> HK_a = HODLR.KernelHODLR(K, tol, rnk, lvl, nystrom=false, plel=pll);

same problem with the next function:

julia> HK_n = HODLR.KernelHODLR(K, tol, lvl, rnk, nystrom=true, plel=pll);
ERROR: MethodError: no method matching KernelMatrices.HODLR.KernelHODLR(::KernelMatrices.KernelMatrix{Float64,SArray{Tuple{2},Float64,1,2}}, ::Float64, ::KernelMatrices.HODLR.LogLevel, ::Int64; nystrom=true, plel=false)
Closest candidates are:
  KernelMatrices.HODLR.KernelHODLR(::KernelMatrices.KernelMatrix{T<:Number,A} where A, ::Float64, ::Int64, ::KernelMatrices.HODLR.HierLevel; nystrom, plel) where T<:Number at /home/chris/.julia/packages/KernelMatrices/GuD8m/src/HODLR/src/constructor.jl:9
 [1] top-level scope at util.jl:156

But this time, reordering arguments gives me a different error:

julia> HK_n = HODLR.KernelHODLR(K, tol, rnk, lvl, nystrom=true, plel=pll);
ERROR: BoundsError: attempt to access 0-element Array{SArray{Tuple{2},Float64,1,2},1} at index [1]
 [1] getindex(::Array{SArray{Tuple{2},Float64,1,2},1}, ::Int64) at ./array.jl:728
 [2] nystrom_uvt(::KernelMatrices.KernelMatrix{Float64,SArray{Tuple{2},Float64,1,2}}, ::KernelMatrices.NystromKernel{Float64}) at /home/chris/.julia/packages/KernelMatrices/GuD8m/src/factorizations.jl:96
 [3] (::getfield(KernelMatrices.HODLR, Symbol("##18#24")){KernelMatrices.KernelMatrix{Float64,SArray{Tuple{2},Float64,1,2}}})(::SArray{Tuple{4},Int64,1,4}) at /home/chris/.julia/packages/KernelMatrices/GuD8m/src/HODLR/src/constructor.jl:31
 [4] _collect(::Array{SArray{Tuple{4},Int64,1,4},1}, ::Base.Generator{Array{SArray{Tuple{4},Int64,1,4},1},getfield(KernelMatrices.HODLR, Symbol("##18#24")){KernelMatrices.KernelMatrix{Float64,SArray{Tuple{2},Float64,1,2}}}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./generator.jl:47
 [5] collect_similar at ./array.jl:548 [inlined]
 [6] map at ./abstractarray.jl:2028 [inlined]
 [7] mapf at /home/chris/.julia/packages/KernelMatrices/GuD8m/src/HODLR/src/utils.jl:14 [inlined]
 [8] #KernelHODLR#15(::Bool, ::Bool, ::Type{KernelMatrices.HODLR.KernelHODLR}, ::KernelMatrices.KernelMatrix{Float64,SArray{Tuple{2},Float64,1,2}}, ::Float64, ::Int64, ::KernelMatrices.HODLR.LogLevel) at /home/chris/.julia/packages/KernelMatrices/GuD8m/src/HODLR/src/constructor.jl:31
 [9] (::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:nystrom, :plel),Tuple{Bool,Bool}}, ::Type{KernelMatrices.HODLR.KernelHODLR}, ::KernelMatrices.KernelMatrix{Float64,SArray{Tuple{2},Float64,1,2}}, ::Float64, ::Int64, ::KernelMatrices.HODLR.LogLevel) at ./none:0
 [10] top-level scope at util.jl:156

but this does.

My dissertation is due in a couple months, and I’m considering discussing this as a possible alternative to something I’ve done.

From glancing at the examples, it looks like there are a few things suboptimal for performance, eg

julia> @benchmark SVector{2, Float64}(randn(2))
  memory estimate:  96 bytes
  allocs estimate:  1
  minimum time:     91.302 ns (0.00% GC)
  median time:      96.838 ns (0.00% GC)
  mean time:        117.473 ns (12.09% GC)
  maximum time:     4.313 μs (96.38% GC)
  samples:          10000
  evals/sample:     956

julia> @benchmark @SVector randn(2)
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     25.912 ns (0.00% GC)
  median time:      27.870 ns (0.00% GC)
  mean time:        28.929 ns (0.00% GC)
  maximum time:     68.825 ns (0.00% GC)
  samples:          10000
  evals/sample:     996

I haven’t looked at the actual library code itself, but I would guess there may be a lot of needless allocations.

Hello Chris Elrod—

Thank you so much for the note, and please accept my apologies for the broken README code. I stupidly free-handed those example code snippets and was not careful to check that they ran. I will fix that immediately.

With regard to the needless allocations, I never even thought to check that the macro would be faster than the explicit type converstion. I am not super savvy with the best micro-optimizations in Julia (or any language), but I’m sure you’re right that the code could be tightened up in many places. I will take a look at that and think harder about benchmarking/profiling/optimizing the code. If you have other suggestions or thoughts or first-hand experiences to share about tightening up code of this kind, I’m very interested to hear them.

(v1.1) pkg> add

without libraries included gives an error - saying that UUID of SharedArrays is wrong. Also, could it be added to the general registry, so that it can be installed with a simple Pkg.add?

Hi Antoine—

Apologies for the trouble. I can reproduce the error, although everything worked fine for me very recently. I will look into fixing this immediately.

I also did consider adding it to the registry, but it looks like there is some tumult right now in how that works, so I was planning to wait until the dust has settled. But I suppose I need to learn about the Manifest.toml files and stuff now to fix this, so I’ll look into adding it now.

Apologies again for the issues.

Hi Antoine—

Okay, I think that the pkg> add from the bitbucket URL should work now. It’s looking to me like Registrator.jl only works for github. I’ll keep looking into it, but I’m not interested in moving the code to github, so adding it to the official registry may be hard.

Another suggestion I would make is to make supplied derivative functions a tuple of heterogeneous function types, instead of a Vector{Function}.
I imagine there are many of these calls, and their cost is going to be cheap relative to the cost of dynamic dispatches – so may as well try and cut out the dynamic dispatches.

BTW, I am citing your paper and referencing your library in my dissertation!

1 Like

I am quite interested in developments in this area. I have written a wrapper in Julia to the H2Lib hierarchical-matrix library, for acoustic scattering with a coupled finite element-boundary element method. It left something to be desired, but the promise is there.


@cgeoga Thanks, and no worries, even then it was very easy to use! Just one comment: passing of parameters like it’s implemented now feels like a matlabism. In Julia it’d be more idiomatic to use closures: KM = KernelMatrices.KernelMatrix(x,y,(x,y)->K(x,y,p)) (which are as performant, provided you are a bit careful). Also nystrom=true was much faster than false for my problem (I don’t know anything about Hmats so maybe this is trivial)

@PetrKryslUCSD cool, would you mind publishing it?

1 Like

Hi all!

@Elrod: thank you very much! Is your work available somewhere where I could see it?? With regard to the tuple suggestion, that makes a lot of sense. I actually saw you suggesting this somewhere else on the forum recently and was already thinking about it. Maybe I’m being slow here, but could I do something like that when the number of functions isn’t known at compile time? Like, I know I could do

struct MyFuns{A<:Function, B<:Function, C<:Function}
funs :: Tuple{A, B, C}

but I would guess that something like

struct MyFuns{N, A<:Function}
funs :: NTuple{N, A}

defeats the purpose, right? And I also would guess isn’t even true for closures since they’re each their own unique types. Is there some obvious way to do that that I’m missing?

@PetrKryslUCSD: I second @antoine-levitt, I would love to see that wrapper. The website for that library suggests that it can benefit from parallelizing over a very large scale, which I’m interested to experiment with. I’ve found it challenging to make this code really benefit from many more threads than my standard workstation has. Out of curiosity, when you say that it left something to be desired, do you refer to the library or to the method of hierarchical approximations?

@antoine-levitt: Nice! I’m happy to hear that it was easy to use. With regard to the parameter passing—that makes sense. I was a bit torn about that during early development. For the specific application I was working on where I’d be doing optimization over the parameters I did some lazy thing in a REPL where I convinced myself that having people pass around an extra empty Float64[] for the parameters was better than passing in a closure. But that was a while ago and I probably did everything wrong in the global scope and stuff, so I’ll look into updating this!

For the second point, unlike using the ACA, nystrom=true gives a fixed-rank approximation for off-diagonal blocks, and it builds the low-rank approximations for those blocks in a non-adaptive way. The motivation for that option is that building the matrix with that method for off-diagonal blocks (i) guarantees that the resulting approximation with be positive definite (so long as the original matrix was) and (ii) is differentiable with respect to kernel parameters. I should emphasize, though, that nystrom=true makes absolutely zero promises about any kind of precision being achieved. I sort of think about it like adding stuff to a block diagonal matrix in a way that will still for sure be positive definite and keeps at least slightly more low frequency information about the kernel. It turns out that the extra stuff may be all we need for our statistical applications…but I would guess that most people would find the sacrifice on all types of precision unacceptable! Most people’s eyes widen when I tell them I don’t control the precision at all.

@cgeoga, @antoine-levitt: I can certainly do that. I was holding off since one of the co-authors of the H2Lib is apparently enhancing and redesigning it. I had to modify the library in some minor ways ( additional accessors, the build system). So at the moment I have a wrapper of a slightly modified H2Lib, not the one you will find on the authors’ website.

Re accuracy, it seems like it works fine for the matvecs in my case. I’m only interested in hmats as preconditioners for an iterative method, so accuracy is very much not a concern - anything that gets the qualitative behavior of the long-range interaction right should be OK.

Hello @PetrKryslUCSD,

I am working on hierarchical matrices for N-body problems. I would like to play around with adaptive cross approximation. Is it possible to provide the link to your julia wrapper for H2lib?