As @stevengj suggests, your matrix may still have a lot of structure despite being dense. I have a package for H-matrices that implements a particularly simple hierarchical format called HODLR (“hierarchically off-diagonal low-rank”). This of course only helps if there is structure to be exploited, and if you’re interested I (or somebody else) can expand on common sources of that structure.
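As a tiny stdlib-only illustration of the kind of structure HODLR exploits (nothing here is from the package, and the kernel is just the exponential one used below): off-diagonal blocks of many kernel matrices are numerically low rank.

```
using LinearAlgebra

# Kernel matrix block between two separated index clusters.
kernel(j, k) = exp(-abs(j - k)/100)
A = [kernel(j, k) for j in 1:1000, k in 1001:2000]

# For this particular kernel the block even factors exactly as an outer
# product, exp(-(k - j)/100) = exp(j/100) * exp(-k/100), so it is rank 1
# up to roundoff. Smoother kernels give small (if not rank-1) numerical rank.
s = svdvals(A)
numrank = count(>(1e-10 * s[1]), s)
```

So a 1000 × 1000 block that a dense solver would treat as a million numbers is representable by a handful of vectors; that is exactly the compression the off-diagonal blocks get in the HODLR format.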

On my machine (an Intel i5-11600K with a completely broken BLAS-3 until OpenBLAS 0.3.14 comes to Fedora), in the case of a matrix that very much does have that structure, I can assemble a 60k × 60k matrix in about 130 ms. And for matrices that would actually fit in RAM, you could check that the matvec accuracy is quite good: probably not quite `tol` (see below), but probably better than `10*tol`. Here’s an example:

```
using KernelMatrices, KernelMatrices.HODLR, BenchmarkTools
# this could also have the syntax kernel_entry(j, k, parameters), or
# kernel_function(ptj, ptk, parameters), where you'd construct K with a vector
# of points instead of the integer indices.
function kernel_entry(j, k)
  exp(-abs(j - k)/100)
end
K = KernelMatrix(1:60_000, 1:60_000, kernel_entry)
tol = 1e-14 # ACA factorization precision for the off-diagonal blocks
maxrank = 0 # maximum off-diagonal rank (0 means no fixed cap; tol decides)
level = HODLR.LogLevel(8) # log2(n) - 8 levels of dyadic 2x2 splitting
# 130 ms with a single thread on my computer. Not too bad.
# if you want a direct solve, you can also factorize this matrix,
# and assuming that the off-diagonal block rank is bounded then the
# factorization and solve will be O(n log^2 n).
@btime HK = KernelHODLR($K, $tol, $maxrank, $level, nystrom=false)
```
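To make the accuracy claim concrete without the package, here is a stdlib-only sketch of a single 2×2 splitting on a smaller matrix with the same kernel: the diagonal blocks stay dense, the off-diagonal blocks get compressed with a truncated SVD (standing in for the ACA the package actually uses), and the compressed matvec is compared against the dense one.

```
using LinearAlgebra

n = 2_000
h = n ÷ 2
K = [exp(-abs(j - k)/100) for j in 1:n, k in 1:n]

# Truncated SVD of a block: keep singular values above a relative tol.
function lowrank(A, tol)
  U, s, V = svd(A)
  r = something(findlast(>(tol * s[1]), s), 1)
  (U[:, 1:r] * Diagonal(s[1:r]), V[:, 1:r])
end

tol = 1e-14
(U12, V12) = lowrank(K[1:h, h+1:n], tol)     # top-right block
(U21, V21) = lowrank(K[h+1:n, 1:h], tol)     # bottom-left block

# Matvec with dense diagonal blocks and compressed off-diagonal blocks.
x = randn(n)
y_dense = K * x
y_hodlr = vcat(K[1:h, 1:h] * x[1:h]     .+ U12 * (V12' * x[h+1:n]),
               K[h+1:n, h+1:n] * x[h+1:n] .+ U21 * (V21' * x[1:h]))
relerr = norm(y_hodlr - y_dense) / norm(y_dense)
```

For this one-level toy the relative error is at the roundoff level, since the off-diagonal blocks of this kernel compress essentially exactly; the package recurses the same idea on the diagonal blocks, which is where the `LogLevel` parameter comes in.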

The package really depends on being able to conveniently get individual elements of your matrix, though. So if you have something that looks like `X*X'`, then unless `X` is pretty skinny this might not be super helpful. But maybe something to look into.