LinearSolve.jl for many values of b?

I’m writing a package and I really like the idea of LinearSolve.jl, because I want my code to perform well and be flexible: work on CPU or GPU, be differentiable, not allocate, etc. But one of my most important use cases involves solving A\b for many b vectors. Without LinearSolve, I would just pre-compute lu(A), and then farm out ldiv!s to many threads. I’d like to do the same with LinearSolve, but it looks like the interface forbids it — set_b obviously isn’t thread safe in that sense.

To be a little more concrete for my application, A will typically have a size in the range 70-200 (squared), and there will be ~50,000 solves each time. I use machines with 100+ cores per node, so I can get silly with the threads. But I also have to repeat the whole thing (serially) many thousands of times with different As.

Is this just a case where LinearSolve was designed with very different goals in mind, making the built-in way makes more sense for my application? Maybe I should just use RecursiveFactorization.lu! for some Julianity? Maybe even TriangularSolve.jl?


Here’s a MWE reflecting what I had hoped to be able to do:

n=4
N=10
A = rand(n, n)
b = rand(n, N)
prob = LinearProblem(A, b)
linsolve = init(prob)
sol = solve(linsolve)  # Threading somehow???

The last line fails with a BoundsError as ldiv! tries to set n*N elements of a vector with n elements. On the other hand A\b does what I want.