Can somebody here help me understand why in this code solving with the L in A = LL^T is an order of magnitude slower than just solving using A? Here is a MWE:
using BenchmarkTools if !(@isdefined x) # because I lazily re-run in a REPL. const A = randn(5_000, 5_000) const Mf = cholesky!(Symmetric(A'A)) end # With: # BLAS.get_num_threads() == 4 # BLAS.vendor() == :mkl # versioninfo() == (v1.6.2, official binary, etc) print("Full solve:") # 5.050 ms, 0 alloc. @btime ldiv!($Mf, x) (setup=x=randn(5_000)) print("Cholesky factor solve:") # 59.8 ms, 3 alloc, 190 MiB. @btime ldiv!($Mf.L, x) (setup=x=randn(5_000))
At larger matrix sizes, the difference is even more pronounced. For a 22k x 22k matrix, the differences was 0.1 seconds vs 1.5 seconds. Looking at the
which(LinearAlgebra.ldiv!, ...) outputs, it looks like with the triangular
Mf.L Julia is calling
trtrs!, whereas for
Mf it is calling
potrs!. I am sure that
potrs! is doing something smarter than just two sequential triangular solves, but it seems a little surprising that this
ldiv! makes big allocations and is that much slower. Intuitively, shouldn’t the solve with a triangular factor be faster, if anything?
If it matters, I have an intel CPU that does use AVX-512, and like the snippet above says, I use MKL with 4 BLAS threads. From looking around, I see other posts on the forum that say that
trtrs! is slower by a factor of two or three from other triangular solvers, but I haven’t seen this particular issue. Apologies if this is a rehash of a known issue or discussion.
Okay, there is definitely something confusing happening here. I deleted this topic because at first I think I had just made a mistake with my BenchmarkTools interpolation. When I did
@btime ldiv!($(Mf.L), x) (setup=x=randn(5000))
then the times looked reasonable. But the issue persists when I just do
const _x = randn(5000) @time Mf.L\x # 0.05 seconds @time Mf\x # 0.005 seconds
So it’s still an order of magnitude difference in this less careful benchmark. I honestly am not sure anymore if this is just a weird benchmarking issue or if
potrs! is actually causing a problem.
So many edits. Sorry everyone.