I’ve faced that ldiv
for Cholesky
is unexpectedly slower than two substitutions.
Here is a test problem with a random 1000-by-1000 Matrix{Float64}
.
julia> using BenchmarkTools, LinearAlgebra
julia> n = 1000; A = rand(n, n); A = A' * A; b = rand(n);
Results for ldiv
on factorization object.
julia> Achol = cholesky(A);
julia> @benchmark $Achol \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 315.875 μs … 531.959 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 317.833 μs ┊ GC (median): 0.00%
Time (mean ± σ): 319.034 μs ± 4.421 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄██▇▇▇
▃██████▇▃▁▂▄▆▅▇█▆▄▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
316 μs Histogram: frequency by time 336 μs <
Memory estimate: 8.00 KiB, allocs estimate: 1.
Results for two substitutions (factors of Achol
are Upper
/LowerTriangular
).
julia> norm(Achol.U \ (Achol.L \ b) - Achol \ b, Inf)
1.7053025658242404e-12
julia> @benchmark U \ (L \ $b) setup=(L=$Achol.L; U=$Achol.U)
BenchmarkTools.Trial: 4178 samples with 1 evaluation.
Range (min … max): 189.958 μs … 341.667 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 204.875 μs ┊ GC (median): 0.00%
Time (mean ± σ): 205.505 μs ± 7.942 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▅▇▆██▆▇▇▅▄▆▅▄▄▄▄▆▄▂▆▃▁▅▅▁▃▅▂▃▂▂▄▁▁
▂▁▂▃▂▃▃▄▆▆████████████████████████████████████▆▆▄▄▃▃▂▃▁▂▁▂▁▂▁ ▅
190 μs Histogram: frequency by time 224 μs <
Memory estimate: 16.00 KiB, allocs estimate: 2.
Performance of the above example is close to ldiv
for LU
.
julia> Alu = lu(A);
julia> @benchmark $Alu \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 176.708 μs … 451.208 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 178.750 μs ┊ GC (median): 0.00%
Time (mean ± σ): 188.390 μs ± 32.983 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▅▂ ▃▃▃ ▁
████▇█▇▆▄▃▃▁▁▃▁▁███▇▃▃▅▁▄▃▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▆▆▃▅▆▇ █
177 μs Histogram: log(frequency) by time 400 μs <
Memory estimate: 8.00 KiB, allocs estimate: 1.
The examples above I run on my MacBook. Here is corresponding versioninfo
.
julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M2 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)
But, I’ve faced similar situation with the following setups (can’t provide versioninfo
for them)
- Julia 1.10, Ubuntu 22.04, build for Generic Linux on x86 (so OpenBLAS should be under the hood)
- Julia 1.11 on Windows 10, 64-bit installer
I expect that Achol \ b
should be faster than Alu \ b
. Do I measuring performance wrong or Achol \ b
is buggy?