Ldiv for Cholesky is slower than two substitutions

Turns out it’s faster to avoid the extra preparation anyway. You can use an adjoint wrapper instead of materializing L, and the resulting computations are even faster:

julia> function myldiv(chol, x)
           if chol.uplo == 'U'
               return ldiv!(chol.U, chol.U' \ x)
           elseif chol.uplo == 'L'
               return ldiv!(chol.L', chol.L \ x)
           else
               error("unreachable")
           end
       end
myldiv (generic function with 1 method)

julia> @benchmark myldiv($Achol, $b)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  120.500 μs … 276.959 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     129.959 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   130.370 μs ±   8.761 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▁                 █
  ██▆▄▅▅▅▄▄▄▃▃▃▃▃▃▃▄▄██▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂ ▃
  120 μs           Histogram: frequency by time          166 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

Compare this to my timings above for the U \ (L \ $b) version on the same computer:

 Range (min … max):  132.750 μs … 227.459 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     145.292 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   145.427 μs ±   5.723 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▁▁▄▄▄▄▄▄▃▂▄▄▄▄▆▅█▆▅▄▆▄▄▅▅▂▃
  ▁▁▁▁▂▃▄▅▄▇▇███████████████████████████▇▇▅▆▄▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁ ▅
  133 μs           Histogram: frequency by time          162 μs <

 Memory estimate: 16.00 KiB, allocs estimate: 2.

(Note that I used ldiv! to avoid an extra allocation in the new version, but that doesn’t significantly affect the timings, so the comparison is still valid. The O(N) allocation cost is dwarfed by the O(N^2) computation cost at this size.)

1 Like