Ldiv for Cholesky is slower than two substitutions

To elaborate on what @stevengj said: your benchmark pulls the most costly part of the computation, extracting Achol.U and Achol.L, out into the setup, so it measures only a small part of the actual cost of the operation. In a fair benchmark, Achol \ b wins handily. Here’s what I find:

julia> @benchmark $Achol \ $b
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  248.917 μs … 388.625 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     261.416 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   263.493 μs ±   9.988 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▂   ▃██▄▂▃▃▃▂▁
  ▃██▇▆▆████████████▆▇▇▇▇▆▆▆▇▆▇▇▇▇▇▇▇▇▇▆▅▅▅▆▅▅▄▄▄▄▃▃▂▂▂▃▂▂▂▂▂▂▂ ▄
  249 μs           Histogram: frequency by time          289 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 1.

julia> @benchmark U \ (L \ $b) setup=(U = $Achol.U; L = $Achol.L)  # Unfair
BenchmarkTools.Trial: 5858 samples with 1 evaluation per sample.
 Range (min … max):  132.750 μs … 227.459 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     145.292 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   145.427 μs ±   5.723 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▁▁▄▄▄▄▄▄▃▂▄▄▄▄▆▅█▆▅▄▆▄▄▅▅▂▃
  ▁▁▁▁▂▃▄▅▄▇▇███████████████████████████▇▇▅▆▄▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁ ▅
  133 μs           Histogram: frequency by time          162 μs <

 Memory estimate: 16.00 KiB, allocs estimate: 2.

julia> myldiv(chol, x) = chol.U \ (chol.L \ x)
myldiv (generic function with 1 method)

julia> @benchmark myldiv($Achol, $b)  # Fair, but slower
BenchmarkTools.Trial: 5796 samples with 1 evaluation per sample.
 Range (min … max):  659.125 μs …   2.103 ms  ┊ GC (min … max): 0.00% … 57.96%
 Time  (median):     765.250 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   862.006 μs ± 209.153 μs  ┊ GC (mean ± σ):  9.99% ± 14.68%

    ▁▂▂▆██▆▄▄▄▄▃▂▂▂▁▁                   ▁▁▂▂▃▃▃▂▂▂▁▁▁           ▂
  ▆▆█████████████████▇▆▇▅▁▄▄▃▃▁▁▁▁▁▁▁▁▄▆████████████████▇█▆▆▆▅▄ █
  659 μs        Histogram: log(frequency) by time        1.5 ms <

 Memory estimate: 7.65 MiB, allocs estimate: 4.
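
The 7.65 MiB memory estimate for myldiv is consistent with a full dense copy of one factor. A Cholesky object stores only one triangle, so asking for the other one (Achol.L when Achol.uplo == 'U') has to materialize a transposed copy on every call, at least on the Julia versions I have looked at. As a rough sketch of a two-substitution variant that avoids that copy: the test matrix, its size, and the name two_substitutions below are made up for illustration, and the code assumes the factorization is stored as the upper triangle, which you can check with Achol.uplo.

```julia
using LinearAlgebra

# Hypothetical test problem; the size and matrix are assumptions,
# not the data from the original post.
n = 200
X = randn(n, n)
A = Symmetric(X' * X + n * I)   # symmetric positive definite by construction
b = randn(n)
Achol = cholesky(A)

# Two triangular solves that use only the stored factor.
# Assumption: Achol.uplo == 'U', so Achol.U wraps the stored data and
# Achol.U' is a lazy adjoint rather than a freshly allocated matrix.
two_substitutions(chol, x) = chol.U \ (chol.U' \ x)

x1 = Achol \ b                    # built-in solve
x2 = two_substitutions(Achol, b)  # no copy of the factor
x3 = Achol.U \ (Achol.L \ b)      # materializes Achol.L first
```

All three give the same solution up to rounding. Only the last one pays for building the extra factor, which is the cost the setup= benchmark was hiding. If uplo is 'L' instead, the roles of U and L swap.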