potrf! is probably exiting early because the mutated matrix is no longer positive definite, and it is returning an error code.
Meanwhile, cholesky does not work in place: it performs a symmetry check and then factorizes a fresh copy successfully.
I’ve used the @time macro. potrf! is only as fast as cholesky, but I expected potrf! to be faster. In your opinion, is the cholesky function as fast as possible, or are there any opportunities to speed it up?
Maybe - since you’re using the non-mutating version, there’s some overhead from allocating the new array. See this example with larger sizes, to see how it scales:
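A sketch of the kind of comparison I mean (the `randposdef` helper and the sizes are just placeholders I picked for illustration):

```julia
using LinearAlgebra, BenchmarkTools

# Simple symmetric positive definite test matrix.
randposdef(n) = (X = randn(n, n); X' * X + n * I)

for n in (500, 2000, 4000)
    A = randposdef(n)
    println("n = $n")
    # Non-mutating: copies A internally before factorizing.
    display(@benchmark cholesky($A))
    # Mutating: overwrites its input, so hand it a fresh copy for every evaluation.
    display(@benchmark LinearAlgebra.LAPACK.potrf!('U', B) setup=(B = copy($A)) evals=1)
end
```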
I’ve used evals=1 here to prevent the same (already mutated) array from being factorized more than once per setup.
There also seem to be some redundant checks in that specific code path, so there might be some small gains there as well (checking for squareness twice, for example…).
Although the performance is close once both are set to use one thread per physical core: OpenBLAS defaulted to using only 8 threads on my 18-core machine, so most of MKL’s advantage seemed to come from using more cores by default.
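For reference, this is roughly how to check and pin the BLAS thread count (18 is just my physical core count, and the MKL comparison assumes MKL.jl is installed):

```julia
using LinearAlgebra

BLAS.get_num_threads()    # what the loaded BLAS defaults to
BLAS.set_num_threads(18)  # pin to one thread per physical core

# To benchmark against MKL instead of OpenBLAS, load MKL.jl first;
# it swaps the BLAS backend for the rest of the session:
# using MKL
```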
BTW, note that you need to set evals=1; otherwise your mutating potrf! will be called multiple times per setup, which could skew your results. See How to benchmark append!? - #7 by rdeits and the other conversations linked from there.
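To make that concrete, here is the pattern in question (the size is arbitrary); the only difference is the evals=1 at the end:

```julia
using LinearAlgebra, BenchmarkTools

A0 = let X = randn(200, 200); X' * X + 200 * I end

# Potentially skewed: if BenchmarkTools runs more than one evaluation per
# setup, the copy made in setup is overwritten by the first evaluation, so
# later calls factorize the wrong matrix and may fail with a nonzero info.
@benchmark LinearAlgebra.LAPACK.potrf!('U', A) setup=(A = copy($A0))

# Sound: exactly one evaluation per fresh copy.
@benchmark LinearAlgebra.LAPACK.potrf!('U', A) setup=(A = copy($A0)) evals=1
```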
You do realize that Julia’s cholesky function calls LAPACK potrf! (for Matrix{Float64} types), right? They are not separate implementations.
Once you fix the bugs in your benchmark code, you’re mainly just measuring the overhead of allocating a new array versus working in place (the same as cholesky vs. cholesky!).
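For what it’s worth, a quick way to check that equivalence yourself (a minimal sketch; the size is arbitrary):

```julia
using LinearAlgebra

A0 = let X = randn(300, 300); X' * X + 300 * I end

F = cholesky(A0)                                      # high-level, non-mutating
U, info = LinearAlgebra.LAPACK.potrf!('U', copy(A0))  # the LAPACK call it uses underneath

# potrf! leaves the original data below the diagonal, so compare only the triangles:
F.U ≈ UpperTriangular(U)   # true (and info == 0)
```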