As @skleinbo suggested, you shouldn’t use _chol!. I played around with this a little bit. On my machine, cholesky (not cholesky!) from StaticArraysjl is about 20% faster than the naive code (revised) for n\leq 24. For n>24, StaticArrays.jl appears to call LAPACK with some apparent overhead that makes it the slowest of the three. LAPACK starts to win for n>110 or so. So what is best is fairly size dependent.
If it makes sense to use it (feel free), I did make a few refinements to my simple code so it’s a little more robust and a little faster:
function basic_cholesky!(H::Hermitian{T,<:AbstractMatrix{T}}) where {T}
A = H.data
H.uplo == 'U' ||
error("Hermitian matrix should be in upper triangular storage.")
@inbounds for i in axes(A, 1)
x = zero(T)
@simd for k = first(axes(A, 1)):(i - 1)
x += conj(A[k, i]) * A[k, i]
end
s = H[i, i] - x # Use H[i,i] since it will be real.
real(s) > 0 || throw(PosDefException(i))
A[i, i] = sqrt(s)
for j = (i + 1):last(axes(A, 2))
x = zero(T)
@simd for k = first(axes(A, 1)):(i - 1)
x += conj(A[k, i]) * A[k, j]
end
A[i, j] = (A[i, j] - x) / A[i, i]
end
end
UpperTriangular(A)
end
Still pretty basic and not solidly tested, but perhaps a little better. (No warranty.) It would be nice to index into H directly in most places, so it wouldn’t matter whether upper or lower triangular storage was used, but that was a bit slower.