Why does the time consumed by LAPACK.potrf! increase hugely when the dimension exceeds a certain number?

As @skleinbo suggested, you shouldn’t use _chol!. I played around with this a little bit. On my machine, cholesky (not cholesky!) from StaticArrays.jl is about 20% faster than the naive code (revised) for n ≤ 24. For n > 24, StaticArrays.jl appears to call LAPACK with some apparent overhead that makes it the slowest of the three. LAPACK starts to win for n > 110 or so. So which approach is best is fairly size-dependent.
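
In case you want to reproduce this kind of comparison, here is a rough sketch of how I would time the three approaches. The helper `compare`, the test matrix, and the size used are my own choices (not from the original timings), and `basic_cholesky!` refers to the revised code further down; exact crossover points will vary by machine.

```julia
using BenchmarkTools, LinearAlgebra, StaticArrays

# Rough timing sketch for one size n; the copy cost is included in the
# mutating benchmarks so each sample starts from a fresh matrix.
function compare(n)
    A = randn(n, n)
    A = A' * A + n * I                 # Hermitian positive definite test matrix
    S = SMatrix{n,n}(A)
    @btime cholesky($S)                          # StaticArrays.jl
    @btime cholesky!(Hermitian(copy($A)))        # LinearAlgebra -> LAPACK.potrf!
    @btime basic_cholesky!(Hermitian(copy($A)))  # the revised code below
    nothing
end

compare(24)
```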

If it makes sense to use it (feel free), I did make a few refinements to my simple code so it’s a little more robust and a little faster:

```julia
using LinearAlgebra  # Hermitian, UpperTriangular, PosDefException

function basic_cholesky!(H::Hermitian{T,<:AbstractMatrix{T}}) where {T}
  A = H.data
  H.uplo == 'U' ||
    error("Hermitian matrix should be in upper triangular storage.")
  @inbounds for i in axes(A, 1)
    # Diagonal entry: A[i,i] = sqrt(H[i,i] - sum_{k<i} |A[k,i]|^2)
    x = zero(T)
    @simd for k = first(axes(A, 1)):(i - 1)
      x += conj(A[k, i]) * A[k, i]
    end
    s = H[i, i] - x # Use H[i,i] since it will be real.
    real(s) > 0 || throw(PosDefException(i))
    A[i, i] = sqrt(s)
    # Remaining entries of row i of the upper factor.
    for j = (i + 1):last(axes(A, 2))
      x = zero(T)
      @simd for k = first(axes(A, 1)):(i - 1)
        x += conj(A[k, i]) * A[k, j]
      end
      A[i, j] = (A[i, j] - x) / A[i, i]
    end
  end
  UpperTriangular(A)
end
```

Still pretty basic and not solidly tested, but perhaps a little better. (No warranty.) It would be nice to index into H directly in most places, so it wouldn’t matter whether upper or lower triangular storage was used, but that was a bit slower.
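
For what it’s worth, here is a quick way to sanity-check the factor it returns. This is just a sketch, not a test suite; the matrix, element type, and size are arbitrary choices of mine:

```julia
using LinearAlgebra

A = randn(ComplexF64, 50, 50)
A = A' * A + 50I                         # Hermitian positive definite
U = basic_cholesky!(Hermitian(copy(A)))  # in-place; returns the upper factor
norm(U' * U - A) / norm(A)               # relative residual, should be ~eps(Float64)
```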
