Hi everyone!
I’ve been a Julia user for several years, but this is my first time posting here, so apologies if this has been asked somewhere before. I was curious about Julia’s implementation of Cholesky factorization — specifically accessing and using the factors of the Cholesky data structure.
Say I have some (positive definite) matrix A = L L^T, I perform the Cholesky factorization as
C = cholesky(A)
, and I want to use L for some calculations (e.g., a matrix-vector multiplication). I realized recently that, given some vector v
,
C.L*v
is quite a bit slower than
C.U'*v
because C.L
actually forms a copy of the underlying factors
matrix every time it’s called (and because storage of the factor in the upper part is the default behavior). The relevant code in cholesky.jl
is:
if d === :U
return UpperTriangular(Cuplo === char_uplo(d) ? Cfactors : copy(Cfactors'))
elseif d === :L
return LowerTriangular(Cuplo === char_uplo(d) ? Cfactors : copy(Cfactors'))
elseif d === :UL
return (Cuplo === 'U' ? UpperTriangular(Cfactors) : LowerTriangular(Cfactors))
else
I could imagine that many a user might naively call C.L
without realizing the significant performance hit, so I’m curious: why not return something like LowerTriangular(Cfactors')
instead of LowerTriangular(copy(Cfactors'))
(and similarly for C.U
if the factor is stored in the lower part)? Of course, there’s a difference in meaning by returning an adjoint, and maybe something would break (I can’t think of anything at the moment) for an AbstractMatrix
data structure that’s not just a simple array of floats. But for most use cases (e.g., multiplication, solves, changing coefficients, accessing coefficients, etc.), I would imagine this is advantageous. Are there some deep reasons why this choice of copying was made?
Thanks !