For some model identification problems, I have to compute the Gramian of data matrices, i.e. compute A = X'*X where X is a complex rectangular matrix.
Is there a fast implementation of this specific operation in Julia, given that the output matrix is Hermitian?
A first rough estimate suggests this could roughly halve both the computation time and the memory requirements. It could also reduce numerical error propagation, which otherwise leads to an output that is not Hermitian numerically.
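For reference, one way to exploit the Hermitian structure is the BLAS rank-k update routine `herk`, which computes only one triangle of X'*X. This is a hedged sketch using the stock `LinearAlgebra` API; the matrix sizes are made up for illustration:

```julia
using LinearAlgebra

X = randn(ComplexF64, 1000, 50)

# herk computes alpha*X'*X ('C' = conjugate-transpose on the left),
# filling only the requested triangle ('U' = upper):
A = Hermitian(BLAS.herk('U', 'C', 1.0, X), :U)

A ≈ X' * X  # same result, but herk does roughly half the flops
```

Wrapping the result in `Hermitian` tells downstream code (e.g. `cholesky`, `eigen`) that it can rely on the structure, and it also guarantees an exactly Hermitian matrix.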
Just a remark: it is often advantageous from a numerical viewpoint to use algorithms that do not require forming the Gramian at all and instead work with its square-root (Cholesky) factor.
Typically you should look at the minimum time rather than the median, since timing noise is essentially always positive. (I usually use @btime from BenchmarkTools.jl, which reports only the minimum.) When you do that, the conclusion is the opposite on my machine:
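A hedged sketch of such a comparison (the array sizes here are invented for illustration):

```julia
using LinearAlgebra, BenchmarkTools

X = randn(ComplexF64, 2000, 200)

@btime $X' * $X                      # general matrix multiply (gemm path)
@btime BLAS.herk('U', 'C', 1.0, $X)  # rank-k update, ~half the flops
```

The `$` interpolation avoids benchmarking the cost of the global-variable lookup along with the operation itself.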
It looks like the culprit is threading: it seems that dgemm is better parallelized than dsyrk in OpenBLAS, probably because the computation in dgemm is more regular. If I switch to single-threaded BLAS, then dsyrk is faster in both cases:
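Restricting BLAS to a single thread for such a test can be done through the stock `LinearAlgebra` API (a hedged sketch; rerun the benchmarks after this):

```julia
using LinearAlgebra

BLAS.set_num_threads(1)   # force single-threaded BLAS for the comparison
@show BLAS.get_num_threads()
```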
To step back a little bit: do you actually need to form that matrix? If you only need a few entries, or to apply it to vectors, something like this sketch in Julia might be helpful:
using LinearAlgebra

struct GramMatrix{T}
    X::Matrix{T} # Or perhaps a QR factorization of X?
end

# Entry (j,k) of X'X is the inner product of columns j and k of X:
Base.getindex(G::GramMatrix, j::Int, k::Int) = dot(view(G.X, :, j), view(G.X, :, k))
# Two cheap matvecs instead of forming the matrix:
Base.:*(G::GramMatrix, x::AbstractVector) = G.X' * (G.X * x)
# Solve (X'X)y = b via X = QR, so X'X = R'R (or use IterativeSolvers.jl with the fast matvec above):
Base.:\(G::GramMatrix, b::AbstractVector) = (R = qr(G.X).R; R \ (R' \ b))
# ....
Of course, if you need the whole matrix then this won't be enough, but particularly for very tall-and-skinny X you could potentially save a lot with just a few lines of code like this, provided you only need something simple.
Actually, I am computing some reduced normal equations from covariance and cross-covariance matrices.
So I have CovX = X'*X, CrcovXY = X'*Y and CovY = Y'*Y, which are then assembled into the Schur complement Z = CovY - CrcovXY'*(CovX \ CrcovXY).
Of course, this gets me into a lot of numerical trouble, and there seems to be significant error propagation.
For instance, although Z should theoretically be Hermitian, numerically it is not.
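One way to keep Z Hermitian by construction is to apply the inverse of CovX through its Cholesky factor rather than forming CovX^-1 explicitly. A hedged sketch, reusing the variable names from the post (the sizes are made up):

```julia
using LinearAlgebra

X = randn(ComplexF64, 500, 20)
Y = randn(ComplexF64, 500, 10)
CovX, CrcovXY, CovY = X'X, X'Y, Y'Y

F = cholesky(Hermitian(CovX))   # CovX = L*L'
W = F.L \ CrcovXY               # W = inv(L) * X'Y
Z = Hermitian(CovY - W'W)       # W'W = CrcovXY' * inv(CovX) * CrcovXY, so
                                # Z is the Schur complement, exactly Hermitian
```

Since Z is written as CovY minus a Gramian W'W, the roundoff that breaks Hermitian symmetry in the explicit triple product never enters, and the final `Hermitian` wrapper makes the structure exact.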
I read the articles you mentioned a bit more thoroughly. It seems that the available algorithms for computing the Cholesky factor of the Gramian matrix require a realization of a system (whether in continuous or discrete time). Because of that, I do not see how to take advantage of this (otherwise promising) approach.