I think the answer will depend on how rank-deficient the matrix is (and its size), at least in the dense case. Is P reasonably close to full rank, or do you expect it to be severely degenerate?
If it is full rank, then I would claim that the fastest thing to do for (1) is sum(abs2, cholesky(P).L'*x). And if the matrix is low enough rank, then I’d be tempted to compute a very high precision low rank approximation and then put that in. But again, there’s obviously overhead with something like that and so things like the size of P and how degenerate it is will presumably be important.
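As a quick sanity check of that identity (a small sketch, assuming P is symmetric positive definite so the Cholesky factorization exists):

```julia
using LinearAlgebra

# Small SPD example: P = B'B + I is symmetric positive definite.
B = randn(5, 5)
P = Symmetric(B'B + I)
x = randn(5)

# P = L*L', so x' * P * x = ||L' * x||^2.
L = cholesky(P).L
quad_chol = sum(abs2, L'x)
quad_ref  = dot(x, P * x)

quad_chol ≈ quad_ref  # true
```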
EDIT: sorry — I almost always compute x^T P^{-1} x, not the product with P itself, so my suggestion for (1) was on autopilot and isn’t going to be the best choice for multiplications. And I even forgot to put the transpose in at first. Not my most precise comment, sorry.
I would just use dot(x, P, x) (to compute x^* P x; I’m not sure if you care whether it is conjugated in the complex case?), and worry about further micro-optimization only if benchmarks/profiling justifies the effort. This should be reasonably efficient in all cases even if it is not necessarily “optimal”.
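For reference, here is a minimal sketch of the three-argument form on a tiny symmetric matrix (values chosen just for illustration):

```julia
using LinearAlgebra

P = [2.0 1.0; 1.0 3.0]   # small symmetric example matrix
x = [1.0, 2.0]

# dot(x, P, x) computes x' * P * x without allocating the intermediate P*x.
dot(x, P, x)             # 18.0
```

The two-argument equivalent `dot(x, P * x)` gives the same value but allocates a temporary vector for `P * x`.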
You may assume it is something I have found I need to optimize as much as possible.
As far as I know, dot() won’t take advantage of mP being a symmetric matrix.
Is there an implementation that at least does that and gains some speed?
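One easy thing to try (a sketch, not benchmarked here) is wrapping the matrix in Symmetric, so that the matrix-vector product dispatches to a symmetric BLAS kernel that only reads one triangle:

```julia
using LinearAlgebra

n = 1_000
B = randn(n, n)
P = B + B'               # symmetric dense matrix
x = randn(n)

Ps = Symmetric(P)        # tells Julia to use symmetric multiply kernels

# Same value either way; the Symmetric path can be faster for large n.
dot(x, Ps * x) ≈ dot(x, P, x)  # true
```

Whether this actually wins over the three-argument dot for your sizes is something only a benchmark can settle.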
Here’s an example where a matrix that is sufficiently large and rank-deficient benefits from something fancier:
using KernelMatrices, BenchmarkTools, LinearAlgebra

const M = randn(50, 10_000);
const A = M'M;
const x = randn(10_000);

function fact_method(A, x)
    (U, V) = KernelMatrices.ACA(A, 1e-14)
    dot(U'*x, V'*x)
end

@btime dot($x, $A, $x)       # 27.1 ms on my machine w/ 4 LAPACK threads
@btime fact_method($A, $x)   # 27.2 ms with the same setup
I picked a rank that is about at the break-even point; if you make it lower than 50, then fact_method wins.
This factorization is the adaptive cross approximation, which I think is reasonably comparable to an LU factorization with a fancy pivoting strategy, except that it chooses the pivots based on partial information to avoid passing over the whole matrix. I’m not super knowledgeable about the lowest-level details of pivoting strategies, though, so that might be incorrect. In any case, I implemented it here a long time ago. I don’t think it’s embarrassingly slow, but that package was one of my first big Julia projects, and while I haven’t seen anything better, I bet somebody like @stevengj could sit down and write something much faster in the timespan of a coffee break.
I bring all this up to say that the code isn’t great and the factorization as currently implemented doesn’t specialize for symmetry, but even still for non-ridiculous matrix conditions it does beat the naive dot(x, P, x). I would bet that a smarter implementation could be even more competitive. So I do think that the answer will really depend on some parameters of the problem like size and rank.
Obviously I’m not suggesting that you form the large rank-deficient square matrix and then re-factorize it. My point in that post is that the cost of the factorization might be worth it in some circumstances if the matrix is large enough and rank-deficient enough, and the matrix A is designed to be an example of that.
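If you control how the matrix is built — say you already have the factor M with P = M'M, as in the snippet above — then the cheapest route is to never form P at all (a sketch under that assumption, with illustrative sizes):

```julia
using LinearAlgebra

M = randn(50, 500)       # tall-ish factor; P = M'M has rank ≤ 50
x = randn(500)
P = M'M                  # 500×500, rank-deficient

# x' * (M'M) * x = ||M*x||^2: only O(rank * n) work, no n×n matrix needed.
quad = sum(abs2, M * x)

quad ≈ dot(x, P, x)      # true, but the right-hand side costs O(n^2)
```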
@RoyiAvital, can you comment on any more specifics for your problem, like the size and rank? I think without more specific information it’s hard to give particularly productive advice.
What about the case where the rank is too high for those methods to be feasible?
Is there any function which can take advantage of the structure of the problem?
I appreciate the shout-out, although it’s really just the ACA that would be relevant for forming efficient low-rank representations like A = U V^T, and I have a feeling most people here could write a better ACA than what’s in there. So I would really only suggest anybody use that functionality as a cheap way to test for a proof of concept or something.
In any case, I don’t really have any ideas or knowledge about more thoughtful methods than the three-argument dot, at least at this level of generality and in a setting where factorizations aren’t helpful. But I’ll definitely be watching this thread, because I also find the topic interesting and post here in part because I’d love to see faster methods than the ones I currently use…
I think it also depends on whether you need to do the calculation only once, or whether you need to repeat it for a lot of different vectors x with a fixed P. In the latter case, paying the price to compute a reduced-rank approximation could be worth it.
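To sketch that trade-off with standard tools (a truncated eigendecomposition here stands in for any low-rank factorization; the sizes and tolerance are illustrative):

```julia
using LinearAlgebra

n, r = 1_000, 30
B = randn(r, n)
P = Symmetric(B'B)            # rank-r symmetric matrix

# One-time cost: eigendecomposition, keeping only the significant modes,
# so that P ≈ V * Diagonal(λ) * V'.
F = eigen(P)
keep = findall(v -> v > 1e-10 * maximum(F.values), F.values)
V = F.vectors[:, keep]
λ = F.values[keep]

# Per-vector cost drops from O(n^2) to O(n * r):
quadform(x) = sum(λ .* abs2.(V' * x))

x = randn(n)
quadform(x) ≈ dot(x, P, x)    # true up to the truncation tolerance
```

The factorization is O(n^3) up front, so it only pays off once the number of different x vectors is large enough.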