Hermitian version of three-argument dot

Is there a function to compute directly v^* A v for a vector v and a matrix A?

Ideally it would have the upsides of three-argument dot, like not allocating A*v, and also take advantage of symmetry: when A is Symmetric or Hermitian and v is complex, it should return a real value rather than a complex one.

using LinearAlgebra

v = randn(ComplexF64, 2)    # complex vector
A = Symmetric(randn(2, 2))
isreal(dot(v, A, v))        # often returns false: roundoff leaves a tiny imaginary part

Not that I know of, besides real(dot(v, A, v)) of course.

One option is to have dot(x, A, y) take a special branch if x === y (i.e. if they are the same object, which can be checked quickly), although of course it would still need to return a complex number (with zero imaginary part) for type stability when the vectors are complex. Since this is just an optimization, it could be added without much fanfare.
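For concreteness, a rough sketch of that fast path (dot_fastpath is a made-up name standing in for an extra branch or method of dot itself):

using LinearAlgebra

# Sketch: fast path when both vector arguments are the same object.
# A real implementation would also exploit the Hermitian structure of A;
# here the point is just the branch and the type-stable return value.
function dot_fastpath(x::AbstractVector, A::AbstractMatrix, y::AbstractVector)
    if x === y
        r = real(dot(x, A, x))   # x' A x is real (up to roundoff) for Hermitian A
        return convert(promote_type(eltype(x), eltype(A)), r)   # zero imaginary part if complex
    end
    return dot(x, A, y)
end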

Another option is to have norm(x, A) return the A-norm \Vert x \Vert_A = \sqrt{x^* A x} for Hermitian A (assumed positive-definite; it could throw a DomainError whenever x^* A x < 0, since it's too expensive to check positive-definiteness every time the norm is computed).
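A minimal sketch of what that could look like (anorm is a made-up name, and no attempt is made here at a symmetry-exploiting kernel):

using LinearAlgebra

# Hypothetical A-norm, assuming A is Hermitian (and meant to be positive definite).
function anorm(x::AbstractVector, A::Hermitian)
    q = real(dot(x, A, x))    # x' A x is real up to roundoff for Hermitian A
    q < 0 && throw(DomainError(q, "x' A x is negative, so A is not positive definite"))
    return sqrt(q)
end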


Thanks, that makes sense.

Another option would be adding norm2 to compute the squared norm (we already have abs2, and norm2 could be implemented more efficiently than norm since it doesn't need to guard against overflow). Then x'*A*x could simply be computed by a method norm2(x, A). Has this function been considered before?
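For illustration, a minimal sketch of such a (hypothetical, not currently existing) norm2:

using LinearAlgebra

# Hypothetical sketch only; norm2 is not an existing LinearAlgebra function.
norm2(x::AbstractVector) = sum(abs2, x)                            # squared 2-norm, no rescaling
norm2(x::AbstractVector, A::AbstractMatrix) = real(dot(x, A, x))   # x' A x, assuming A is Hermitian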

Another (but slower) option would be to get more pedantic:

using IntervalArithmetic

intv = Interval.(v)
intd = dot(intv, A, intv)
contains_real(v) = contains_zero(imag(v))

Now,

julia> contains_real(intd)
true

julia> intd
[-0.414165, -0.414164] + [-8.32668e-17, 1.11023e-16]im

contains_real reliably returns true. Rounding towards a real can also be managed.

Yes, though for an ordinary L2 norm squared you can simply do sum(abs2, x).

As far as I understand, sum(abs2, x) doesn't rescale x to avoid overflow before summing its components, the way norm() does. Am I wrong here?

Also, I would say that the computation of x' * A * x is relevant for symmetric/Hermitian indefinite A too, not just for positive-definite A.

Right, sum(abs2, x) does no scaling. Nor is it useful to do any scaling when you are computing the squared norm.

It’s only if you are computing the non-squared norm, i.e. \sqrt{\sum |x_k|^2}, that you want to use norm(x) instead of sqrt(sum(abs2, x)) to avoid spurious over/underflow.
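For example (the values are chosen just to trigger the overflow):

using LinearAlgebra

x = [1e200, 1e200]
sqrt(sum(abs2, x))   # Inf: abs2 overflows before the square root is taken
norm(x)              # ≈ 1.4142135623731e200: norm rescales internally to avoid overflow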


True.

Is there an algorithmic advantage to using Hermitian/Symmetric structure in dot(v,A,v)? It seems like one still needs N^2 + N mul-adds regardless of the symmetry of A. Or is there an optimization I’m missing here?

EDIT: the below post has shown that there is a 2x algorithmic advantage to some symmetry structures.

If you had some nice factorization of A, like cholesky, then you could save roughly half the computations. But this savings isn’t worth the cost of cholesky up-front. In the complex case, I suppose you could avoid computing the imaginary part of the second multiply to drop the cost from 4N^2+4N to 4N^2 + 2N real flops, but that would seem to be a marginal savings even at small sizes (24 vs 20 real flops at N=2 and more negligible at larger).
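For instance, if A is positive definite and you already have its Cholesky factor A = U'U, then v' A v = \Vert U v \Vert^2, and the triangular multiply does about half the work of A*v. A sketch (the matrix below is just a made-up positive-definite example):

using LinearAlgebra

B = randn(4, 4)
A = Symmetric(B'B + 4I)   # made-up real positive-definite example
v = randn(ComplexF64, 4)
C = cholesky(A)           # A ≈ C.U' * C.U
w = C.U * v               # triangular multiply: roughly half the flops of A*v
sum(abs2, w)              # = v' A v, i.e. ≈ real(dot(v, A, v))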

If what you care about is that the result is real because (structurally) it should be, consider just calling real(dot(v,A,v)).

Yes, you theoretically save about a factor of 2 because you only need to look at the upper triangle of A.

Let’s suppose that all the numbers are complex and you exploit no structure. Our dot(v,A,v) algorithm essentially computes the straightforward sum:

\sum_{i,j} \overline{v_i} A_{ij} v_j = \sum_i \overline{v_i} \left( \sum_j A_{ij} v_j \right)

which requires N^2 + N complex multiplications (6 flops each) and N(N - 1) + (N-1) = N^2 - 1 complex additions (2 flops each), for \approx 8N^2 flops. You can save O(N) flops by some tricks for v === v, e.g. special handling for the diagonal elements where you have a complex–real multiplication and a norm instead of two general complex–complex multiplies, but the leading term is still 8N^2. (This is just a naive flop count, of course, not counting FMA or SIMD.)
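Spelled out as a naive double loop, just to make the operation count concrete (quadform_naive is a made-up name, and this is not meant to mirror the actual implementation):

function quadform_naive(v::AbstractVector, A::AbstractMatrix)
    T = promote_type(eltype(v), eltype(A))
    s = zero(T)
    for i in eachindex(v)
        t = zero(T)
        for j in eachindex(v)
            t += A[i, j] * v[j]       # N^2 complex multiplies and ~N^2 complex additions
        end
        s += conj(v[i]) * t           # N more complex multiplies and additions
    end
    return s
end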

For Hermitian A, however, things simplify to

\sum_{i=1}^N |v_i|^2 A_{ii} + 2 \sum_{i=1}^N \operatorname{Re}\left(\overline{v_i} \left[ \sum_{j=i+1}^N A_{ij} v_j \right]\right)

which costs N abs2 computations (3 flops each) + N real multiplies + N-1 real additions (1 flop each) for the first sum, along with N(N-1)/2 general complex multiplies (6 flops each) + N complex multiplies where you only compute the real part (3 flops each) + N(N-1)/2 - N complex additions (2 flops each) + N-1 real additions (1 flop each) for the second sum, + 1 real add and 1 real multiply (2 flops) to put it all together. The leading term is 3N^2 + N^2 = 4N^2 flops, asymptotically saving a factor of 2.

If all of the numbers are real, then the savings in flops are still asymptotically a factor of 2 (the leading term goes from \approx 2N^2 flops to \approx N^2 flops). Of course, for large matrices these BLAS2-type operations are typically memory-bound and not compute-bound, but there should still be memory-bandwidth savings from only looking at half of A.
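In code, the Hermitian-exploiting computation looks roughly like this (a sketch; quadform_hermitian is a made-up name, not a LinearAlgebra method):

using LinearAlgebra

function quadform_hermitian(v::AbstractVector, A::Hermitian)
    T = promote_type(eltype(v), eltype(A))
    d = zero(real(T))                            # diagonal contribution
    o = zero(real(T))                            # off-diagonal contribution
    for i in eachindex(v)
        d += abs2(v[i]) * real(A[i, i])          # |v_i|^2 * A_ii
        t = zero(T)
        for j in i+1:lastindex(v)
            t += A[i, j] * v[j]                  # only the strict upper triangle is read
        end
        o += real(conj(v[i]) * t)                # only the real part is needed
    end
    return d + 2o                                # ≈ real(dot(v, A, v))
end

A = Hermitian(randn(ComplexF64, 4, 4)); v = randn(ComplexF64, 4)
quadform_hermitian(v, A) ≈ real(dot(v, A, v))    # true (up to roundoff)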


Thanks. So the short version is that for Hermitian A you can compute something algorithmically like 2*real(dot(v,triu(A,1),v)) + dot(v,Diagonal(A),v), which does indeed induce the sort of sparsity one can exploit to shrink the leading coefficient on the flop count.
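A quick numerical check of that decomposition, on a random Hermitian example:

using LinearAlgebra

A = Hermitian(randn(ComplexF64, 4, 4))
v = randn(ComplexF64, 4)
lhs = 2 * real(dot(v, triu(A, 1), v)) + dot(v, Diagonal(A), v)
lhs ≈ dot(v, A, v)   # true up to roundoff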

These exploits appear to require that x == y (in dot(x,A,y)), so it’s a bit of a misnomer to keep calling it dot. As someone pointed out above, it would be better described as a mode of norm.

EDIT: as pointed out below, this is merely a performance enhancement so isn’t “wrong” to include it in dot when we can observe that the vector arguments are ===. For type stability, the result would still be complex-typed for complex arguments but we could ensure the imaginary part is zero. Although === is a cheap identity check for Array, it would have more significant consequences for isbits types like StaticArrays.SVector, where === compares values rather than object identity.

Right, but that being said there’s nothing wrong with writing a method of dot(x, A::Hermitian, y) that checks x === y (not x == y, which is slower) and goes to an optimized implementation for that case.
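To illustrate the === vs == point (for Arrays, === is a cheap object-identity check, while == compares elementwise):

x = randn(ComplexF64, 1000)
y = copy(x)
x === y   # false: distinct Array objects, even though the contents are equal
x == y    # true, but it has to compare all the elements
x === x   # true: an O(1) identity check for Arrays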
