`isposdef` inconsistent results, numerically unstable?

I think you misunderstand my argument: I did not say that isposdef is unnecessary. That said, I think that bkfact is much more useful for symmetric matrices, as you can make a decision about conditioning after the structure is revealed. In my (admittedly limited) experience, you rarely need to know that something is PD without a factorization.

The point is that by the time you can check PD, you have pretty much solved the system. If you disagree, I would be interested to hear how you propose to check PD; all the methods I am aware of require a factorization.

I am still curious about examples where you (1) rely on PD for non-symmetric matrices, (2) have a cheaper way of checking this than a factorization, and (3) don’t actually need the factorization.

isposdef is not the most efficient way to do its job, especially considering the big picture, but it is the most convenient. I am arguing for the correctness of the behavior of isposdef, and you are arguing against the efficiency of using isposdef to do what it is designed for; I think both of us are correct.

FWIW, I think the most efficient way to check positive definiteness of a matrix will involve an incomplete factorization of some sort; this can cut down the work by a significant factor, especially if the rank is revealed at an early stage of the algorithm. I am thinking of the SVD, for example, which according to Golub and Van Loan requires 4mn^2 - 4n^3/3 flops if you only want the singular values and 14mn^2 + 8n^3 flops for the full SVD, and there are many other combinations. The numbers may be different for more modern algorithms and sparse matrices, but this is the idea. Again, I am not arguing for the efficiency of using isposdef in this thread, but your question got me curious.

Edit: An incomplete Cholesky is probably a more appropriate example; anyway, this is somewhat derailing :slight_smile:

https://github.com/JuliaLang/julia/issues/26045

I am not quite sure what the singular values will tell you about the definiteness of your matrix, as they don’t reveal information about the (sign of the) eigenvalues.

The relation between positive definiteness and symmetry comes from the fact that positive definiteness of a matrix A can be defined, without referring to its eigenvalues, as the property that v'*A*v is positive for every non-zero vector v. In this definition, only a symmetric A makes sense, as the anti-symmetric part does not contribute to the value of v'*A*v. So that definition actually requires that (A+A')/2 has positive eigenvalues, or equivalently, that (A+A')/2 has a Cholesky decomposition (which is computationally cheaper to check). I think it is better to check for symmetry explicitly, as it might be very confusing if isposdef first took the symmetric/hermitian part itself: it could then return true for matrices whose eigenvalues are not even positive or real.
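
For concreteness, here is a minimal sketch of that cheaper check (the helper name is mine, assuming the Julia 1.0 LinearAlgebra API):

```julia
using LinearAlgebra

# Hypothetical helper: decide whether the quadratic form v'*A*v is positive
# definite by attempting a Cholesky factorization of the Hermitian part of A.
function hermitian_part_isposdef(A::AbstractMatrix)
    H = Hermitian((A + A') / 2)
    F = cholesky(H; check=false)  # don't throw if H is not positive definite
    return issuccess(F)           # the factorization succeeds iff H is posdef
end
```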

Vice versa, when using a definition where positive definiteness only requires that all eigenvalues of A are positive, such a matrix A might lead to `v'*A*v < 0`. Furthermore, that definition cannot be checked more cheaply than by computing the eigenvalue spectrum. So I am not sure whether there is a real need for a convenience method for that definition, as it just amounts to `all(x -> x > 0, eigvals(A))`.

True, but I was merely giving an example of incomplete factorizations. In fact, in my use cases, I am using isposdef as practically !issingular, which is obviously not the case in general, but I did not intend for my example to be interpreted beyond being a demo of incomplete factorizations; sorry if that wasn’t clear.

The anti-symmetric part doesn’t contribute, yes, but the definition still makes sense for unsymmetric matrices.

Could you elaborate? When I say positive definite, I am referring to the quadratic form being positive, not any other equivalence for more restricted classes. And I think the function will only be confusing if (1) it is not well documented, or (2) the person using it doesn’t know what he/she is asking for. The first is not hard to deal with, and the second is on the user, but even the second can be made less likely to cause harm if the function had options with safe defaults.

I see two possible relaxed definitions for positive definiteness:
(1) A is positive definite if v'*A*v>0 for all non-zero vectors v
(2) A is positive definite if all eigenvalues of A are positive

In definition 1, the only guarantee is that the spectrum of (A+A')/2 is positive, not necessarily the spectrum of A itself.
In definition 2, the spectrum of A being positive does not imply that the spectrum of (A+A')/2 is positive, and therefore it does not follow that v'*A*v>0 for all non-zero vectors v.

Hence, for general (non-Hermitian) A, these two definitions are mutually incompatible: neither one is contained in the other. The only way to make them compatible is to additionally impose that A is Hermitian, in which case both definitions coincide.
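
A concrete illustration (my example, not from the thread): a non-normal matrix can satisfy definition (2) while violating definition (1).

```julia
using LinearAlgebra

A = [1.0 4.0; 0.0 1.0]            # both eigenvalues of A are 1, so (2) holds
v = [1.0, -1.0]
dot(v, A * v)                     # -2.0 < 0, so (1) fails
eigvals(Hermitian((A + A') / 2))  # [-1.0, 3.0]: the Hermitian part is indefinite
```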

For definition (1), that is a very natural restriction, since it’s anyway only the hermitian part of A that contributes to this quadratic form. Say, from a physics perspective, if v'*A*v is the energy or Hamiltonian of your system, then the non-hermitian part of A would not be physical and would not appear in any equation of motion.

For definition (2), additionally imposing that A is hermitian gives a way of checking the condition more easily. The eigenvalues of a hermitian matrix are more stable and can be computed more efficiently; in fact, for just checking that they are positive, one can resort to a Cholesky factorization without having to actually compute any eigenvalues. If, on the other hand, A is non-Hermitian, then using the eigenvalues as a definition is much more subtle: how do you deal with Jordan blocks, etc.?

Thanks for the elaboration, a very nice summary. In Golub and Van Loan, (1) is the definition and (2) is implied from {(1) \land A is symmetric}. I agree that it can get confusing, but I don’t think the solution is to overlook it altogether. Are you aware of any established reference that “defines” positive definiteness by (2)? If so, this would make 3 definitions!

In fact (what I should have known from the start), if in v'*A*v>0 you include complex vectors v, even if A is just real, this condition automatically requires that A is Hermitian (and thus symmetric if it is real). The antisymmetric part of A does not just drop out if v is complex, but gives rise to an imaginary part in the value of v'*A*v.
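
A quick illustration (again my own example): a real antisymmetric A gives a purely imaginary quadratic form for a complex v.

```julia
using LinearAlgebra

A = [0.0 1.0; -1.0 0.0]  # real and antisymmetric
v = [1.0, im]
dot(v, A * v)            # 0.0 + 2.0im: not even real, let alone positive
```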

So I really think the only sensible definition of positive definiteness is a Hermitian matrix A with positive eigenvalues, and there is no way to allow for non-symmetric matrices.

With all due respect, @juthohaegeman, please provide references for any “definition” you use, since it is obvious that none of the people discussing this topic here is defining this old concept of positive definiteness; we are merely quoting definitions from those who taught us or whose work we have read.

Quoting Golub and Van Loan: “A matrix A \in \mathbb{R}^{n \times n} is positive definite if x^TAx > 0 for all nonzero x \in \mathbb{R}^n.” So for real matrices, complex x is not included in this definition.

I tried looking up a definition for complex matrices, and one of the best (subjectivity alarm) discussions I found was the Wolfram MathWorld page Positive Definite Matrix, which discusses a generalized definition that only takes the real part of the quadratic form and has a number of references, of which I could only double-check Johnson 1970, which mostly discusses real matrices. Quoting from Johnson 1970, though: “If we allow complex-entried matrices and then define \Sigma_n^H to be the Hermitian matrices which are positive definite, it is clear also that our results concerning \Sigma_n may be modified to remain valid in \Sigma_n^H. In addition we should note that our notion of \Pi_n has no consistent analog in the complex-entried matrices. For instance, a matrix A \in \Pi_n with respect to real vectors may not even have x^*Ax real when complex vectors are allowed (* means “conjugate transpose”).”

In the above text, \Pi_n refers to matrices satisfying this definition, quoting the same text: “An n \times n real matrix A, where n is a positive integer, is called positive definite if (x, Ax) = x^TAx > 0 for all nonzero column vectors x in Euclidean n-dimensional space.” And \Sigma_n refers to symmetric matrices \in \Pi_n.

\Pi_n is clearly limited to real vectors x, hence the text says that there is no consistent notion for complex matrices. Wolfram, however, seems to use a generalized definition for complex matrices, which may have been copied from one of the remaining references there.

I admit that I wasn’t thinking of complex matrices when I brought this topic up, and I am not exactly well informed about the precise definition of positive definiteness for complex matrices, but I don’t like making up new definitions, or claiming something doesn’t make sense based on wrong or debatable assumptions.

A related concept is “A is stable if all eigenvalues have positive real part”. This one is pretty important but should have a different name, since it is conceptually different: it is a concept for linear maps, whereas positive definiteness is for bilinear forms (the word “eigenvalue” makes no sense for bilinear forms! What isposdef does for symmetric bilinear forms is check the Morse index / moments of inertia). Alas, we don’t encode co-/contra-variance in arrays, so a matrix can be either a linear map or a bilinear form, and we have no way of knowing what the user means (and we should assume that many users don’t know what they mean either).

What this imho boils down to: isposdef has no business thinking about non-hermitian matrices; it should either symmetrize before checking positive definiteness, throw on encountering a non-hermitian matrix, or assume symmetry and produce UB on encountering non-hermitian matrices (without paying the cycles for checking). This should be documented unambiguously.

I completely agree with @mohamed82008 on this topic. IMO we have two options to resolve the conflict.

  1. We restrict the domain of the function isposdef to real symmetric and Hermitian matrices; that means the method, when called for general real or complex matrices, should throw MethodError, DomainError, or ArgumentError.

  2. We extend the domain of isposdef to general matrices in the way Mathematica does it. That means, in both the real and complex case, we have to check A + A' for positive definiteness in the restricted sense.

I am in favour of option 2.

There is also the much simpler option: document that Julia uses the common definition of positive definite, which means the matrix must be symmetric. This would require a few words in the docs and then this is done, with no need to wrap in Symmetric. If someone wants a special extension of positive definiteness, they can put that in a package.

That would be simpler, I agree: don’t change the feature (even if it is arguably wrong), and document it. I thought there was agreement that the current situation is in error.

BTW, ‘Mathematica’ would be a good reference for ‘Julia’ in general.

That would be the definition of positive definiteness in the more general context of operators in Hilbert spaces (e.g. Reed & Simon, many books, https://www.math.ksu.edu/~nagy/real-an/2-07-op-th.pdf), and it is my quantum physics background that made me assume that the extension of a linear algebra definition to the complex domain was rather natural and could be taken for granted. Even when A is real, it is not clear from the definition that x should be restricted to real (which is not the case if you’re doing quantum physics), and for complex x, A symmetric follows from the definition.

I would be interested to learn where other definitions, like Wolfram’s real(x'*A*x)>0, would show up and be useful. However, Wolfram’s conclusion seems wrong: an A that satisfies this definition does not necessarily have a Cholesky decomposition, at least if that is taken to mean A=U'*U. Furthermore, Wolfram does not seem to provide a reference for their definition (there are references for some statements in the rest of the discussion, but not for the definition itself).
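
To back up that claim, a small sketch (my example): any non-Hermitian A whose Hermitian part is positive definite satisfies Wolfram’s condition, yet cannot equal U'*U, since U'*U is always Hermitian.

```julia
using LinearAlgebra

A = [1.0 1.0; 0.0 1.0]
isposdef(Hermitian((A + A') / 2))  # true: real(x'*A*x) > 0 for all x ≠ 0
A == A'                            # false, so no U can satisfy A == U'*U
```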

I will try to make some time to read this in detail. However, I don’t think it can prove the other (older) linear algebra references wrong, which I believe a wider audience of computational scientists would be more familiar with than with operator theory in Hilbert spaces. I will also try to check the other references on the Wolfram page to see whether they took their definition from one of these books or not. This is turning out to be more troublesome than I anticipated!

Positive-definite non-Hermitian matrices arise in discretizations of convection-diffusion operators. (Whether these are real or complex is just a coordinate-vs-wavenumber basis choice.) This kind of definiteness bears on what sorts of iterative methods are appropriate for large problems. (See various papers by ZZ Bai et al.)

It is possible that they occur in generalized saddle-point problems arising in optimization. They might even have some utility in non-Hermitian quantum mechanics (but is that itself actually useful?). I’m just a bemused spectator in these domains, and didn’t find anything compelling on Google.

The author of the Wolfram page highlights the likelihood of confusion and, as you say, got confused.

Concerning the isposdef function, I think most of us agree that making the documentation precise is most important.

So the question appears to be: what do we do when encountering a non-hermitian matrix? I have seen the following options, here and on github:

  1. Check; return false.
  2. Check; throw.
  3. Never get into this situation: only accept hermitian matrix types.
  4. Don’t check, compute. The result is formally UB on non-hermitian matrices (but reasonable most of the time, and the set of isposdef matrices is open). Does the backend actually support this?
  5. Check; compute A + A'; work on that (or do something that gives equivalent output).
  6. Always use the upper (resp. lower) triangular part.

Imho (1) makes no sense. (2) is not free: how expensive is the check compared to the computation? (3) is quite restrictive, and someone should tell us how much cache-friendliness this loses. (4) makes some sense for users who know what they are doing. (5) makes sense, but I’d guess that the user should say explicitly that he wants this. (6) only makes sense as a possible implementation of (4), and only if it is faster than (5).

All in all, I think I am for (2) as default, with options for (4) and (5) if these give a possible speedup over explicit handling by the user. (6) can be achieved by wrapping in a symmetric type.

As various people said, (5) makes a lot of sense in many situations and is, for real matrices and vectors, equivalent to v'*A*v>0 (the implementation is not actually required to compute A+A'). (4) is what you probably want if your matrix is symmetric up to very small float errors (only if it gives a speedup over (5)).
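
A quick numerical check of that equivalence (my snippet):

```julia
using LinearAlgebra

A = randn(4, 4)
v = randn(4)
dot(v, A * v) ≈ dot(v, (A + A') / 2 * v)  # true: the antisymmetric part cancels
```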

To summarize (please clearly state what you are arguing for):

I argue for (2).
klacru argues for (5).
mohamed82008 argues for (5).
kristoffer.carlsson appears to argue for (1) (?).
juthohaegeman has not yet clarified what he argues for (1?).
Tamas_Papp thinks the discussion is silly.
Ralph_Smith prefers (5), but thinks (1) is also acceptable if properly documented.
antoine-levitt argues for (3) (on github).
cortner argues for (5).

Now, why are so many people (including me) so strongly against (1)? Fundamentally, most yes/no questions allow the answers {yes | no | it’s complicated | [don’t terminate]}.

Ask the converse: “isindefinite”. What are the reasonable answers to the question “is a small perturbation of the identity indefinite”?

Is anyone here arguing that such a matrix should be considered “indefinite”? I think not. Hence, when asking whether it is positive definite, we can only reasonably answer “it’s complicated” or “yes”.
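
To make the failure mode concrete, a sketch of what happens under behavior (1) (check exact symmetry, then return false), as described elsewhere in this thread:

```julia
using LinearAlgebra

A = Matrix(1.0I, 3, 3) + 1e-12 * randn(3, 3)  # tiny generic perturbation
ishermitian(A)  # almost surely false: exact float symmetry is destroyed
isposdef(A)     # false under behavior (1), although all eigenvalues are ≈ 1
```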

I think I prefer a version of “it’s complicated” in order to protect people who have not thought about what they actually want to check.

edits: keeping the list of people up-to-date.

I argue for (5). Anything that returns a silently wrong result is in my opinion unacceptable; this includes (1) (by some definitions for real matrices), (4), and (6). If (6) is implemented using (3), then it’s fine. (2) is weird, to be honest: if I am asking Julia a question, I don’t expect it to punch me in the face if the answer is no. (3) would enforce the restricted definition in a more obvious way, and Kristoffer’s solution of simply documenting the current function, and doing no more, is just a less obvious way of enforcing the restricted definition, since the user has to call help to actually know what definition is used, and cannot just count on his/her knowledge of what isposdef means when using it.

It appears documenting the behavior is a must anyway, so I think the real argument here is between (1), (3), and (5). I believe Jutho argues that (5) doesn’t make sense for complex matrices, so a clear definition of positive definiteness for non-Hermitian matrices is still up in the air: the references discussed here either contradict each other or purposefully avoid talking about complex matrices, and the definition of positive definiteness which includes unsymmetric real matrices is not straightforward to generalize to complex matrices. I am still in favor of (5), though, at least for real matrices. For complex ones, I will need to double-check the definitions out there to see which one naturally specializes to the real-matrix definition, if any. Thanks for the summary @foobar_lv2.

Edit: Actually, (3) is also kind of weird, since it doesn’t stick to any one definition. By one definition some users will expect isposdef to return true, and by another definition other users will expect it to return false, but instead it gives an error, which makes it similar to (2).

Edit 2: But I doubt this is the first time people have thought about a definition of isposdef in a programming language, so the conclusions reached by other languages might also be a good reference, as klacru says.

(3) means: isposdef has no method for general matrices. Feeding a matrix in gives a MethodError, regardless of whether the matrix happens to be symmetric/hermitian; you need to wrap your matrix in a Symmetric. (2) basically allows users to forget the wrapper, but warns them when it makes a difference. (1) silently returns false if the user forgot the wrapper and it would have made a difference.
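
Concretely, the difference the wrapper makes (a sketch of the current behavior described above):

```julia
using LinearAlgebra

A = [2.0 1.0; 0.0 2.0]
isposdef(A)             # false: A is not exactly hermitian as stored
isposdef(Symmetric(A))  # true: only the upper triangle [2 1; 1 2] is read
```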

Mathematically, isposdef should test an open property, either because the domain is restricted to symmetric matrices or because an extension to non-symmetric matrices has been chosen.

This is the thing that tripped you up: you thought you were checking for an open property, but instead isposdef first checked whether you are exactly on the subspace of hermitian matrices (and checking floats for exact equality is always problematic; e.g. rank needs a tolerance).
