I wrote a function to rotate the coordinates of a 2D MvNormal distribution, and while testing it I discovered this behavior of the Cholesky factorization, which surprised me:
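Here is a minimal sketch of the kind of setup involved (assuming rot2 is an ordinary 2D rotation matrix, as in the error examples quoted later in this thread; the exact angle doesn't matter):

using LinearAlgebra

θ = π / 6                                # an arbitrary rotation angle
rot2 = [cos(θ) -sin(θ); sin(θ) cos(θ)]   # 2D rotation matrix

Σ = rot2 * [25.0 0.0; 0.0 4.0] * rot2'   # rotated covariance matrix
issymmetric(Σ)   # may be false: roundoff can break symmetry by a single bit
cholesky(Σ)      # in that case throws PosDefException: matrix is not Hermitian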
It looks like this is not yet tracked on GitHub, though. I'm trying to figure out whether I should report it there rather than here on Discourse.
Thanks, @oxinabox and @PetrKryslUCSD. Understood that the matrix is asymmetric by one bit, but I think the expectation is that the standard numerics in Julia should be robust to changes in that bit. I’ll make a ticket on GitHub.
This is in fact not a bug. Rather, the Cholesky decomposition is predicated on the matrix being symmetric.
I think there is in fact a check prior to the actual factorization that aims to verify the symmetry of the matrix.
This is not a bug, and has nothing to do with numerical stability in the formal sense. Just do cholesky(Hermitian(matrix)) on a matrix that is slightly asymmetric due to roundoff errors, as I explained in the issue you filed.
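Concretely, a minimal sketch of the fix (using a matrix that is deliberately off by one ulp in one off-diagonal element):

using LinearAlgebra

Σ = [25.0 2.0; nextfloat(2.0) 4.0]   # symmetric except for one ulp
ishermitian(Σ)                       # false
F = cholesky(Hermitian(Σ))           # reads only the upper triangle; succeeds
F.L * F.L' ≈ Σ                       # true up to roundoff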
I think that defining a Cholesky decomposition for general matrices and hoping/checking for symmetry is the wrong solution. In the long run we should consider having methods only for Symmetric and Hermitian, and other types which enforce symmetry.
LAPACK already does this in a way, since the UPLO argument (eg in xPOTRF) picks the half of the matrix that is used for calculations, so the input is symmetric by construction. With Julia having such a rich type system, we can automate much of this.
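For instance, the existing Hermitian constructor already mirrors the UPLO convention; with a deliberately asymmetric matrix:

using LinearAlgebra

A = [1.0 2.0; 999.0 3.0]   # deliberately asymmetric
Hermitian(A, :U)           # behaves as [1.0 2.0; 2.0 3.0]: the lower half is never read
Hermitian(A, :L)           # behaves as [1.0 999.0; 999.0 3.0]: the upper half is never read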
I have no strong opinion either way. But still I wonder which is the clearest error message:
1- currently:
julia> cholesky(rot2 * [25.0 0.0; 0.0 4.0] * rot2')
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
Stacktrace:
[...]
2- if cholesky were defined only for Hermitian matrices (simulated output):
julia> cholesky(rot2 * [25.0 0.0; 0.0 4.0] * rot2')
ERROR: MethodError: no method matching cholesky(::Array{Float64,2})
Closest candidates are:
cholesky(::Hermitian{T, Array{T,2}}) where {T} at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/factorization.jl:11
cholesky(::Symmetric{T, Array{T,2}}) where {T} at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/factorization.jl:11
Stacktrace:
[...]
I would think (1) better explains what is going on. And if users provide perfectly Hermitian matrices, they never have to see it. OTOH, (2) is maybe more difficult to interpret, but it provides some additional information on how to potentially fix the issue. And all users have to do the wrapping, even those whose matrices are already perfectly Hermitian.
The problem is that in practice, a matrix is "perfectly" symmetric only if it was just constructed like that, but then it is fairly pointless to ask the user to duplicate elements just so that they can be checked. This property is not guaranteed (and again, for any nontrivial calculation, unlikely) to be preserved.
The requirement to “provide perfectly hermitian matrices” more or less implies that the user should be aware of IEEE 754, which is surprisingly often not the case.
On the other hand, I agree that the second error message seems to bubble up an implementation detail rather than an algorithmic requirement. However, I still have the feeling that the second is the way to go, since it is more Julian and simply utilises one of the core features of Julia, multiple dispatch. What LAPACK does (as Tamas explained) is just a more radical version of this, which will silently fail if you use it wrongly.
Agreed. But I don’t think anybody is asking users to duplicate elements. It’s just that the current implementation allows users to duplicate elements. Asking users to wrap their non-symmetric matrices into Hermitian views does not deduplicate anything; it just “ignores” the fact that the matrix is not really Hermitian.
Yes. Even though Julia's objective is not necessarily to educate the masses, I'm wondering whether interfaces such as this one could be designed in such a way that users learn about IEEE-754 pitfalls while making their code correct.
I guess some of my reluctance about this option is that I feel like being Hermitian is a property characterizing the contents of a matrix, not its type. A parallel I could draw is with real functions which expect their argument to be in a certain range, like sqrt, asin or acos. Let us say that someone writes
theta = acos(x / sqrt(x^2 + y^2))
Let us assume that round-off errors happen during the computation, which put the argument of acos outside the [-1, 1] range (*). Should we ask acos to be defined only for a specific subtype of Float64? Or should we rather ask the user to decide whether this is an unacceptable accumulation of round-off errors in their code (in which case they have to fix it), or whether it is tolerable (in which case they would wrap the argument in a min(..., 1.0), consciously accepting the loss of information)?
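For concreteness, the second option might look like this (I use clamp here, which also guards the lower end of the domain):

x, y = 3.0, 4.0
# consciously clip possible roundoff back into the domain of acos
theta = acos(clamp(x / sqrt(x^2 + y^2), -1.0, 1.0))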
I understand that there are significant shortcomings to the parallel I’m drawing here between matrices and real numbers, which is why I’m not strongly advocating for either way of dealing with the problem at hand. I’m just thinking that this might be food for thought… (I might be wrong)
(*) it so happens that it is never the case with a binary representation of FP numbers and rounding to the nearest, but let’s pretend otherwise for the sake of the example.
I think about this differently: I consider Hermitian an <: AbstractMatrix like any other, and the fact that it is implemented as a wrapper for another matrix (which the user can construct from one half) is an implementation detail. Ideally, this fact would be exposed to the user only during construction and conversions.
I am not sure I understand this. “Hermitian” is a mathematical property of a matrix, LinearAlgebra.Hermitian is an implementation of this.
LinearAlgebra.Hermitian is an implementation that guarantees a matrix to be Hermitian. Not all Hermitian matrices need to be built or obtained this way.
julia> A = rand(Complex{Float64}, 2, 2)
2×2 Array{Complex{Float64},2}:
0.495929+0.908977im 0.740941+0.0841962im
0.344677+0.536084im 0.31571+0.788671im
julia> B = (A + adjoint(A)) / 2
2×2 Array{Complex{Float64},2}:
0.495929+0.0im 0.542809-0.225944im
0.542809+0.225944im 0.31571+0.0im
julia> ishermitian(B)
true
In the example above, B is a regular Array{Complex{Float64}, 2}, which happens to contain values which make it Hermitian. That is what I meant when I said that I see “Hermitian” as a property of the contents of the matrix (as opposed to its type, which in this case characterizes how elements are stored and accessed).
To reuse my analogy with real numbers: we can add, multiply, and subtract Float64 numbers as much as we want, and always get results of type Float64. Sometimes the value of the result is within the [-1, 1] interval, which makes it suitable to be passed as the argument to acos. And sometimes it is not, and acos errors out. Should we make acos accept only arguments of type Float64BetweenM1andP1?
struct Float64BetweenM1andP1 <: Real
    x::Float64
    Float64BetweenM1andP1(x) = new(max(min(x, 1), -1))
end
Thanks for the clarification. Just to be clear, I was proposing that all operations that need a Hermitian matrix restrict the argument to types which signal this, and if the user wants to use some other type, they should take care of the conversion manually (eg Hermitian(A, :U)); see the sketch after the list below.
I am proposing this because, as opposed to interval restrictions on Float64, a matrix being Hermitian is
1- more costly to check,
2- nearly guaranteed to fail in floating point even for operations which ostensibly preserve it (in theory).
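A sketch of what such a restriction could look like (my_cholesky is a hypothetical stand-in, not a proposed name):

using LinearAlgebra

# hypothetical: accept only wrapper types that guarantee the property
my_cholesky(A::Union{Hermitian,Symmetric}) = cholesky(A)

my_cholesky(Hermitian(rand(3, 3) + 3I))   # works: the user opted in explicitly
# my_cholesky(rand(3, 3))                 # MethodError: no matching method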
Yes, this is where my analogy with FP numbers falls short.
I think I understand why you are proposing this. I'm just wondering whether this would be abusing the type system or not. Again, I'm not sure which way I like most; I'm just taking the stance opposite to yours for the sake of the (hopefully healthy) argument.
I kind of like this idea. It has the major advantage that code that happened to work for some data won't break on some other data where a numerical instability gave slightly inexact results, since you must explicitly signal the intention that a matrix be Hermitian. I also like the idea of having some kind of debug vs. release flag to turn on/off checking approximate Hermitian-ness. Want to open an issue to discuss this as a potential LinearAlgebra 2.0 API change?
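To sketch the approximate check (assert_approx_hermitian is made up for illustration, and the tolerance choice is arbitrary):

using LinearAlgebra

# hypothetical debug-mode helper: check Hermitian-ness approximately,
# then commit to one half of the matrix via the wrapper
function assert_approx_hermitian(A; rtol = sqrt(eps(real(float(eltype(A))))))
    isapprox(A, A'; rtol = rtol) ||
        throw(ArgumentError("matrix is not approximately Hermitian"))
    return Hermitian(A)
end

cholesky(assert_approx_hermitian([25.0 2.0; nextfloat(2.0) 4.0]))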