Question regarding numerical stability among Julia versions

Dear Julians.

I was testing the following code (M,C and K are sparse, symmetric and positive definite matrices)

Julia> M,C,K=THS();

julia> cholM=cholesky(M);

julia> Cb = cholM\C;

julia> Kb = cholM\K;

julia> F = 0.5*(Array(Cb) + sqrt(Array(Cb^2 - 4*Kb)));

julia> norm(F^2 - F*Cb + Kb)

Using Julia-dev Version 1.12.0-DEV.317 Commit 0e28cf6abf (2024-04-08 10:46 UTC)
or Julia Version 1.11.0-alpha2 Commit 9dfd28ab751 (2024-03-18 20:35 UTC)
gives a reasonable value

1.0530104604285245e-5

but using julia 1.9 (installed using juliaup in a Linux (Fedora 39) machine gives

7.606244293550662e52

a very weird result.

More detailed data about the versions

julia> versioninfo()
Julia Version 1.12.0-DEV.317
Commit 0e28cf6abf (2024-04-08 10:46 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: 12 × 12th Gen Intel(R) Core™ i5-1235U
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
JULIA_EDITOR = code

and

julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × 12th Gen Intel(R) Core™ i5-1235U
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
Threads: 1 on 12 virtual cores
Environment:
JULIA_EDITOR = code

It does not happen for small, dense, matrices. So I guess is something related to numerical stability of computing the sqrt.

Matrices are 2200 x 2200

K

K.txt · GitHub

C = 1E-6*K

M

Matrix M · GitHub

Does anyone can give me some light regarding what may be the cause for this difference?

Thank you!

1 Like

Can you identify which operation causes the difference? It may be an improvement (bug fix, or algorithmic) on some operation, or just a case of a very unstable calculation depending on numerical inaccuracies.

What happens in 1.10?

1 Like

@sethaxen revised the matrix sqrt code in that time frame; perhaps he eliminated a bug in the process?
https://github.com/JuliaLang/julia/pull/39973

1 Like

@lmiq and @Ralph_Smith Thank you for your advice. I will take a look at 1.10 and the result of sqrt (as well as the conditioning of both matrices).

It seems to be an overflow problem during the sqrt(A)

In this example, A is non symmetric with norm=1.32E9, smaller eigenvalue is
-4.8e7 and the largest is -4.06 (large condition number). The resulting matrix of sqrt(A) has norm 8.58E39 and does not satisfy (sqrt(A))^2 - A = 0

Actually, the norm of the difference is

julia> norm( (sqrt(A)^2 - A)
2.853968508045082e63

Indeed, if I scale A by its norm A2 = A/norm(A) it works such that

julia> norm( (sqrt(A2)^2 - A2)
3.185526215409009e-14

With this example, I observe the same behavior in Julia 1.9, 1.10, 1.11 and dev.

1 Like

Using eigen decomposition works

julia> D,U = eigen(A);

julia> R=sqrt.(Complex.(D));

julia> norm(U*diagm(R)*diagm(R)*inv(U)-A)
9.41423450406504e-5

Thanks, but why the overflow in the original post materialize only on certain Julia versions?

1 Like

No, that wouldn’t be it. That PR was present already in v1.7.0.

1 Like

@CodeLenz you could try git bisecting a local build of Julia to identify the commit at which the instability goes away.

1 Like

OK guys. I noticed that I was loading MKL.jl in my startup.jl. I also simplified the problem avoiding the computation of Cb. Thus, matrix A now is just -4*Kb.

All the eigenvalues of A are negative and inv(A)*A \approx I, as expected. A is non symmetric with dimension 2200 x 2200.

I am getting very weird results for this “simple” problem. After inspecting the problem, it seems that the culprit is the Schur decomposition used in sqrt AFTER 1.10.2.

There is also a problem with MKL.jl

Test script:

cholM = cholesky(M);

Kb = Array(cholM\K)

A = - 4*Kb

julia_root = sqrt(A)

diff = norm(julia_root*julia_root .- A)

if diff>1
println("failure ",diff)
else
println("Success ",diff)
end

Results

  1. Julia 1.0.4 → Success (norm is 1E-5)
    Julia 1.0.4 + MKL → Failure (norm is 3.7E64)

  2. Julia 1.10.2 → Success (norm is 4E-5)
    Julia 1.10.2 + MKL → Failure (norm is 3.7E64)

  3. Julia 1.11.0-beta1 → Failure (norm is 1.4E61)
    Julia 1.11.0-beta1 + MKL → Failure (norm is 2.8E63)

  4. Julia DEV.322 → Failure (norm is 1E61)
    Julia DEV.322 + MKL → Failure (norm is 2.8E63)

Thus, there is some problem with MKL.jl (at least, regarding sqrt of a negative definite, non symmetric matrix like A)

Now, testing with eigen decomposition of A (it should work!)

Eigen decomposition

D,U = eigen(A)

R=sqrt.(Complex.(D))

diff_eigen = norm(U*diagm(R)*diagm(R)*inv(U) .- A)

if diff_eigen>1
println("failure ",diff_eigen)
else
println("Success ",diff_eigen)
end

Results

  1. Julia 1.0.4 → Success (norm is 7.7E-5)
    Julia 1.0.4 + MKL → Success (norm is 0.0002858379704314278)

  2. Julia 1.10.2 → Success (norm is 0.0001034042666795483)
    Julia 1.10.2 + MKL → Success (norm is 0.0002858379704314278)

  3. Julia 1.11.0-beta1 → Success (norm is 9.41423450406504e-5)
    Julia 1.11.0-beta1 + MKL → Success (norm is 0.0006612749077237161)

  4. Julia DEV.322 → Success (norm is 9.41423450406504e-5)
    Julia DEV.322 + MKL → Success (norm is 0.0006612749077237161)

I understand that sqrt(A) is using Schur decomposition. Lets give a try

Schur decomposition

S = Schur{Complex}(schur(A))

R = sqrt(Complex.(S.Schur))

diff_schur = norm( (S.Z)(RR)*adjoint(S.Z) .- A)

if diff_schur>1
println("failure ",diff_schur)
else
println("Success ",diff_schur)
end

  1. Julia 1.0.4 → Sucess
    Julia 1.0.4 + MKL → Failure

  2. Julia 1.10.2 → Success
    Julia 1.10.2 + MKL → Failure

  3. Julia 1.11.0-beta1 → Failure
    Julia 1.11.0-beta1 + MKL → Failure

  4. Julia DEV.322 → Failure
    Julia DEV.322 + MKL → Failure

EDIT

Additional test

S = Schur{Complex}(schur(A))

R = sqrt(Complex.(S.Schur))

A2 = (S.Z)*(S.Schur)*adjoint(S.Z)

@show norm(A2.-A)

and all versions, with and without MKL are giving a correct matrix A2. Thus, the problem is not in the Schur decomposition. I also tested sqrt(Complex.(A)) with the same results as before.

Thus, the problem should be in sqrt(S.Schur) an upper triangular complex matrix.

2 Likes

Dear @sethaxen

I would have to study this procedure, since I never did something similar. I am trying to understand the problem to pinpoit the culprit.

Julia version for the algorithm in
https://mathsfromnothing.au/matrix-square-root/?i=1
works in every version I tested, with and without MKL


function Sqrt(A)
  
	# Dimension
	n = size(A,1) 

	# Schur decomposition of A
	S,U = Schur{Complex}(schur(A))

    # Diagonal of S
	D = diag(S)

	# sqrt of main diagonal of S
	X = diagm(sqrt.(D))

    if !isdiag(S)
  		for j=2:n
    		for i=j-1:-1:1
				k = i+1:j-1
				somat = transpose(X[i,k])*X[k,j]
    		  	X[i,j] = (S[i,j]-somat)/(X[i,i]+X[j,j])
   			end
  		end
	end

    sqrtA = U*X*adjoint(U)

end
1 Like

Do your matrices have any repeated eigenvalues? The matrix sqrt code uses a blocked scheme for large nonsymmetric matrices which invokes a Sylvester solver that is unreliable if multiple eigenvalues show up in the same block. For some reason the Julia code ignores the error return from LAPACK in such cases.

The simpler approaches, like the one you just listed, do not suffer from this problem.

I say “unreliable” rather than “broken” because whether the solver actually fails depends on how close the eigenvalues appear to be and some floating point details, so that could explain why a given matrix behaves differently with different libraries.

1 Like

@Ralph_Smith Indeed, there are many repeated eigenvalues. Thank your for your feedback.

Nonetheless, sqrt(schur(A).Schur) is the main culprit here and this matrix is UpperTriangular. I was thinking that Julia dispatches to a specific solver for triu matrices, using Lapack. Other interesting fact is the MKL error in all versions I tested.

Thank you.

@sethaxen

Something related to sqrt(A) after (not including) 1.10.2. As @Ralph_Smith said, Julia is ignoring the return error flag from LAPACK in this case (A has repeated eigenvalues).

I have opened a ticket with this issue

I made a very small and simple repository to this function (with some checks and obvious optimizations). Just in case someone is interested in this solution.

Update:

A[abs.(A).<sqrt(eps(1.0))].=zero(eltype(A))

fixes the problem. The culprit seems to be schur(A), computed inside sqrt(A), since julia’s sqrt and the Sqrt(A) subroutine also fails when there are terms like 1E-52 in A.