SVD gives different results from Python with MWE

kadir-gunel · March 7, 2021, 1:17pm

Hello,

I have code written in Python that uses svd from NumPy, and I am trying to port it to Julia.
The problem I am facing is that after some point, even though I read the same files in both languages, the svd routines decompose the same matrix differently. Hence the solution changes and I cannot replicate the experiment.

I call svd several times in the code, and the first two calls give nearly the same decomposition in both languages. (Everything is in Float32.) But from the third svd call onward, my results and the original (Python) code start to differ. The only difference from the first two calls is that the third one omits the full_matrices=False flag in the Python code.

 u, s, vt = np.linalg.svd(xsim[:idx], full_matrices=False) # get the same result
 u, s, vt = np.linalg.svd(zsim[:idx], full_matrices=False) # get the same result
 u, s, vt = np.linalg.svd(z[trg_indices].T.dot(x[src_indices])) # third call to svd; where things get messy.
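As a small sketch of what the full_matrices flag changes, assuming random stand-in matrices rather than the data from this thread: the flag only affects the shapes of the factors for non-square input. The product z[trg_indices].T.dot(x[src_indices]) is square (300×300 in the later posts), so the full and thin factorizations have identical shapes there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall matrix: the flag changes the shape of u.
tall = rng.standard_normal((6, 3)).astype(np.float32)
u_full, s_full, vt_full = np.linalg.svd(tall)  # full_matrices=True is the default
u_thin, s_thin, vt_thin = np.linalg.svd(tall, full_matrices=False)
print(u_full.shape, u_thin.shape)

# Square matrix: full and thin factors have the same shapes.
square = rng.standard_normal((4, 4)).astype(np.float32)
u_a, s_a, vt_a = np.linalg.svd(square)
u_b, s_b, vt_b = np.linalg.svd(square, full_matrices=False)
print(u_a.shape, u_b.shape)
```

So dropping the flag on a square matrix is unlikely, by itself, to explain a different result.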

On the other hand, in Julia, even though I changed the algorithm to QRIteration and set full to true, I am not getting the same results.

I did some research, and both NumPy and Julia use LAPACK underneath. So what is the correct way to get the same results?

This is what I get when I write print(numpy.show_config())

blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

For Julia, could someone please help me check whether everything is set up the same way as in Python?

B.R.

carstenbauer · March 7, 2021, 1:25pm

It would be much easier to help you if you could create an MWE, that is, give us the specific matrices xsim[:idx], zsim[:idx], and z[trg_indices].T.dot(x[src_indices]) so that we can reproduce the behaviour.

On a general note: have you accounted for the fact that Julia uses column-major ordering and Python (NumPy) row-major ordering? That is, matrices in Julia and Python might differ by a transpose operation.
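A minimal sketch of why a stray transpose matters, assuming a small random matrix in place of the real data: changing only the memory layout leaves the logical matrix untouched, while an actual transpose keeps the singular values but swaps the roles of U and V.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))  # stored row-major (C order) by default

# Changing the memory layout does not change the logical matrix.
a_colmajor = np.asfortranarray(a)  # same values, column-major storage
same_matrix = np.array_equal(a, a_colmajor)

# An accidental transpose, however, swaps the roles of U and V.
u, s, vt = np.linalg.svd(a, full_matrices=False)
u_t, s_t, vt_t = np.linalg.svd(a.T, full_matrices=False)

same_values = np.allclose(s, s_t)
# Compare magnitudes only, because each singular vector is defined up to sign.
swapped = np.allclose(np.abs(u_t), np.abs(vt.T), atol=1e-8)
print(same_matrix, same_values, swapped)
```

If the shapes of the loaded arrays agree in both languages, the storage order alone should not be the cause.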

kadir-gunel · March 7, 2021, 1:27pm

Yeah, you are right. But those matrices are huge files (nearly 1.1GB for one and 512MB for the other). If you want, I can share the links.

carstenbauer · March 7, 2021, 1:29pm

Ideally, you should try to create a minimal example. For example, if you only consider a submatrix, say M[1:100,1:100], do NumPy and Julia still disagree?

Having said that, sharing a link to a large file is still better than sharing nothing.

kadir-gunel · March 7, 2021, 1:33pm

Actually I am reading submatrices of these huge matrices. The first 4000 words.

carstenbauer · March 7, 2021, 1:36pm

Instead of posting a link to a “raw” file that then needs to get loaded and processed by a function, you could make it easier for the other side and dump the final matrix, i.e. xsim[:idx], to a reasonable file format that can be read directly. For example, using NPZ.jl, which is compatible with numpy.save and numpy.load.

This way you can simply load the file in Julia and Python (i.e. numpy) and ensure that you actually operate on precisely the same data.

(You want to abstract away your context as much as possible when creating an MWE.)
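The Python side of such a dump could look roughly like the sketch below. The array contents, the value of idx, and the file name xsim_head.npy are placeholders, not the actual data from this thread.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
xsim = rng.standard_normal((10, 4)).astype(np.float32)  # stand-in for the real data
idx = 5  # placeholder cut-off

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xsim_head.npy")  # hypothetical file name
    np.save(path, xsim[:idx])  # dump exactly the matrix that is passed to svd
    loaded = np.load(path)     # the .npy format stores shape and dtype as well

print(loaded.shape, loaded.dtype)
```

The resulting .npy file can then be read with npzread from NPZ.jl on the Julia side, so both programs start from the same bits.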

kadir-gunel · March 7, 2021, 1:38pm

Thanks, I didn’t know that. I will upload them.

stevengj · March 7, 2021, 1:39pm

Differ in what way? Realize that you will often get slightly different results due to roundoff errors, and there is also some arbitrariness in the choice of phase of the singular vectors (or their sign, for real matrices) that can change between versions of LAPACK/BLAS. If you are using the full (not thin) SVD, then there can also be arbitrary rotations in the nullspace portions of the basis (as well as rotations in vectors for pairs of equal singular values).
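The sign ambiguity can be illustrated with a short sketch on a random real matrix (an assumption, not the thread's data). The helper align_signs is a hypothetical name introduced only for this example. Note that the product V U^T is unaffected by such paired sign flips, so signs alone cannot change a matrix built that way.

```python
import numpy as np

def align_signs(u, vt, u_ref):
    """Flip each singular-vector pair so that the columns of u point like those of u_ref."""
    signs = np.sign(np.sum(u * u_ref, axis=0))
    signs[signs == 0] = 1.0
    return u * signs, vt * signs[:, None]

rng = np.random.default_rng(1)
a = rng.standard_normal((8, 5))
u, s, vt = np.linalg.svd(a, full_matrices=False)

# A second, equally valid SVD: flip some (u_i, v_i) pairs together.
flip = np.array([1.0, -1.0, 1.0, -1.0, -1.0])
u2, vt2 = u * flip, vt * flip[:, None]
still_valid = np.allclose((u2 * s) @ vt2, a)

# After aligning signs, the two factorizations coincide.
u3, vt3 = align_signs(u2, vt2, u)
aligned = np.allclose(u3, u) and np.allclose(vt3, vt)

# The paired sign flips cancel in V @ U.T.
same_w = np.allclose(vt2.T @ u2.T, vt.T @ u.T)
print(still_valid, aligned, same_w)
```

Comparing abs.(U) or sign-aligned factors is therefore a fairer test than comparing raw entries.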

kadir-gunel · March 7, 2021, 2:06pm

Could you please try svd with both of these source and target embeddings?

For the MWE:


using LinearAlgebra
using NPZ # for loading uploaded matrices

X = npzread("./path_to_file/src_emb.npy")
Z = npzread("./path_to_file/trg_emb.npy")

F = svd(Z' * X, full=true)
W = F.V * F.U'

W should show :

300×300 Array{Float32,2}:
 -0.136035    -0.0777119    0.0617536   0.0322536    -0.0803463   -0.0503565  …   0.095194    -0.0131295   -0.0396312    -0.00628574   0.0511491
 -0.00174201   0.0281835    0.0454273   0.0358372     0.0529457   -0.0639435     -0.0942243    0.0325997    0.000161858  -0.0514902   -0.0498046
  0.0676922    0.237949     0.0913208  -0.0106957    -0.00361676  -0.0334244     -0.0415728    0.0830503    0.0839673    -0.0716634   -0.0183218
  0.180583    -0.0958469   -0.0631438  -0.000189014   0.0190076   -0.0152651      0.0361847   -0.00151322   0.0589926     0.0360943   -0.120823
  0.0309126    0.0636542    0.0391062  -0.0415557    -0.107748     0.0503444     -0.00643703  -0.0312398    0.0658869     0.00057633  -0.0033062
  ⋮                                                                ⋮          ⋱   ⋮                                                   
 -0.00427003  -0.0422672    0.0433433   0.0792277    -0.0729017   -0.0360674      0.0137698    0.00893876   0.0135633    -0.00463451  -0.0784995
  0.0467229    0.0159696   -0.0542639  -0.0673708     0.0118304    0.0391193      0.121941    -0.041253    -0.0230913    -0.0481352   -0.0489984
  0.0685667    0.00899705  -0.0338196  -0.102029     -0.072511     0.074575       0.0204513    0.0675888    0.0863168    -0.00807602   0.00272199
  0.0906389    0.08509     -0.0440936   0.0993734     0.0830268    0.0200159      0.0228065    0.067321     0.0181914    -0.0103052   -0.0205173

Whereas python:


import numpy as np 

X = np.load('./path_to_/src_emb.npy')
Z = np.load('./path_to_/trg_emb.npy')

u,s,vt = np.linalg.svd(Z.T.dot(X))

w = vt.T.dot(u.T)

w should show

array([[ 0.02077192, -0.09222455,  0.16809827, ..., -0.07213789,
         0.01867277, -0.03105616],
       [-0.0070491 , -0.04300215,  0.02156948, ...,  0.08834429,
        -0.01833235, -0.01745731],
       [ 0.08617274,  0.15471898,  0.01507143, ...,  0.01034247,
        -0.11459561, -0.01204915],
       ...,
       [ 0.00199408,  0.01696967, -0.04052072, ...,  0.00589352,
         0.01136548, -0.05689107],
       [ 0.01693112, -0.05401053, -0.03806448, ...,  0.01929145,
        -0.03649945,  0.03389705],
       [ 0.12757824,  0.05976016, -0.03483151, ...,  0.04901506,
        -0.09714089, -0.04803994]], dtype=float32)

Yes, I understand and expect that. But after decomposing the product of the two matrices, the matrices U and V start out similar but become quite dissimilar in the later columns. I am not claiming anything, and I do not know much about how things work under the hood. I am just wondering how to make both programs give the same (or similar) results.

carstenbauer · March 7, 2021, 3:13pm

Well, the results are certainly similar:

using NPZ, PyCall, Statistics, LinearAlgebra
np = pyimport("numpy");

adiff(A,B) = maximum(abs.(abs.(A) .- abs.(B)))

mat_src = npzread("src_emb.npy"); # Float32 data
U, S, Vt = np.linalg.svd(mat_src, full_matrices=false)
F = svd(mat_src);
display((adiff(F.U, U), adiff(F.S, S), adiff(F.Vt, Vt)))
display((abs.(F.U) ≈ abs.(U), F.S ≈ S, abs.(F.Vt) ≈ abs.(Vt)))

mat_trg = npzread("trg_emb.npy"); # Float32 data
U, S, Vt = np.linalg.svd(mat_trg, full_matrices=false)
F = svd(mat_trg);
display((adiff(F.U, U), adiff(F.S, S), adiff(F.Vt, Vt)))
display((abs.(F.U) ≈ abs.(U), F.S ≈ S, abs.(F.Vt) ≈ abs.(Vt)))

I get:

(0.0002946444f0, 9.1552734f-5, 0.0007101372f0)
(false, true, false)

(4.1430816f-5, 0.00021362305f0, 0.00010115653f0)
(true, true, true)

kadir-gunel · March 7, 2021, 3:40pm

Thank you for your response.

In your sample, you decomposed the individual matrices. I assume you applied svd to the individual matrices to show that the difference is so small that it should not matter if there is a multiplication (or another operation) between the matrices.
As you are trying to explain, the differences between the decomposed values are really small and negligible.

But when I do the same for the product of the two matrices, the result is:

julia> display((adiff(F.U, U), adiff(F.S, S), adiff(F.Vt, Vt)))
(0.47538608f0, 0.0078125f0, 0.6747172f0)

julia> display((abs.(F.U) ≈ abs.(U), F.S ≈ S, abs.(F.Vt) ≈ abs.(Vt)))
(false, true, false)

For the singular values (S and F.S), the difference is again very small. But I cannot say the same for the orthogonal matrices U and V.

Could you please clarify what I should do?
Thank you in advance.

stevengj · March 7, 2021, 4:51pm

I think the basic problem here is that this W is an ill-conditioned function of A (the product Z'X that you are decomposing), so it will be extremely sensitive to small differences in roundoff errors (in any language).

It corresponds to taking $A^T = V \Sigma U^T$ and then forming a new matrix $W = A^T U \Sigma^{-1} U^T = VU^T$. This is a problem because whenever $A$ has small singular values, the matrix $U \Sigma^{-1} U^T$ that you are (implicitly) multiplying by is ill-conditioned. (In your example, the $300\times 300$ matrix $A$ has numerical rank $\approx 133$; its near-zero singular values are basically numerical noise that you are amplifying to unity.)
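The identity and the conditioning claim can be checked numerically with a small sketch. It assumes a synthetic 6×6 matrix with hand-picked singular values (one of them tiny), not the embeddings from this thread.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6

# Random orthogonal factors from QR, plus chosen singular values (one is tiny).
u0, _ = np.linalg.qr(rng.standard_normal((d, d)))
v0, _ = np.linalg.qr(rng.standard_normal((d, d)))
sigma = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 1e-6])
a = (u0 * sigma) @ v0.T  # A = U0 diag(sigma) V0^T

u, s, vt = np.linalg.svd(a)
w_direct = vt.T @ u.T        # W = V U^T
m = (u / s) @ u.T            # U Sigma^{-1} U^T
w_via_inverse = a.T @ m      # A^T U Sigma^{-1} U^T

identity_holds = np.allclose(w_direct, w_via_inverse, atol=1e-6)
cond_m = np.linalg.cond(m)   # close to sigma.max() / sigma.min()
print(identity_holds, cond_m)
```

Here the condition number of the implicit multiplier is about 5e6, set entirely by the smallest singular value. With singular values at the roundoff level, as in the rank-deficient case, it becomes effectively unbounded.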

A symptom of this is that, even in Python, you get a completely different W matrix if you do the computation in double vs. single precision:

import numpy as np 

X = np.load('src_emb.npy')
Z = np.load('trg_emb.npy')

u,s,vt = np.linalg.svd(Z.T.dot(X))
w = vt.T.dot(u.T)

u64,s64,vt64 = np.linalg.svd(np.float64(Z).T.dot(np.float64(X)))
w64 = vt64.T.dot(u64.T)

np.linalg.norm(w - w64, 2)

gives 1.99999... (i.e. roughly 2, which is what you would expect for the operator norm of the difference between two random unitary matrices).
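The same symptom can be sketched without the data files, assuming a synthetic 40×40 matrix of exact rank 15 (the sizes and singular values are arbitrary choices). The helper rotation is a hypothetical name for this example. The full W should differ strongly between single and double precision, while the part built only from the singular vectors above the numerical rank should agree closely.

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 40, 15  # hypothetical sizes: a d x d matrix of rank r

u0, _ = np.linalg.qr(rng.standard_normal((d, d)))
v0, _ = np.linalg.qr(rng.standard_normal((d, d)))
sigma = np.concatenate([np.linspace(10.0, 1.0, r), np.zeros(d - r)])
a64 = (u0 * sigma) @ v0.T
a32 = a64.astype(np.float32)

def rotation(a, k=None):
    """Return V[:, :k] @ U[:, :k].T from the SVD of a (all columns if k is None)."""
    u, s, vt = np.linalg.svd(a)
    return vt[:k].T @ u[:, :k].T

# Full W: includes the arbitrary null-space directions.
full_gap = np.linalg.norm(rotation(a32) - rotation(a64), 2)
# Truncated W: only the r well-determined singular-vector pairs.
top_gap = np.linalg.norm(rotation(a32, r) - rotation(a64, r), 2)
print(full_gap, top_gap)
```

Whether a truncated W is an acceptable substitute depends on the surrounding algorithm, so this is only an illustration of where the instability comes from.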

You may need to re-think the computation you are performing if it depends on this matrix. (In some cases the final result may be okay if the algorithm is sufficiently clever/careful, even if intermediate matrices like W vary wildly.)

kadir-gunel · March 7, 2021, 5:47pm

> You may need to re-think the computation you are performing if it depends on this matrix. (In some cases the final result may be okay if the algorithm is sufficiently clever/careful, even if intermediate matrices like W vary wildly.)

The svd occurs inside a loop, and since W is ill-conditioned (I didn’t know that), the two programs start to differ immediately. The Python one optimizes the result much better, whereas the Julia code gets stuck and finishes early.

Thank you very much for your time and clean explanation.

Cheers from Istanbul

stevengj · March 7, 2021, 9:11pm

PS. You can also pass alg=LinearAlgebra.QRIteration() to Julia’s svd function to use a slower (but slightly more accurate) algorithm, in case that helps. I would still be worried about relying on $W = VU^*$ if you want a robust algorithm, however.

kadir-gunel · March 8, 2021, 6:00am

I have tried QRIteration() but nothing changed.

Thank you

mkitti · March 9, 2021, 2:32pm

Having exhausted other options, you can try to reverse engineer NumPy to see if they are preprocessing the matrix somehow:

github.com

numpy/numpy/blob/v1.20.0/numpy/linalg/linalg.py#L1482-L1674

@array_function_dispatch(_svd_dispatcher)
def svd(a, full_matrices=True, compute_uv=True, hermitian=False):
    """
    Singular Value Decomposition.

    When `a` is a 2D array, it is factorized as ``u @ np.diag(s) @ vh
    = (u * s) @ vh``, where `u` and `vh` are 2D unitary arrays and `s` is a 1D
    array of `a`'s singular values. When `a` is higher-dimensional, SVD is
    applied in stacked mode as explained below.

    Parameters
    ----------
    a : (..., M, N) array_like
        A real or complex array with ``a.ndim >= 2``.
    full_matrices : bool, optional
        If True (default), `u` and `vh` have the shapes ``(..., M, M)`` and
        ``(..., N, N)``, respectively.  Otherwise, the shapes are
        ``(..., M, K)`` and ``(..., K, N)``, respectively, where
        ``K = min(M, N)``.
    compute_uv : bool, optional

This file has been truncated.

It looks like they use a subset of LAPACK processed through f2c in the lapack_lite package.
