I think the basic problem here is that this W is an ill-conditioned function of A, so it will be extremely sensitive to small differences in roundoff errors (in any language).
It corresponds to taking the SVD A^T = V \Sigma U^T and then forming the new matrix W = A^T U \Sigma^{-1} U^T = VU^T. This is a problem whenever A has small singular values, because the matrix U \Sigma^{-1} U^T that you are (implicitly) multiplying by is then ill-conditioned. (In your example, the 300\times 300 matrix A has numerical rank \approx 133: its near-zero singular values are basically numerical noise, which this computation amplifies to unity.)
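You can verify the numerical rank yourself; here is a minimal sketch, using the same .npy files as the snippet below and the default tolerance of np.linalg.matrix_rank:

import numpy as np

X = np.load('src_emb.npy')
Z = np.load('trg_emb.npy')
A = Z.T.dot(X)

# numerical rank = number of singular values above the noise floor
# sigma_max * max(m, n) * eps, which is matrix_rank's default tolerance
print(np.linalg.matrix_rank(A))

s = np.linalg.svd(A, compute_uv=False)
print(np.sum(s > s[0] * max(A.shape) * np.finfo(s.dtype).eps))  # same count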
A symptom of this is that, even in Python, you get a completely different W matrix if you do the computation in double vs. single precision:
import numpy as np

# the embedding matrices, loaded in their stored (single) precision
X = np.load('src_emb.npy')
Z = np.load('trg_emb.npy')

# SVD of A = Z^T X in the precision of the data:
u, s, vt = np.linalg.svd(Z.T.dot(X))
w = vt.T.dot(u.T)    # W = V U^T

# the same computation in double precision:
u64, s64, vt64 = np.linalg.svd(np.float64(Z).T.dot(np.float64(X)))
w64 = vt64.T.dot(u64.T)

# operator (spectral) norm of the difference:
np.linalg.norm(w - w64, 2)
gives 1.99999...
(i.e. roughly 2, which is essentially the largest possible value, and what you would expect for the operator norm of the difference between two independent random unitary matrices).
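As a sanity check on that expectation, here is a quick sketch: sample two independent random orthogonal matrices (one convenient way is the QR factorization of a Gaussian matrix, with a sign correction) and measure the operator norm of their difference.

import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n, rng):
    # Q from the QR of a Gaussian matrix, with column signs fixed so the
    # distribution is (approximately) Haar over the orthogonal group
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

Q1 = random_orthogonal(300, rng)
Q2 = random_orthogonal(300, rng)
print(np.linalg.norm(Q1 - Q2, 2))  # typically 1.99..., i.e. nearly 2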
You may need to re-think the computation you are performing if it depends on this matrix. (In some cases the final result may still be okay if the algorithm is sufficiently clever/careful, even if intermediate matrices like W vary wildly; see the sketch below.)
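For instance, the action of W on the well-determined directions (the left-singular vectors belonging to the non-negligible singular values) should be reproducible across precisions even though W as a whole is not, since each such u_i is mapped to the corresponding v_i in either precision. A hypothetical check, continuing from the snippet above (the 1e-3 cutoff is an arbitrary illustration, not a recommendation):

# columns of u64 whose singular values are well above the noise floor
k = np.sum(s64 > 1e-3 * s64[0])

# restricted to this subspace, the two W's should (nearly) agree
print(np.linalg.norm((w - w64).dot(u64[:, :k]), 2))  # expect a value far below 2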