I have an eigenvalues-decomposition based PCA function in my lib, and I test it with:
X = [1 8; 4.5 5.5; 9.5 0.5]
out = pcatest(X;K=2)
@test isapprox(out.X,[-4.58465 6.63182;-0.308999 7.09961; 6.75092 6.70262],atol=0.00001)
Now, I haven’t touched anything in that area of the lib for years, but on some commits using the latest 1.X Julia version I have now the CI failing with:
** Testing pca()...
Test Failed at /home/runner/work/BetaML.jl/BetaML.jl/test/Utils_tests.jl:227
Expression: isapprox(out.X, [-4.58465 6.63182; -0.308999 7.09961; 6.75092 6.70262], atol = 1.0e-5)
Evaluated: isapprox([4.584647274835241 -6.631817953272375; 0.3089991577646427 -7.0996140402490004; -6.750915650181362 -6.702621717219043], [-4.58465 6.63182; -0.308999 7.09961; 6.75092 6.70262]; atol = 1.0e-5)
I don’t remember almost anything of PCA, but I feel there could be different conventions in computing the eigenvalues, so the opposite signs don’t really matter… but I feel weird that there isn’t one that is followed consistently but results are (apparently) random… Is there a way I can enforce one, so I can test the results?
Is something changed in Julia 1.7.3 (the version to which GitHub expand Julia v1) compared to 1.7.2 that can explain this?
EDIT: I confirm this started about ~20 days ago when GitHub CI started to expand v1 to 1.7.3…
I get the same output on both 1.7.2 and 1.7.3, this is how I ran it on both.
julia> using LinearAlgebra
julia> function pca(X;K=nothing,error=0.05)
# debug
#X = [1 10 100; 1.1 15 120; 0.95 23 90; 0.99 17 120; 1.05 8 90; 1.1 12 95]
#K = nothing
#error=0.05
(N,D) = size(X)
if !isnothing(K) && K > D
@error("The parameter K must be ≤ D")
end
Σ = (1/N) * X'*(I-(1/N)*ones(N)*ones(N)')*X
E = eigen(Σ) # eigenvalues are ordered from the smallest to the largest
totVar = sum(E.values)
explVarByDim = [sum(E.values[D-k+1:end])/totVar for k in 1:D]
propVarExplained = 0.0
if K == nothing
for k in 1:D
if explVarByDim[k] >= (1 - error)
propVarExplained = explVarByDim[k]
K = k
break
end
end
else
propVarExplained = explVarByDim[K]
end
P = E.vectors[:,end:-1:D-K+1] # bug corrected 2/9/2021
return (X=X*P,K=K,error=1-propVarExplained,P=P,explVarByDim=explVarByDim)
end
pca (generic function with 1 method)
julia> X = [1 8; 4.5 5.5; 9.5 0.5]
3×2 Matrix{Float64}:
1.0 8.0
4.5 5.5
9.5 0.5
julia> pca(X;K=2)
(X = [-4.584647274835242 6.6318179532723756; -0.30899915776464276 7.0996140402490004; 6.750915650181363 6.702621717219042], K = 2, error = 0.0, P = [0.7456907130975964 0.6662922484916046; -0.6662922484916048 0.7456907130975964], explVarByDim = [0.9980637093577439, 1.0])
1 Like
Yes, I have installed locally Julia v1.7.3 and can’t replicate it (both running the function in isolation or testing the whole BetaML package)… I don’t know why GitHub sometimes get that results, it’s really weird (and scary)…
This is probably just different versions of BLAS being used.
1 Like
Just simply looking at eigen
does not reveal any major work done on it between 1.7.2 and 1.7.3. Could be something under the hood in BLAS
Eigenvectors are indeterminate up to multiplication by an arbitrary nonzero scalar: If Av=\lambda v, then A (cv) = \lambda (cv). It probably best if you design your tests to factor out this indeterminacy. You could normalize the eigenvector to unit length and fix the sign by requiring the first nonzero component to be positive.
3 Likes