Hello, good morning.
Days ago I started looking the multivariatestats.jl to do some pca and lda analysis, and notice that lda plots data inverted in this example:
https://juliastats.org/MultivariateStats.jl/stable/lda/
iris = dataset("datasets", "iris")
X = Matrix(iris[1:2:end,1:4])'
X_labels = Vector(iris[1:2:end,5])
pca = fit(PCA, X; maxoutdim=2)
Ypca = predict(pca, X)
lda = fit(MulticlassLDA, X, X_labels; outdim=2)
Ylda = predict(lda, X)
p = plot(layout=(1,2), size=(800,300))
for s in ["setosa", "versicolor", "virginica"]
points = Ypca[:,X_labels.==s]
scatter!(p[1], points[1,:],points[2,:], label=s, legend=:bottomleft)
points = Ylda[:,X_labels.==s]
scatter!(p[2], points[1,:],points[2,:], label=s, legend=:bottomleft)
end
plot!(p[1], title="PCA")
plot!(p[2], title="LDA")
https://juliastats.org/MultivariateStats.jl/stable/lda/
From page example:
From my machine
Is this a bug, or different results are normal?
LDA is easy to implement, so here is LDA based on inverse regression.
iris = dataset("datasets", "iris")
# Note I don't transpose X
X = Matrix(iris[1:2:end,1:4])
X_labels = Vector(iris[1:2:end,5]);
using Statistics, LinearAlgebra, DataFrames
indmat(Iv) = (Iv.==permutedims(unique(Iv)))
function lda(X, Xlbls)
μX = mean(X, dims=1)
Xw = svd(X.-μX).U # whiten X
Xw ./= std(Xw, dims=1) # make unit variance
Ya = indmat(Xlbls)
Σh = cov(Xw, Ya)/cov(Ya)*cov(Ya,Xw) # Prediction covariance
V = eigvecs(Σh, sortby=x->-abs(x))
DC = Xw*V # Discriminant Components
df = DataFrame(DC, [:DC1, :DC2, :DC3, :DC4])
df.xlbl = Xlbls
return df
end
DC = lda(X, X_labels)
# Plot
So the discriminant vectors V
are the eigenvectors of a prediction covariance matrix Σh
.
I guess you know that eigenvectors are not unique and only defined up to a multiple
(so yes you can get inverted signs). Note my DCs are centered and have unit variance (which is my preference).
2 Likes
thats means results in multivariatestats.jl are correct, even if thei are flipped?
The scatterplot looks correct. But their non-centered DC2 would prompt me to look at the MultivariateStats.jl code (if I was using it).
1 Like
Thank you. I had opened an issue in github. Will wait people do their cents before I close it.