Hello, good morning.
Days ago I started looking the multivariatestats.jl to do some pca and lda analysis, and notice that lda plots data inverted in this example:
https://juliastats.org/MultivariateStats.jl/stable/lda/
iris = dataset("datasets", "iris")
X = Matrix(iris[1:2:end,1:4])'
X_labels = Vector(iris[1:2:end,5])
pca = fit(PCA, X; maxoutdim=2)
Ypca = predict(pca, X)
lda = fit(MulticlassLDA, X, X_labels; outdim=2)
Ylda = predict(lda, X)
p = plot(layout=(1,2), size=(800,300))
for s in ["setosa", "versicolor", "virginica"]
points = Ypca[:,X_labels.==s]
scatter!(p[1], points[1,:],points[2,:], label=s, legend=:bottomleft)
points = Ylda[:,X_labels.==s]
scatter!(p[2], points[1,:],points[2,:], label=s, legend=:bottomleft)
end
plot!(p[1], title="PCA")
plot!(p[2], title="LDA")
https://juliastats.org/MultivariateStats.jl/stable/lda/
From page example:
From my machine
Is this a bug, or different results are normal?
LDA is easy to implement, so here is LDA based on inverse regression.
iris = dataset("datasets", "iris")
# Note I don't transpose X
X = Matrix(iris[1:2:end,1:4])
X_labels = Vector(iris[1:2:end,5]);
using Statistics, LinearAlgebra, DataFrames
indmat(Iv) = (Iv.==permutedims(unique(Iv)))
function lda(X, Xlbls)
μX = mean(X, dims=1)
Xw = svd(X.-μX).U # whiten X
Xw ./= std(Xw, dims=1) # make unit variance
Ya = indmat(Xlbls)
Σh = cov(Xw, Ya)/cov(Ya)*cov(Ya,Xw) # Prediction covariance
V = eigvecs(Σh, sortby=x->-abs(x))
DC = Xw*V # Discriminant Components
df = DataFrame(DC, [:DC1, :DC2, :DC3, :DC4])
df.xlbl = Xlbls
return df
end
DC = lda(X, X_labels)
# Plot

So the discriminant vectors V are the eigenvectors of a prediction covariance matrix Σh.
I guess you know that eigenvectors are not unique and only defined up to a multiple
(so yes you can get inverted signs). Note my DCs are centered and have unit variance (which is my preference).
thats means results in multivariatestats.jl are correct, even if thei are flipped?
The scatterplot looks correct. But their non-centered DC2 would prompt me to look at the MultivariateStats.jl code (if I was using it).
Thank you. I had opened an issue in github. Will wait people do their cents before I close it.