LDA example in multivariatestats.jl flipped x and y values

Hello, good morning.

Days ago I started looking the multivariatestats.jl to do some pca and lda analysis, and notice that lda plots data inverted in this example:

https://juliastats.org/MultivariateStats.jl/stable/lda/

	iris = dataset("datasets", "iris")

	X = Matrix(iris[1:2:end,1:4])'
	X_labels = Vector(iris[1:2:end,5])
	
	pca = fit(PCA, X; maxoutdim=2)
	Ypca = predict(pca, X)

	lda = fit(MulticlassLDA, X, X_labels; outdim=2)
	Ylda = predict(lda, X)
	
	p = plot(layout=(1,2), size=(800,300))
	
	for s in ["setosa", "versicolor", "virginica"]
	
	    points = Ypca[:,X_labels.==s]
	    scatter!(p[1], points[1,:],points[2,:], label=s, legend=:bottomleft)
	    points = Ylda[:,X_labels.==s]
	    scatter!(p[2], points[1,:],points[2,:], label=s, legend=:bottomleft)
	
	end
	plot!(p[1], title="PCA")
	plot!(p[2], title="LDA")

https://juliastats.org/MultivariateStats.jl/stable/lda/

From page example:

From my machine

Is this a bug, or different results are normal?

LDA is easy to implement, so here is LDA based on inverse regression.

iris = dataset("datasets", "iris")

# Note I don't transpose X
X = Matrix(iris[1:2:end,1:4])
X_labels = Vector(iris[1:2:end,5]);

using Statistics, LinearAlgebra, DataFrames

indmat(Iv) = (Iv.==permutedims(unique(Iv)))

function lda(X, Xlbls)
    μX = mean(X, dims=1)
    Xw = svd(X.-μX).U # whiten X
    Xw ./= std(Xw, dims=1) # make unit variance
    Ya = indmat(Xlbls)
    Σh = cov(Xw, Ya)/cov(Ya)*cov(Ya,Xw) # Prediction covariance
    V = eigvecs(Σh, sortby=x->-abs(x))
    DC = Xw*V     # Discriminant Components
    df = DataFrame(DC, [:DC1, :DC2, :DC3, :DC4])
    df.xlbl = Xlbls  
    return df
end

DC = lda(X, X_labels)

# Plot

LDA
So the discriminant vectors V are the eigenvectors of a prediction covariance matrix Σh.
I guess you know that eigenvectors are not unique and only defined up to a multiple
(so yes you can get inverted signs). Note my DCs are centered and have unit variance (which is my preference).

2 Likes

thats means results in multivariatestats.jl are correct, even if thei are flipped?

The scatterplot looks correct. But their non-centered DC2 would prompt me to look at the MultivariateStats.jl code (if I was using it).

1 Like

Thank you. I had opened an issue in github. Will wait people do their cents before I close it.