LDA example in multivariatestats.jl flipped x and y values

Hello, good morning.

Days ago I started looking the multivariatestats.jl to do some pca and lda analysis, and notice that lda plots data inverted in this example:


	iris = dataset("datasets", "iris")

	X = Matrix(iris[1:2:end,1:4])'
	X_labels = Vector(iris[1:2:end,5])
	pca = fit(PCA, X; maxoutdim=2)
	Ypca = predict(pca, X)

	lda = fit(MulticlassLDA, X, X_labels; outdim=2)
	Ylda = predict(lda, X)
	p = plot(layout=(1,2), size=(800,300))
	for s in ["setosa", "versicolor", "virginica"]
	    points = Ypca[:,X_labels.==s]
	    scatter!(p[1], points[1,:],points[2,:], label=s, legend=:bottomleft)
	    points = Ylda[:,X_labels.==s]
	    scatter!(p[2], points[1,:],points[2,:], label=s, legend=:bottomleft)
	plot!(p[1], title="PCA")
	plot!(p[2], title="LDA")


From page example:

From my machine

Is this a bug, or different results are normal?

LDA is easy to implement, so here is LDA based on inverse regression.

iris = dataset("datasets", "iris")

# Note I don't transpose X
X = Matrix(iris[1:2:end,1:4])
X_labels = Vector(iris[1:2:end,5]);

using Statistics, LinearAlgebra, DataFrames

indmat(Iv) = (Iv.==permutedims(unique(Iv)))

function lda(X, Xlbls)
    μX = mean(X, dims=1)
    Xw = svd(X.-μX).U # whiten X
    Xw ./= std(Xw, dims=1) # make unit variance
    Ya = indmat(Xlbls)
    Σh = cov(Xw, Ya)/cov(Ya)*cov(Ya,Xw) # Prediction covariance
    V = eigvecs(Σh, sortby=x->-abs(x))
    DC = Xw*V     # Discriminant Components
    df = DataFrame(DC, [:DC1, :DC2, :DC3, :DC4])
    df.xlbl = Xlbls  
    return df

DC = lda(X, X_labels)

# Plot

So the discriminant vectors V are the eigenvectors of a prediction covariance matrix Σh.
I guess you know that eigenvectors are not unique and only defined up to a multiple
(so yes you can get inverted signs). Note my DCs are centered and have unit variance (which is my preference).


thats means results in multivariatestats.jl are correct, even if thei are flipped?

The scatterplot looks correct. But their non-centered DC2 would prompt me to look at the MultivariateStats.jl code (if I was using it).

1 Like

Thank you. I had opened an issue in github. Will wait people do their cents before I close it.