A question about PCA

After applying the PCA algorithm, how do I display its principal components? Thanks!!

There have been quite a few discussions on this; use the search function on Discourse to find them, e.g.


This was already answered by @nilshg in the post mentioned above.


You need the projection matrix, which you can get with projection(M). You can read more about the properties in the docs.

using MultivariateStats

X = rand(4, 5)                  # toy data: 4 variables, 5 observations (columns are observations)
M = fit(PCA, X; maxoutdim=2)    # fit a PCA model with at most 2 output dimensions
proj = projection(M)[:, 1]      # first column of the projection matrix = first principal axis
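
If what you want is how much each component matters, the fitted model also carries that information; a small sketch (assuming `principalvars` and `tvar`, which MultivariateStats defines for fitted `PCA` models):

```julia
using MultivariateStats

X = rand(4, 100)                 # 4 variables, 100 observations (columns are observations)
M = fit(PCA, X; maxoutdim=2)
pvars = principalvars(M)         # variance captured by each retained component
ratios = pvars ./ tvar(M)        # proportion of total variance explained per component
```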

Thanks, but it does not tell me which of these is the extracted principal component; it is only several numbers.

Because I want to plot these principal components.

@abcde, I understand you, and I ran into this a while back. I was not completely satisfied, so I called R from Julia. Here is just a code snippet:

using RCall
...
R"library(psych)"
# Estimate the number of HKs (principal components) using the R function scree:
HK_R = R"scree($PCA_Daten)"

...

PCA_Modell = R"principal($PCA_Daten, $PCA_HK)"
PCA_scores =  R"principal($PCA_Daten, $PCA_HK)$scores"
show(PCA_Modell)
PCA_scores_M = convert(Matrix, PCA_scores)
...

And you will get something like this:

RObject{VecSxp}
Principal Components Analysis
Call: principal(r = `#JL`$PCA_Daten, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
      RC1   RC2   RC3   RC4   h2   u2 com
1    0.91 -0.13  0.07  0.02 0.85 0.15 1.1
2   -0.34 -0.02 -0.72 -0.13 0.64 0.36 1.5
3    0.75  0.04  0.44  0.23 0.81 0.19 1.9
4    0.35  0.46 -0.03 -0.25 0.39 0.61 2.5
5    0.13  0.00 -0.27  0.80 0.73 0.27 1.3
6   -0.14  0.88  0.03  0.07 0.80 0.20 1.1
7   -0.02  0.88 -0.11  0.09 0.79 0.21 1.1
8    0.78  0.10 -0.44  0.01 0.80 0.20 1.6
9   -0.73  0.01 -0.01 -0.25 0.60 0.40 1.2
10   0.15  0.06  0.27  0.76 0.68 0.32 1.4
11  -0.23 -0.11  0.78 -0.13 0.69 0.31 1.3

                       RC1  RC2  RC3  RC4
SS loadings           2.87 1.80 1.67 1.44
Proportion Var        0.26 0.16 0.15 0.13
Cumulative Var        0.26 0.42 0.58 0.71
Proportion Explained  0.37 0.23 0.21 0.19
Cumulative Proportion 0.37 0.60 0.81 1.00
Mean item complexity = 1.4
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.09
with the empirical chi square 1473.01 with prob < 3.2e-303
Fit based upon off diagonal values = 0.89

But I have to admit that I have not tried out what Julia now delivers as an output. So this is my quick answer…! :wink:

Fair enough. Each column of projection(M) gives the corresponding PCA axis. I assumed you only wanted the first axis, since you didn’t provide the data you wanted to transform.

To visualize your data in 2D with PCA, you can transform your data X.

using PyCall, MultivariateStats, Plots

data = pyimport("sklearn.datasets")

# Generate labeled data with 3 classes
X, y = data.make_blobs(n_samples=1000, n_features=10, centers=3, cluster_std=3, random_state=0)

# Columns correspond to observations
X = Array(X')

# Fit PCA model in 2d, for visualization
M = fit(PCA, X; maxoutdim=2)
             
# Transform data
x = transform(M, X)

# Plot dims 1 and 2 w/o PCA
p1 = scatter(X[1,:], X[2,:], c=y, xlabel="Dim. 1", ylabel="Dim. 2", title="Original")

# Plot dims 1 and 2 with PCA
p2 = scatter(x[1,:], x[2,:], xlabel="PC1", ylabel="PC2", c=y, title="PCA")

# Compare plots
plot(p1, p2, layout=2, legend=false)

[plot: original data (dims 1 and 2) vs. the 2-D PCA projection]

In BetaML you have what I believe is one of the simplest approaches (disclaimer: I am the author):

"""

pca(X;K,error)

Perform Principal Component Analysis, returning the matrix reprojected along the dimensions of maximum variance.

Parameters:

  • X : The (N,D) data to reproject

  • K : The number of dimensions to maintain (with K<=D) [def: nothing]

  • error: The maximum approximation error that we are willing to tolerate [def: 0.05]

Return:

  • A named tuple with:

  • X: The reprojected (N,K) matrix

  • K: The dimensions retrieved

  • error: The actual proportion of variance not explained in the reprojected dimensions

  • P: The (D,K) matrix of the eigenvectors associated to the K-largest eigenvalues used to reproject the data matrix

Note that if K is indicated, the parameter error has no effect.

"""
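
Based on the docstring above, a minimal usage sketch might look like this (the `pca` signature and the returned field names are taken straight from the docstring; newer BetaML releases may expose this differently):

```julia
using BetaML

X   = rand(100, 5)       # (N,D) data: 100 observations, 5 dimensions
out = pca(X; K=2)        # keep the 2 dimensions of maximum variance
out.X                    # the reprojected (N,K) matrix
out.P                    # the (D,K) eigenvector matrix used for the reprojection
out.error                # proportion of variance not explained by the K dimensions
```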


If the way the package works is bothering you, it’s only a few lines of code to get the components yourself using LinearAlgebra. For example, in most cases I’m familiar with, users want the components ordered from largest eigenvalue to smallest. Given a data matrix x with rows as observations and columns as variables/dimensions:

    using LinearAlgebra, Statistics

    x = x .- mean(x, dims=1)                  # Centre the data
    xcov = Symmetric(cov(x))                  # Compute the covariance matrix
    ef = eigen(xcov)                          # Get eigenvalues and eigenvectors (ascending order)
    eigvalvec = reverse(ef.values, dims=1)    # Reorder eigenvalues largest to smallest
    eigvecmat = reverse(ef.vectors, dims=2)   # Reorder eigenvectors to match
    eigvalcs = cumsum(eigvalvec)              # Cumulative sum of eigenvalues
    vratio = eigvalcs ./ eigvalcs[end]        # Cumulative proportion of variance explained
    pc = x * eigvecmat                        # Principal component scores, largest eigenvalue first

You didn’t actually ask for the variance ratios, but again they are trivial to compute and often useful, so I thought I’d include them.
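
As a quick sanity check of the snippet above (a sketch on toy data; eigenvector signs are arbitrary, so compare variances rather than raw loadings):

```julia
using LinearAlgebra, Statistics

x = randn(200, 3)                     # toy data: 200 observations, 3 variables
x = x .- mean(x, dims=1)              # centre
ef = eigen(Symmetric(cov(x)))         # eigen-decomposition of the covariance
vals = reverse(ef.values)             # largest eigenvalue first
vecs = reverse(ef.vectors, dims=2)    # matching eigenvector order
pc = x * vecs                         # principal component scores

# The variance of each score column equals the corresponding eigenvalue,
# and the score columns are mutually uncorrelated.
@assert isapprox(vec(var(pc, dims=1)), vals; rtol=1e-8)
@assert isapprox(cov(pc), Diagonal(vals); atol=1e-8)
```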

Cheers,

Colin
