A question about PCA

After applying the PCA algorithm, how do I display its principal components? Thanks!!

There have been quite a few discussions on this; use the search function on Discourse to find them, e.g.


This was already answered by @nilshg in the post mentioned above.


You need the projection matrix, which you can get with projection(M). You can read more about the properties in the docs.

using MultivariateStats

X = rand(4, 5)                  # toy data: 4 variables, 5 observations (columns are observations)
M = fit(PCA, X; maxoutdim=2)    # fit a PCA model with at most 2 output dimensions
proj = projection(M)[:, 1]      # first column of the projection matrix = first principal axis
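
If what you want is how much each component matters, the fitted model also carries that information; a small sketch (assuming `principalvars` and `tvar`, which MultivariateStats defines for fitted `PCA` models):

```julia
using MultivariateStats

X = rand(4, 100)                 # 4 variables, 100 observations (columns are observations)
M = fit(PCA, X; maxoutdim=2)
pvars = principalvars(M)         # variance captured by each retained component
ratios = pvars ./ tvar(M)        # proportion of total variance explained per component
```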

Thanks, but it does not tell me which of these is the extracted principal component; it is only several numbers.

Because I want to plot these principal components.

@abcde, I understand you, and I ran into this a while back. I was not completely satisfied, so I called R from Julia. Here is just a code snippet:

using RCall
...
R"library(psych)"
# Estimate the number of HKs (principal components) using the R function scree:
HK_R = R"scree($PCA_Daten)"

...

PCA_Modell = R"principal($PCA_Daten, $PCA_HK)"
PCA_scores =  R"principal($PCA_Daten, $PCA_HK)$scores"
show(PCA_Modell)
PCA_scores_M = convert(Matrix, PCA_scores)
...

And you will get something like this:

RObject{VecSxp}
Principal Components Analysis
Call: principal(r = `#JL`$PCA_Daten, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
      RC1   RC2   RC3   RC4   h2   u2 com
1    0.91 -0.13  0.07  0.02 0.85 0.15 1.1
2   -0.34 -0.02 -0.72 -0.13 0.64 0.36 1.5
3    0.75  0.04  0.44  0.23 0.81 0.19 1.9
4    0.35  0.46 -0.03 -0.25 0.39 0.61 2.5
5    0.13  0.00 -0.27  0.80 0.73 0.27 1.3
6   -0.14  0.88  0.03  0.07 0.80 0.20 1.1
7   -0.02  0.88 -0.11  0.09 0.79 0.21 1.1
8    0.78  0.10 -0.44  0.01 0.80 0.20 1.6
9   -0.73  0.01 -0.01 -0.25 0.60 0.40 1.2
10   0.15  0.06  0.27  0.76 0.68 0.32 1.4
11  -0.23 -0.11  0.78 -0.13 0.69 0.31 1.3

                       RC1  RC2  RC3  RC4
SS loadings           2.87 1.80 1.67 1.44
Proportion Var        0.26 0.16 0.15 0.13
Cumulative Var        0.26 0.42 0.58 0.71
Proportion Explained  0.37 0.23 0.21 0.19
Cumulative Proportion 0.37 0.60 0.81 1.00
Mean item complexity = 1.4
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.09
with the empirical chi square 1473.01 with prob < 3.2e-303
Fit based upon off diagonal values = 0.89

But I have to admit that I have not tried out what Julia now delivers as an output. So this is my quick answer…! :wink:

Fair enough. Each column of projection(M) gives the corresponding PCA axis. I assumed you only wanted the first axis, since you didn’t provide the data you wanted to transform.

To visualize your data in 2D with PCA, you can transform your data X.

using PyCall, MultivariateStats, Plots

data = pyimport("sklearn.datasets")

# Generate labeled data with 3 classes
X, y = data.make_blobs(n_samples=1000, n_features=10, centers=3, cluster_std=3, random_state=0)

# Columns correspond to observations
X = Array(X')

# Fit PCA model in 2d, for visualization
M = fit(PCA, X; maxoutdim=2)
             
# Transform data
x = transform(M, X)

# Plot dims 1 and 2 w/o PCA
p1 = scatter(X[1,:], X[2,:], c=y, xlabel="Dim. 1", ylabel="Dim. 2", title="Original")

# Plot dims 1 and 2 with PCA
p2 = scatter(x[1,:], x[2,:], xlabel="PC1", ylabel="PC2", c=y, title="PCA")

# Compare plots
plot(p1, p2, layout=2, legend=false)

[plot: original data (dims 1 and 2) vs. the 2-D PCA projection]

In BetaML you have what I believe is one of the simplest approaches (disclaimer: I am the author):

"""

pca(X;K,error)

Perform Principal Component Analysis, returning the matrix reprojected along the dimensions of maximum variance.

Parameters:

  • X : The (N,D) data to reproject

  • K : The number of dimensions to maintain (with K<=D) [def: nothing]

  • error: The maximum approximation error that we are willing to tolerate [def: 0.05]

Return:

  • A named tuple with:

  • X: The reprojected (N,K) matrix

  • K: The dimensions retrieved

  • error: The actual proportion of variance not explained in the reprojected dimensions

  • P: The (D,K) matrix of the eigenvectors associated to the K-largest eigenvalues used to reproject the data matrix

Note that if K is indicated, the parameter error has no effect.

"""
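
Based on the docstring above, a minimal usage sketch might look like this (the `pca` signature and the returned field names are taken straight from the docstring; newer BetaML releases may expose this differently):

```julia
using BetaML

X   = rand(100, 5)       # (N,D) data: 100 observations, 5 dimensions
out = pca(X; K=2)        # keep the 2 dimensions of maximum variance
out.X                    # the reprojected (N,K) matrix
out.P                    # the (D,K) eigenvector matrix used for the reprojection
out.error                # proportion of variance not explained by the K dimensions
```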


If the way the package works is bothering you, it’s only a few lines of code to get the components yourself using LinearAlgebra. For example, in most cases I’m familiar with, users want the components ordered from largest eigenvalue to smallest. Given a data matrix x with rows as observations and columns as variables/dimensions:

    using LinearAlgebra, Statistics

    x = x .- mean(x, dims=1)                  # Centre the data
    xcov = Symmetric(cov(x))                  # Compute the covariance matrix
    ef = eigen(xcov)                          # Get eigenvalues and eigenvectors (ascending order)
    eigvalvec = reverse(ef.values, dims=1)    # Reorder eigenvalues largest to smallest
    eigvecmat = reverse(ef.vectors, dims=2)   # Reorder eigenvectors to match
    eigvalcs = cumsum(eigvalvec)              # Cumulative sum of eigenvalues
    vratio = eigvalcs ./ eigvalcs[end]        # Cumulative proportion of variance explained
    pc = x * eigvecmat                        # Principal component scores, largest eigenvalue first

You didn’t actually ask for the variance ratios, but again they are trivial to compute and often useful, so I thought I’d include them.
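
As a quick sanity check of the snippet above (a sketch on toy data; eigenvector signs are arbitrary, so compare variances rather than raw loadings):

```julia
using LinearAlgebra, Statistics

x = randn(200, 3)                     # toy data: 200 observations, 3 variables
x = x .- mean(x, dims=1)              # centre
ef = eigen(Symmetric(cov(x)))         # eigen-decomposition of the covariance
vals = reverse(ef.values)             # largest eigenvalue first
vecs = reverse(ef.vectors, dims=2)    # matching eigenvector order
pc = x * vecs                         # principal component scores

# The variance of each score column equals the corresponding eigenvalue,
# and the score columns are mutually uncorrelated.
@assert isapprox(vec(var(pc, dims=1)), vals; rtol=1e-8)
@assert isapprox(cov(pc), Diagonal(vals); atol=1e-8)
```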

Cheers,

Colin
