# A question about PCA

After applying the PCA algorithm, how can I display its principal components? Thanks!

There have been quite a few discussions on this; use the search function on Discourse to find them.


You need the projection matrix, which you can get with `projection(M)`. You can read more about the properties in the docs.

```julia
using MultivariateStats
X = rand(4, 5)
M = fit(PCA, X; maxoutdim=2)
proj = projection(M)[:, 1]
```
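To see all the axes at once (a minimal sketch using the MultivariateStats API, with random data standing in for yours): `projection(M)` returns the full D×K projection matrix, with one principal axis per column, and `principalvars(M)` gives the variance captured by each component. Here `pratio=1.0` is passed so the fit keeps exactly `maxoutdim` components rather than stopping at the default variance threshold.

```julia
using MultivariateStats, LinearAlgebra

X = rand(4, 5)                    # 4 variables, 5 observations (columns are observations)
M = fit(PCA, X; maxoutdim=2, pratio=1.0)

P = projection(M)                 # 4×2 matrix: column j is the j-th principal axis
pvars = principalvars(M)          # variance captured by each component, largest first

norm(P[:, 1])                     # each axis is a unit vector in the original variable space
```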

Thanks, but it does not tell me which is the extracted principal component? It is only several numbers.

Because I want to plot these principal components.

@abcde, I understand you, and I ran into this a while back. I was not completely satisfied, so I called out to R from Julia. Here is just a code snippet:

```julia
using RCall
...
R"library(psych)"
# Estimate the number of principal components using the R function scree:
HK_R = R"scree(\$PCA_Daten)"

...

PCA_Modell = R"principal(\$PCA_Daten, \$PCA_HK)"
PCA_scores = R"principal(\$PCA_Daten, \$PCA_HK)\$scores"
show(PCA_Modell)
PCA_scores_M = convert(Matrix, PCA_scores)
...
```

And you will get something like this:

```text
RObject{VecSxp}
Principal Components Analysis
Call: principal(r = `#JL`$PCA_Daten, nfactors = 4)
     RC1   RC2   RC3   RC4   h2   u2  com
1   0.91 -0.13  0.07  0.02 0.85 0.15  1.1
2  -0.34 -0.02 -0.72 -0.13 0.64 0.36  1.5
3   0.75  0.04  0.44  0.23 0.81 0.19  1.9
4   0.35  0.46 -0.03 -0.25 0.39 0.61  2.5
5   0.13  0.00 -0.27  0.80 0.73 0.27  1.3
6  -0.14  0.88  0.03  0.07 0.80 0.20  1.1
7  -0.02  0.88 -0.11  0.09 0.79 0.21  1.1
8   0.78  0.10 -0.44  0.01 0.80 0.20  1.6
9  -0.73  0.01 -0.01 -0.25 0.60 0.40  1.2
10  0.15  0.06  0.27  0.76 0.68 0.32  1.4
11 -0.23 -0.11  0.78 -0.13 0.69 0.31  1.3

                       RC1  RC2  RC3  RC4
Proportion Var        0.26 0.16 0.15 0.13
Cumulative Var        0.26 0.42 0.58 0.71
Proportion Explained  0.37 0.23 0.21 0.19
Cumulative Proportion 0.37 0.60 0.81 1.00

Mean item complexity = 1.4
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.09
with the empirical chi square 1473.01 with prob < 3.2e-303
Fit based upon off diagonal values = 0.89
```

But I have to admit that I have not tried out what Julia now delivers as an output, so this is my quick answer!

Fair enough. Each column of `projection(M)` gives the corresponding PCA axis; `proj` above is just the first one. I assumed you only wanted the first axis, since you didn’t provide the data you wanted to transform.

To visualize your data in 2D with PCA, you can `transform` your data `X`.

```julia
using PyCall, MultivariateStats, Plots

data = pyimport("sklearn.datasets")

# Generate labeled data with 3 classes
X, y = data.make_blobs(n_samples=1000, n_features=10, centers=3, cluster_std=3, random_state=0)

# Columns correspond to observations
X = Array(X')

# Fit a PCA model with 2 output dimensions, for visualization
M = fit(PCA, X; maxoutdim=2)

# Transform the data
x = transform(M, X)

# Plot dims 1 and 2 without PCA
p1 = scatter(X[1,:], X[2,:], c=y, xlabel="Dim. 1", ylabel="Dim. 2", title="Original")

# Plot dims 1 and 2 with PCA
p2 = scatter(x[1,:], x[2,:], xlabel="PC1", ylabel="PC2", c=y, title="PCA")

# Compare plots
plot(p1, p2, layout=2, legend=false)
```

In BetaML you have what I believe is one of the simplest approaches (disclaimer: I am the author):

```
pca(X; K, error)

Perform Principal Component Analysis, returning the matrix reprojected onto the
dimensions of maximum variance.

# Parameters:
- `X`: The (N,D) data to reproject
- `K`: The number of dimensions to keep (with K <= D) [def: `nothing`]
- `error`: The maximum approximation error that we are willing to tolerate [def: `0.05`]

# Return:
- A named tuple with:
  - `X`: The reprojected (N,K) matrix
  - `K`: The number of dimensions retained
  - `error`: The actual proportion of variance not explained in the reprojected dimensions
  - `P`: The (D,K) matrix of the eigenvectors associated with the K largest eigenvalues,
    used to reproject the data matrix

Note that if `K` is specified, the parameter `error` has no effect.
```
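A minimal usage sketch based on the docstring above (this assumes a BetaML version that exports `pca` with exactly this signature; newer releases may expose the functionality differently):

```julia
using BetaML              # assumes a BetaML version exporting `pca` as documented above

X = rand(100, 5)          # (N,D) data: 100 observations, 5 dimensions

# Keep enough dimensions to explain at least 95% of the variance:
out = pca(X; error=0.05)
out.K                     # number of dimensions retained
out.X                     # the reprojected (N,K) matrix

# Or fix the number of dimensions directly (then `error` has no effect):
out2 = pca(X; K=2)
size(out2.X)              # (100, 2)
size(out2.P)              # (5, 2)
```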


If the way the package works is bothering you, it’s only a few lines of code to compute the components yourself using `LinearAlgebra`. For example, in most cases I’m familiar with, users want the components ordered from largest eigenvalue to smallest. Given a data matrix `x` with rows as observations and columns as variables/dimensions:

```julia
using LinearAlgebra, Statistics

x = x .- mean(x, dims=1)                  # Centre the data
xcov = Symmetric(cov(x))                  # Compute the covariance matrix
ef = eigen(xcov)                          # Get eigenvalues and eigenvectors
eigvalvec = reverse(ef.values, dims=1)    # Reorder eigenvalues largest to smallest
eigvecmat = reverse(ef.vectors, dims=2)   # Reorder eigenvectors the same way
eigvalcs = cumsum(eigvalvec)              # Cumulative sum of eigenvalues
vratio = eigvalcs ./ eigvalcs[end]        # Cumulative proportion of variance explained
pc = x * eigvecmat                        # Principal components, largest eigenvalue first
```

You didn’t actually ask for the variance ratios, but they are trivial to compute and often useful, so I thought I’d include them.
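As a sanity check (assuming you also have MultivariateStats installed), the manual computation should agree with the package up to the sign of each axis, which is arbitrary for eigenvectors. A hedged sketch on synthetic data:

```julia
using LinearAlgebra, Statistics, MultivariateStats

x  = randn(100, 6)                       # rows are observations
xc = x .- mean(x, dims=1)                # centre the data

ef        = eigen(Symmetric(cov(xc)))
eigvecmat = reverse(ef.vectors, dims=2)  # largest-eigenvalue axis first
pc        = xc * eigvecmat               # manual principal components

# The same scores via the package (which expects observations in columns):
M   = fit(PCA, Matrix(xc'); maxoutdim=6, pratio=1.0)
pc2 = Matrix(transform(M, Matrix(xc'))')

# Each component matches up to an arbitrary sign flip of its axis:
maximum(abs.(abs.(pc) .- abs.(pc2)))     # ≈ 0
```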

Cheers,

Colin
