After applying the PCA algorithm, how do I display its principal components? Thanks!
There have been quite a few discussions on this; use the search function on Discourse to find them, e.g.
This was already answered by @nilshg in the post mentioned above.
You need the projection matrix, which you can get with projection(M). You can read more about its properties in the docs.
using MultivariateStats
X = rand(4, 5)                 # 4 variables, 5 observations (columns)
M = fit(PCA, X; maxoutdim=2)
proj = projection(M)[:, 1]     # first principal axis
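To make those numbers less mysterious, here is the underlying computation sketched in plain NumPy (an illustration of the math, not MultivariateStats' implementation; the data are made up): each column of the projection matrix is one principal axis, a unit vector in the original variable space.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 5))        # 4 variables, 5 observations (columns), as above

# Centre across observations and eigendecompose the covariance matrix
Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / (X.shape[1] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

# Keep the two axes with the largest eigenvalues (the maxoutdim=2 analogue)
proj = eigvecs[:, ::-1][:, :2]            # shape (4, 2): one column per principal axis

print(proj.shape)
print(np.linalg.norm(proj, axis=0))       # each axis has unit length
```

So proj[:, 1] is just the coefficients of the first principal axis expressed in the original four variables; that is what the "several numbers" are.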
Thanks, but that doesn't tell me which principal component was extracted; it's only several numbers. I want to plot these principal components.
@abcde, I understand you, and I ran into this a while back. I was not completely satisfied, so I called R from Julia. Here is just a code snippet:
using RCall
...
R"library(psych)"
# Estimate the number of principal components (HKs) using the R function scree:
HK_R = R"scree($PCA_Daten)"
...
PCA_Modell = R"principal($PCA_Daten, $PCA_HK)"
PCA_scores = R"principal($PCA_Daten, $PCA_HK)$scores"
show(PCA_Modell)
PCA_scores_M = convert(Matrix, PCA_scores)
...
And you will get something like this:
RObject{VecSxp}
Principal Components Analysis
Call: principal(r = `#JL`$PCA_Daten, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 RC3 RC4 h2 u2 com
1 0.91 0.13 0.07 0.02 0.85 0.15 1.1
2 0.34 0.02 0.72 0.13 0.64 0.36 1.5
3 0.75 0.04 0.44 0.23 0.81 0.19 1.9
4 0.35 0.46 0.03 0.25 0.39 0.61 2.5
5 0.13 0.00 0.27 0.80 0.73 0.27 1.3
6 0.14 0.88 0.03 0.07 0.80 0.20 1.1
7 0.02 0.88 0.11 0.09 0.79 0.21 1.1
8 0.78 0.10 0.44 0.01 0.80 0.20 1.6
9 0.73 0.01 0.01 0.25 0.60 0.40 1.2
10 0.15 0.06 0.27 0.76 0.68 0.32 1.4
11 0.23 0.11 0.78 0.13 0.69 0.31 1.3
RC1 RC2 RC3 RC4
SS loadings 2.87 1.80 1.67 1.44
Proportion Var 0.26 0.16 0.15 0.13
Cumulative Var 0.26 0.42 0.58 0.71
Proportion Explained 0.37 0.23 0.21 0.19
Cumulative Proportion 0.37 0.60 0.81 1.00
Mean item complexity = 1.4
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.09
with the empirical chi square 1473.01 with prob < 3.2e-303
Fit based upon off diagonal values = 0.89
But I have to admit that I have not tried out what Julia now delivers as an output. So this is my quick answer…!
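For readers who would rather avoid the R round-trip: the eigenvalue information that scree() plots can be computed directly. A minimal NumPy sketch on made-up data (the variable names and the eigenvalue-greater-than-one cutoff, the Kaiser criterion, are illustrative, not taken from the psych package):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((200, 6))    # 200 observations, 6 variables (made-up data)

# Eigenvalues of the correlation matrix, largest first: this is what scree() plots
corr = np.corrcoef(data, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

# A common retention rule (Kaiser criterion): keep components with eigenvalue > 1
n_keep = int(np.sum(eigvals > 1))
print(eigvals.sum())           # equals the number of variables, 6
print(n_keep)
```

Plotting eigvals against the component index gives the usual scree plot, and the "elbow" or the Kaiser cutoff suggests how many components to retain.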
Fair enough. Each column of proj gives the corresponding PCA axis. I assumed you only wanted the first axis, since you didn't provide the data you wanted to transform. To visualize your data in 2D with PCA, you can transform your data X:
using PyCall, MultivariateStats, Plots
data = pyimport("sklearn.datasets")
# Generate labeled data with 3 classes
X, y = data.make_blobs(n_samples=1000, n_features = 10, centers = 3, cluster_std = 3, random_state=0)
# Columns correspond to observations
X = Array(X')
# Fit PCA model in 2d, for visualization
M = fit(PCA, X; maxoutdim=2)
# Transform data
x = transform(M, X)
# Plot dims 1 and 2 w/o PCA
p1 = scatter(X[1,:], X[2,:], c=y, xlabel="Dim. 1", ylabel="Dim. 2", title="Original")
# Plot dims 1 and 2 with PCA
p2 = scatter(x[1,:], x[2,:], xlabel="PC1", ylabel="PC2", c=y, title="PCA")
# Compare plots
plot(p1, p2, layout=2, legend=false)
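The same 2-D reduction can also be done with an SVD of the centred data instead of an eigendecomposition of the covariance, which is how many PCA implementations work internally. A NumPy sketch on made-up data (for comparison only; not what MultivariateStats does line-for-line):

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up correlated 10-D data, rows = samples
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10))

Xc = X - X.mean(axis=0)                  # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
x2d = Xc @ Vt[:2].T                      # project onto the first two principal axes

# Singular values relate to the PCA eigenvalues: lambda_i = s_i^2 / (n - 1)
eigvals = S**2 / (len(X) - 1)
print(x2d.shape)
```

x2d here plays the role of transform(M, X) above (transposed, since NumPy conventionally puts observations in rows while the Julia snippet puts them in columns).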
In BetaML you have what I believe is one of the simplest approaches (disclaimer: I am the author):
“”’
pca(X;K,error)
Perform Principal Component Analysis returning the matrix reprojected among the dimensions of maximum variance.
Parameters:

X
: The (N,D) data to reproject 
K
: The number of dimensions to maintain (with K<=D) [def:nothing
] 
error
: The maximum approximation error that we are willing to tollerate [def:0.05
]
Return:

A named tuple with:

X
: The reprojected (NxK) matrix 
K
: The dimensions retieved 
error
: The actual proportion of variance not explained in the reprojected dimensions 
P
: The (D,K) matrix of the eigenvectors associated to the Klargest eigenvalues used to reproject the data matrix
Note that if K
is indicated, the parameter error
has no effect.
“”"
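As I read the docstring, the error parameter picks the smallest K whose discarded eigenvalues leave at most that fraction of variance unexplained. A NumPy sketch of that selection rule (my reconstruction of the idea, not BetaML's actual code; data and names are made up):

```python
import numpy as np

def pca_choose_k(X, error=0.05):
    """Return the smallest K whose unexplained variance fraction is <= error."""
    Xc = X - X.mean(axis=0)                              # rows are observations
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
    explained = np.cumsum(eigvals) / eigvals.sum()       # increasing, ends at 1.0
    K = int(np.searchsorted(explained, 1 - error)) + 1   # first index reaching 1 - error
    return K, float(1 - explained[K - 1])                # K and the actual residual error

rng = np.random.default_rng(0)
X = rng.random((100, 5)) @ rng.random((5, 5))            # correlated 5-D toy data
K, err = pca_choose_k(X, error=0.05)
print(K, err)
```

The returned err corresponds to the "actual proportion of variance not explained" field in the named tuple above: it is always at most the requested error, and usually smaller.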
If the way the package works is bothering you, it's only a few lines of code to get them yourself using LinearAlgebra. For example, in most cases I'm familiar with, users want the components ordered from largest eigenvalue to smallest. Given a data matrix x with rows as observations and columns as variables/dimensions:
using LinearAlgebra, Statistics
x = x .- mean(x, dims=1)                 # Centre the data
xcov = Symmetric(cov(x))                 # Compute the covariance matrix
ef = eigen(xcov)                         # Get eigenvalues and eigenvectors
eigvalvec = reverse(ef.values, dims=1)   # Orient eigenvalues largest to smallest
eigvecmat = reverse(ef.vectors, dims=2)  # Orient eigenvectors the same way
eigvalcs = cumsum(eigvalvec)             # Cumulative sum of eigenvalues
vratio = eigvalcs ./ eigvalcs[end]       # Ratios of variance explained
pc = x * eigvecmat                       # Principal components, largest eigenvalue to smallest
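A quick sanity check of the same recipe (centre, eigendecompose the covariance, reverse the order, project), written here in NumPy on made-up data: the component scores produced this way are mutually uncorrelated, and their variances are exactly the sorted eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random((500, 4)) @ rng.random((4, 4))        # correlated toy data, rows = observations

x = x - x.mean(axis=0)                               # centre the data
eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest eigenvalue first
pc = x @ eigvecs                                     # principal component scores

# The scores are uncorrelated and their variances equal the sorted eigenvalues
print(np.allclose(np.cov(pc, rowvar=False), np.diag(eigvals)))
```

This is the property that makes the ordering useful: the first column of pc carries the most variance, the second the next most, and so on.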
You didn't actually ask for the variance ratios, but they are trivial to compute and often useful, so I thought I'd include them.
Cheers,
Colin