# Problem realization of PCA

Hi everyone!

I’m new to Julia. I wanted to run a PCA in Julia on the “iris” dataset, but I don’t understand why Julia doesn’t recognize my packages.

I installed every needed package. Could someone please help me and tell me whether I need to do something else so my packages are recognized?

This is what I did:

```julia
using Pkg
using PlotlyJS, CSV, DataFrames, MLJ, Statistics, MultivariateStats

df = dataset("datasets", "iris")
features = [:sepal_width, :sepal_length, :petal_width, :petal_length]

# load and fit PCA
PCA = @load PCA pkg="MultivariateStats"
mach = machine(PCA(pratio=1), df[!, features])
fit!(mach)
```

Thank you in advance!

I am not fully sure what you are asking for, but something like the code below will do a PCA. I’ve done some “off-line” editing so there may be a bug or two, but it should otherwise work well on a data matrix `y`.

```julia
using Statistics, LinearAlgebra

function PrincComp(y, Cov_y=[], Nvec=[])

    n = size(y, 2)                      # number of variables

    if isempty(Cov_y)
        Cov_y = cov(y)
    end
    isempty(Nvec) && (Nvec = 1:n)

    F      = svd(Cov_y)                 # Cov_y = U*Diagonal(S)*V', S[i] in decreasing order
    W      = F.U
    lambda = F.S
    for i = 1:size(W, 2)
        if all(W[:, i] .<= 0)           # switch sign if all loadings are negative
            W[:, i] = -W[:, i]
        end
    end

    relvar  = lambda / sum(lambda)      # share of variance explained by each pc
    yDemean = y .- mean(y, dims=1)
    pc      = yDemean * W               # the principal components, in descending order

    lambda = lambda[Nvec]               # export a selected set of pcs
    relvar = relvar[Nvec]
    W      = W[:, Nvec]
    pc     = pc[:, Nvec]

    return pc, relvar, W, lambda

end
```
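As a quick sanity check of this approach (a self-contained sketch on random data, mirroring the steps of the function rather than calling it): the relative variances should sum to one, and the principal components should be uncorrelated with variances equal to the eigenvalues.

```julia
using Statistics, LinearAlgebra, Random

Random.seed!(1)
y = randn(200, 4)                            # 200 observations, 4 variables

F  = svd(cov(y))                             # PCA via SVD of the covariance matrix
W  = F.U
pc = (y .- mean(y, dims=1)) * W              # principal components

relvar = F.S / sum(F.S)
@assert isapprox(sum(relvar), 1.0)                    # variance shares sum to one
@assert isapprox(cov(pc), Diagonal(F.S), atol=1e-8)   # pcs are uncorrelated, var λᵢ
```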

You can use BetaML’s `pca`…

It is as simple as:

```julia
julia> using DelimitedFiles, BetaML
julia> iris  = readdlm(joinpath(dirname(Base.find_package("BetaML")),"..","test","data","iris.csv"),',',skipstart=1) # load the data
150×5 Matrix{Any}:
5.1  3.5  1.4  0.2  "setosa"
4.9  3.0  1.4  0.2  "setosa"
4.7  3.2  1.3  0.2  "setosa"
4.6  3.1  1.5  0.2  "setosa"
5.0  3.6  1.4  0.2  "setosa"
5.4  3.9  1.7  0.4  "setosa"
4.6  3.4  1.4  0.3  "setosa"
5.0  3.4  1.5  0.2  "setosa"
4.4  2.9  1.4  0.2  "setosa"
⋮
6.9  3.1  5.1  2.3  "virginica"
5.8  2.7  5.1  1.9  "virginica"
6.8  3.2  5.9  2.3  "virginica"
6.7  3.3  5.7  2.5  "virginica"
6.7  3.0  5.2  2.3  "virginica"
6.3  2.5  5.0  1.9  "virginica"
6.5  3.0  5.2  2.0  "virginica"
6.2  3.4  5.4  2.3  "virginica"
5.9  3.0  5.1  1.8  "virginica"

julia> x = convert(Array{Float64,2},iris[:,1:4]) # select the features
150×4 Matrix{Float64}:
5.1  3.5  1.4  0.2
4.9  3.0  1.4  0.2
4.7  3.2  1.3  0.2
4.6  3.1  1.5  0.2
5.0  3.6  1.4  0.2
5.4  3.9  1.7  0.4
4.6  3.4  1.4  0.3
5.0  3.4  1.5  0.2
4.4  2.9  1.4  0.2
⋮
6.9  3.1  5.1  2.3
5.8  2.7  5.1  1.9
6.8  3.2  5.9  2.3
6.7  3.3  5.7  2.5
6.7  3.0  5.2  2.3
6.3  2.5  5.0  1.9
6.5  3.0  5.2  2.0
6.2  3.4  5.4  2.3
5.9  3.0  5.1  1.8

julia> pcaOut =  pca(x) # default
(X = [2.8182395066394617 -5.64634982341282; 2.7882234453146735 -5.1499513517629465; … ; 7.40330674682742 -5.443580535339713; 6.892553994556911 -5.044291638837153], K = 2, error = 0.022314793681205147, P = [0.3613865917853687 -0.6565887712868657; -0.08452251406457059 -0.7301614347850023; 0.8566706059498349 0.1733726627958613; 0.3582891971515499 0.07548101991748285], explVarByDim = [0.9246187232017274, 0.9776852063187949, 0.9947878161267246, 1.0])

julia> xReprojected = pcaOut.X
150×2 Matrix{Float64}:
2.81824  -5.64635
2.78822  -5.14995
2.61337  -5.182
2.75702  -5.00865
2.77365  -5.65371
3.22151  -6.06828
2.68183  -5.23749
2.87622  -5.49034
2.61598  -4.74864
⋮
7.42463  -5.73616
6.9176   -4.75204
8.06538  -5.60482
7.92111  -5.63175
7.44647  -5.51448
7.02953  -4.95164
7.26671  -5.40581
7.40331  -5.44358
6.89255  -5.04429

julia> explVarianceByDimensions = pcaOut.explVarByDim
4-element Vector{Float64}:
0.9246187232017274
0.9776852063187949
0.9947878161267246
1.0
```

You can also specify how many dimensions to keep (e.g. `pcaOut = pca(x,K=3)`) or the maximum error (unexplained variance) you are willing to accept in the projected matrix (e.g. `pcaOut = pca(x,error=0.1)`); the default is `error=0.05`.
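To see what the `error` keyword corresponds to, here is a minimal stdlib-only sketch (the helper `choose_K` is hypothetical, not part of BetaML): it picks the smallest `K` whose leading covariance eigenvalues explain at least `1 - error` of the total variance, which is consistent with the `explVarByDim` output above (with `error=0.05`, the first value ≥ 0.95 is at dimension 2, matching `K = 2`).

```julia
using Statistics, LinearAlgebra, Random

# Smallest K whose retained eigenvalues explain at least 1 - error
# of the total variance (hypothetical helper for illustration).
function choose_K(x; error=0.05)
    lambda = reverse(eigvals(Symmetric(cov(x))))   # eigenvalues, largest first
    explVarByDim = cumsum(lambda) / sum(lambda)    # cumulative explained variance
    return findfirst(>=(1 - error), explVarByDim)
end

Random.seed!(0)
# synthetic data with two dominant directions
x = randn(150, 4) * Diagonal([3.0, 2.0, 0.1, 0.1])
choose_K(x, error=0.05)
```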