Problem realization of PCA

Hi everyone!

I’m new to Julia. I wanted to run a PCA in Julia on the “iris” dataset, but I don’t understand why Julia doesn’t recognize my packages.

I installed every needed package. Could someone please help me and tell me whether I need to do something else so my packages are recognized?


This is what I did:

using Pkg
Pkg.add("PlotlyJS")
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("MLJ")
Pkg.add("Statistics")
Pkg.add("MultivariateStats")
Pkg.add("MLJMultivariateStatsInterface")

using PlotlyJS, CSV, DataFrames, MLJ, Statistics, MultivariateStats

df = dataset("datasets", "iris")
features = [:sepal_width, :sepal_length, :petal_width, :petal_length]

# load and fit PCA
PCA = @load PCA pkg="MultivariateStats"
mach = machine(PCA(pratio=1), df[!, features])
fit!(mach)
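For reference, once the machine is fitted I assume the projected data and the fitting details come from MLJ’s generic transform/report functions (this is the standard MLJ pattern; I haven’t been able to run it yet):

```julia
# after fit!(mach) succeeds:
projected = MLJ.transform(mach, df[!, features])  # coordinates in PC space
report(mach)                                      # fitting details (explained variance, etc.)
```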

Thank you in advance!

I am not fully sure what you are asking for, but something like the code below will do a PCA. I’ve done some “off-line” editing, so there may be a bug or two, but it should otherwise work well on any data matrix y.

using LinearAlgebra, Statistics

function PrincComp(y,Cov_y=[],Nvec=[])

  n = size(y,2)         #number of variables

  if isempty(Cov_y)
    Cov_y = cov(y)      #estimate the covariance matrix if not supplied
  end
  isempty(Nvec) && (Nvec = 1:n)

  F      = svd(Cov_y)                    #Cov_y = U*Diagonal(S)*V', S[i] are in decreasing order
  W      = F.U                           #loadings (eigenvectors of Cov_y)
  lambda = F.S                           #eigenvalues (variances of the components)
  for i = 1:size(W,2)
    if all(W[:,i] .<= 0)                 #switch sign if all loadings are negative
      W[:,i] = -W[:,i]
    end
  end

  relvar  = lambda/sum(lambda)           #fraction of total variance per component
  yDemean = y .- mean(y,dims=1)
  pc      = yDemean*W                    #the principal components, in descending order

  lambda = lambda[Nvec]                  #export a selected set of pcs
  relvar = relvar[Nvec]
  W      = W[:,Nvec]
  pc     = pc[:,Nvec]

  return pc,relvar,W,lambda

end
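As a sanity check on the approach above: de-meaning the data and multiplying by the eigenvectors of the covariance matrix gives uncorrelated components whose variances are the eigenvalues, in decreasing order. A small stdlib-only verification (the data here is made up):

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(42)
y = randn(200, 4) * [1.0 0.5 0.0 0.0;      # mix the columns so the
                     0.0 1.0 0.3 0.0;      # data is correlated
                     0.0 0.0 1.0 0.2;
                     0.0 0.0 0.0 1.0]

F  = svd(cov(y))                  # for a symmetric PSD matrix, svd = eigendecomposition
W  = F.U                          # loadings
pc = (y .- mean(y, dims=1)) * W   # principal components

# the components are (numerically) uncorrelated, with variances F.S
println(isapprox(cov(pc), Diagonal(F.S), atol=1e-8))   # true
println(issorted(F.S, rev=true))                       # true
```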

You can use BetaML’s pca…

It is as simple as:

julia> using Pkg;
julia> Pkg.add("BetaML");
julia> using DelimitedFiles, BetaML
julia> iris  = readdlm(joinpath(dirname(Base.find_package("BetaML")),"..","test","data","iris.csv"),',',skipstart=1) # load the data
150×5 Matrix{Any}:
 5.1  3.5  1.4  0.2  "setosa"
 4.9  3.0  1.4  0.2  "setosa"
 4.7  3.2  1.3  0.2  "setosa"
 4.6  3.1  1.5  0.2  "setosa"
 5.0  3.6  1.4  0.2  "setosa"
 5.4  3.9  1.7  0.4  "setosa"
 4.6  3.4  1.4  0.3  "setosa"
 5.0  3.4  1.5  0.2  "setosa"
 4.4  2.9  1.4  0.2  "setosa"
 ⋮                   
 6.9  3.1  5.1  2.3  "virginica"
 5.8  2.7  5.1  1.9  "virginica"
 6.8  3.2  5.9  2.3  "virginica"
 6.7  3.3  5.7  2.5  "virginica"
 6.7  3.0  5.2  2.3  "virginica"
 6.3  2.5  5.0  1.9  "virginica"
 6.5  3.0  5.2  2.0  "virginica"
 6.2  3.4  5.4  2.3  "virginica"
 5.9  3.0  5.1  1.8  "virginica"

julia> x = convert(Array{Float64,2},iris[:,1:4]) # select the features
150×4 Matrix{Float64}:
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2
 4.6  3.1  1.5  0.2
 5.0  3.6  1.4  0.2
 5.4  3.9  1.7  0.4
 4.6  3.4  1.4  0.3
 5.0  3.4  1.5  0.2
 4.4  2.9  1.4  0.2
 ⋮              
 6.9  3.1  5.1  2.3
 5.8  2.7  5.1  1.9
 6.8  3.2  5.9  2.3
 6.7  3.3  5.7  2.5
 6.7  3.0  5.2  2.3
 6.3  2.5  5.0  1.9
 6.5  3.0  5.2  2.0
 6.2  3.4  5.4  2.3
 5.9  3.0  5.1  1.8

julia> pcaOut =  pca(x) # default
(X = [2.8182395066394617 -5.64634982341282; 2.7882234453146735 -5.1499513517629465; … ; 7.40330674682742 -5.443580535339713; 6.892553994556911 -5.044291638837153], K = 2, error = 0.022314793681205147, P = [0.3613865917853687 -0.6565887712868657; -0.08452251406457059 -0.7301614347850023; 0.8566706059498349 0.1733726627958613; 0.3582891971515499 0.07548101991748285], explVarByDim = [0.9246187232017274, 0.9776852063187949, 0.9947878161267246, 1.0])

julia> xReprojected = pcaOut.X
150×2 Matrix{Float64}:
 2.81824  -5.64635
 2.78822  -5.14995
 2.61337  -5.182
 2.75702  -5.00865
 2.77365  -5.65371
 3.22151  -6.06828
 2.68183  -5.23749
 2.87622  -5.49034
 2.61598  -4.74864
 ⋮        
 7.42463  -5.73616
 6.9176   -4.75204
 8.06538  -5.60482
 7.92111  -5.63175
 7.44647  -5.51448
 7.02953  -4.95164
 7.26671  -5.40581
 7.40331  -5.44358
 6.89255  -5.04429

julia> explVarianceByDimensions = pcaOut.explVarByDim
4-element Vector{Float64}:
 0.9246187232017274
 0.9776852063187949
 0.9947878161267246
 1.0

You can also specify how many dimensions you want to keep (e.g. pcaOut = pca(x,K=3)), or which maximum error (unexplained variance) you are willing to accept in the projected matrix (e.g. pcaOut = pca(x,error=0.1)); the default is 0.05.
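For intuition, the error threshold is just one minus the cumulative explained variance, so K ends up being the smallest number of components whose cumulative variance share reaches 1 - error. A stdlib-only sketch of that selection rule (choose_K is a made-up helper for illustration, not a BetaML function):

```julia
using LinearAlgebra, Statistics, Random

# Pick the smallest K whose retained components explain at least 1 - error
# of the total variance (this mirrors the explVarByDim / error logic above,
# as I understand it).
function choose_K(x; error=0.05)
    lambda       = svd(cov(x)).S                 # component variances, decreasing
    explVarByDim = cumsum(lambda) / sum(lambda)  # cumulative explained variance
    K            = findfirst(v -> v >= 1 - error, explVarByDim)
    return K, explVarByDim
end

Random.seed!(1)
x = randn(100, 3) * [2.0 0.0 0.0;
                     0.0 1.0 0.0;
                     0.0 0.0 0.1]   # third direction carries almost no variance
K, ev = choose_K(x; error=0.05)     # K should be 2 here
```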