Regarding the label question, take the following example.
Suppose you have a data frame of size (500, 14).
# First, split half of the data into a training set
Xtr = Matrix(DF[1:2:end, 1:13])'
Xtr_labels = DF[1:2:end, 14]
# Split the other half of the data into a test set
Xte = Matrix(DF[2:2:end, 1:13])'
Xte_labels = convert(Vector, DF[2:end, 14])
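Here is a minimal sketch of the split on a plain matrix. The random `features` and `labels` are hypothetical stand-ins for `DF`, so DataFrames is not needed. The features and labels of each half use the same row range, so every column of `Xte` gets exactly one label.

```julia
using Random

# Hypothetical stand-in for DF: 500 rows, 13 feature columns and a label column.
Random.seed!(1)
features = randn(500, 13)
labels = rand(["Cookies", "Cereals", "Pandas"], 500)

# Use the SAME row range for the features and the labels of each half.
train_rows = 1:2:size(features, 1)   # odd rows
test_rows = 2:2:size(features, 1)    # even rows

# permutedims makes the columns the observations (features × observations).
Xtr = permutedims(features[train_rows, :])
Xtr_labels = labels[train_rows]
Xte = permutedims(features[test_rows, :])
Xte_labels = labels[test_rows]

println(size(Xte), " ", length(Xte_labels))   # (13, 250) 250
```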
# Train a PCA model with at most 3 output dimensions
M = fit(PCA, Xtr, maxoutdim = 3)
# Transform the test observations into the principal-component space
Yte = transform(M, Xte)
# Approximately reconstruct the test observations
Xr = reconstruct(M, Yte)
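To see what `fit`, `transform` and `reconstruct` do conceptually, here is a minimal PCA sketch that uses only `LinearAlgebra` and `Statistics`. It is not MultivariateStats' implementation. `fit_pca`, `project` and `reconstruct_pca` are hypothetical names, and random matrices stand in for `Xtr` and `Xte`. Columns are observations, as in the snippet above.

```julia
using LinearAlgebra, Statistics, Random

# Minimal PCA sketch (not MultivariateStats' code). Columns are observations.
struct SimplePCA
    mean::Vector{Float64}   # per-feature mean of the training data
    proj::Matrix{Float64}   # d × k matrix of principal directions
end

function fit_pca(X::AbstractMatrix, k::Int)
    mu = vec(mean(X, dims=2))
    F = svd(X .- mu)                 # left singular vectors = principal directions
    k = min(k, size(F.U, 2))
    return SimplePCA(mu, F.U[:, 1:k])
end

# Project observations into the k-dimensional PC space (k × n).
project(M::SimplePCA, X::AbstractMatrix) = M.proj' * (X .- M.mean)

# Map PC scores back to the original d-dimensional space (approximately).
reconstruct_pca(M::SimplePCA, Y::AbstractMatrix) = M.proj * Y .+ M.mean

Random.seed!(2)
Xtr = randn(13, 250)
Xte = randn(13, 250)

M = fit_pca(Xtr, 3)
Yte = project(M, Xte)
Xr = reconstruct_pca(M, Yte)
println(size(Yte), " ", size(Xr))   # (3, 250) (13, 250)
```

Note that `Yte` has one column per test observation. That is why any label vector used to index its columns must have exactly `size(Yte, 2)` entries.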
From here, suppose you want to pick out the observations for
each label of interest. These may occupy arbitrary column
positions in Yte (e.g., Yte[:, 2], Yte[:, 5], Yte[:, 8]), so
you might use something like:
Cookies = Yte[:,Xte_labels.=="Cookies"]
Cereals = Yte[:,Xte_labels.=="Cereals"]
Pandas = Yte[:,Xte_labels.=="Pandas"]
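The labels can sit at any column positions. One option is to build one matrix per distinct label with `findall` instead of writing each mask by hand. Below is a minimal sketch on hypothetical small data. It relies only on `Xte_labels` having one entry per column of `Yte`, which the `@assert` checks up front.

```julia
# Hypothetical small example: 3 PCs × 6 test observations and their labels.
Yte = reshape(collect(1.0:18.0), 3, 6)
Xte_labels = ["Cookies", "Pandas", "Cereals", "Cookies", "Pandas", "Cookies"]

# Labels must line up one-to-one with the columns of Yte.
@assert length(Xte_labels) == size(Yte, 2)

# One 3 × n_label matrix per distinct label, wherever its columns sit.
groups = Dict(lab => Yte[:, findall(==(lab), Xte_labels)] for lab in unique(Xte_labels))

println(size(groups["Cookies"]))   # (3, 3)
```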
How would you avoid an error like the following?
BoundsError: attempt to access 3×253 Matrix{Float64} at index [1:3, Bool[0,0,...]]
The stacktrace reads:
throw_boundserror(::Matrix{Float64}, ::Tuple{Base.Slice{Base.OneTo{Int64}}, Base.LogicalIndex{Int64, BitVector}})@abstractarray.jl:651
checkbounds@abstractarray.jl:616[inlined]
_getindex(::IndexLinear, ::Matrix{Float64}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.LogicalIndex{Int64, BitVector})@multidimensional.jl:831
getindex@abstractarray.jl:1170[inlined]
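A logical (`Bool`) index must have exactly one entry per column it indexes. Otherwise Julia throws this kind of `BoundsError`. In the snippet above, `Xte` takes rows `2:2:end` while `Xte_labels` takes rows `2:end`, so the mask is about twice as long as `Yte` has columns. Here is a minimal reproduction with hypothetical sizes:

```julia
Yte = zeros(3, 250)                  # 3 PCs × 250 test observations
labels_ok = fill("Cookies", 250)     # one label per column
labels_bad = fill("Cookies", 499)    # e.g. taken from rows 2:end instead of 2:2:end

# Correct length: logical indexing selects columns as expected.
@assert size(Yte[:, labels_ok .== "Cookies"]) == (3, 250)

# Wrong length: the same indexing throws a BoundsError.
threw = try
    Yte[:, labels_bad .== "Cookies"]
    false
catch err
    err isa BoundsError
end
println(threw)   # true
```

Taking the labels with the same row range as the features (for example `DF[2:2:end, 14]`) keeps the two lengths equal.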
And, directly related to your suggestion to assign
“PC1, PC2, …” labels, how would you avoid the blank
plot that might result from the instructions below?
p = stp.scatter(Cookies[1, :], Cookies[2, :], Cookies[3, :], marker=:circle, linewidth=0)
stp.scatter!(Cereals[1, :], Cereals[2, :], Cereals[3, :], marker=:circle, linewidth=0)
stp.scatter!(Pandas[1, :], Pandas[2, :], Pandas[3, :], marker=:circle, linewidth=0)
plot!(p,
    xlabel="PC1",
    ylabel="PC2",
    zlabel="PC3",
    background=:grey,
    color=:blue)
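One possible cause of an empty plot is a mask that selects no columns at all. This is an assumption, since the data is not shown. It can happen when the label strings in the data differ from "Cookies" by case or trailing whitespace. `Yte[:, mask]` is then a 3×0 matrix, and `scatter` has nothing to draw. Here is a minimal pre-plot check on hypothetical labels:

```julia
# Hypothetical labels as they might come out of a CSV file:
# different case and a trailing space.
Xte_labels = ["cookies", "Cereals ", "Pandas", "Pandas"]
Yte = rand(3, length(Xte_labels))    # 3 PCs × 4 test observations

# An exact-match mask that selects nothing gives a 3×0 matrix: nothing to plot.
Cookies = Yte[:, Xte_labels .== "Cookies"]
println(size(Cookies))   # (3, 0)

# Count the matches per label before plotting.
wanted = ["Cookies", "Cereals", "Pandas"]
counts = Dict(w => count(==(w), Xte_labels) for w in wanted)

# Normalising whitespace and case is one way to make the matches succeed.
clean = lowercase.(strip.(Xte_labels))
fixed = Dict(w => count(==(lowercase(w)), clean) for w in wanted)
println(counts["Cereals"], " ", fixed["Cereals"])   # 0 1
```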