K-Means Clustering Issue

Ajaychat3 · October 3, 2018, 3:56am

I have been trying to learn k-means clustering with the Clustering package, using the example code given in “ClusteringSolutions” on the iris data. After reading the CSV file, I try to run k-means clustering with that example code and get the following error.

iris=CSV.read("iris.csv");

features = Array(iris[:,[1,3,4]])'
result = kmeans( features, 3 )

┌ Warning: indexing with colon as row will create a copy in the future use df[col_inds] to get the columns without copying
│   caller = top-level scope at In[5]:1
└ @ Core In[5]:1
MethodError: no method matching Array(::DataFrames.DataFrame)
Closest candidates are:
  Array(!Matched::LinearAlgebra.SymTridiagonal) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\LinearAlgebra\src\tridiag.jl:111
  Array(!Matched::LinearAlgebra.Tridiagonal) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\LinearAlgebra\src\tridiag.jl:518
  Array(!Matched::LinearAlgebra.AbstractTriangular) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\LinearAlgebra\src\triangular.jl:106
  ...

Stacktrace:
 [1] top-level scope at In[5]:1

I understand the warning, but I don't understand the error that follows it. If I change the code slightly, I still get an error.

features=convert(Array{Float64},iris[1:4]);
features=features'
result = kmeans( features, 3 )
MethodError: no method matching kmeans(::LinearAlgebra.Adjoint{Float64,Array{Float64,2}}, ::Int64)
Closest candidates are:
  kmeans(!Matched::Array{T<:AbstractFloat,2}, ::Int64; weights, init, maxiter, tol, display, distance) where T<:AbstractFloat at C:\Users\chatura\.julia\packages\Clustering\jd204\src\kmeans.jl:51

Stacktrace:
 [1] top-level scope at In[13]:4

Finally, when I modify the code according to the instructions in the Clustering docs (K-means — Clustering 0.3.0 documentation), everything seems to work fine.

# features=convert(Array{Float64},iris[1:4]);
features=permutedims(convert(Array{Float64}, iris[1:4]), [2, 1])
result = kmeans(features, 4; maxiter=50, display=:iter)

  Iters               objv        objv-change | affected 
-------------------------------------------------------------
      0       1.228900e+02
      1       5.679215e+01      -6.609785e+01 |        3
      2       5.591592e+01      -8.762242e-01 |        3
      3       5.584415e+01      -7.177153e-02 |        2
      4       5.580442e+01      -3.973175e-02 |        0
      5       5.580442e+01       0.000000e+00 |        0
K-means converged with 5 iterations (objv = 55.80442051737124)
KmeansResult{Float64}([7.02692 5.01633 5.53214 6.27391; 3.1 3.44082 2.63571 2.88696; 5.94615 1.46735 3.96071 4.89348; 2.15 0.242857 1.22857 1.68043], [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  1, 1, 4, 1, 1, 4, 4, 4, 4, 4], [0.0168763, 0.214223, 0.187897, 0.292387, 0.0319783, 0.436876, 0.182795, 0.00483549, 0.678713, 0.151162  …  0.289201, 0.754586, 0.350406, 0.0861243, 0.32997, 0.672146, 0.209972, 0.259972, 0.909102, 0.209537], [26, 49, 28, 46], [26.0, 49.0, 28.0, 46.0], 55.80442051737124, 5, true)

Although I am now able to run the example, I would like to understand why it failed in the first place.

y4lu · October 3, 2018, 7:39am

Issue #2 is a quirk of the transpose operator: features' returns a lazy LinearAlgebra.Adjoint wrapper rather than an Array, and features = copy(features') turns it back into a normal Array.
Issue #1 could be fixed by defining a method for Array(::DataFrame):

Array(x::Dataframes.DataFrame) = convert(Array, x); ##T?
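The Adjoint quirk can be reproduced with Base Julia alone. In this minimal sketch, needs_dense is a hypothetical function (not part of Clustering) whose argument type copies the Array{T,2} where T<:AbstractFloat signature from the kmeans error message above; the random matrix is only illustrative data.

```julia
using LinearAlgebra

# Hypothetical stand-in, NOT the real kmeans: it only copies the argument type
# Array{T,2} where T<:AbstractFloat from the error message above.
needs_dense(X::Array{T,2}) where {T<:AbstractFloat} = size(X)

A = rand(4, 10)          # plain Matrix{Float64}: 4 features x 10 observations
At = A'                  # lazy Adjoint wrapper around A, no data is copied

println(typeof(At))      # a LinearAlgebra.Adjoint type, not an Array
println(At isa Array)    # false: an Adjoint is an AbstractMatrix, not an Array

# Both of these allocate a new, ordinary 10 x 4 Matrix{Float64}:
B1 = copy(At)
B2 = permutedims(A)

println(needs_dense(B1)) # (10, 4)
println(needs_dense(B2)) # (10, 4)

# Passing the wrapper itself fails the same way kmeans(features', 3) did:
try
    needs_dense(At)
catch err
    println(err isa MethodError)   # true
end
```

For real-valued data, permutedims (used in the working version from the Clustering docs) has the same effect as copy(features'): both return a new dense Matrix instead of a lazy wrapper.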

Ajaychat3 · October 3, 2018, 10:24am

using DataFrames
# features = Array(iris[:,[1,3,4]])'
Array(x::Dataframes.DataFrame) = convert(Array, x)
features=Array(iris[1:4]);
features=features'
result = kmeans(features, 4 )

UndefVarError: Dataframes not defined

Stacktrace:
 [1] top-level scope at In[8]:2

Ajaychat3 · October 3, 2018, 10:27am

This option works. Thanks

cormullion · October 3, 2018, 11:27am

It’s always an uppercase F in DataFrames.

Ajaychat3 · October 3, 2018, 12:00pm

@cormullion: My mistake. I copied the code from the post above without checking it. I have made the correction, but it still fails. It works with the commented-out code.

using DataFrames
Array(x::DataFrames.DataFrame) = convert(Array, x)
features=Array(iris[1:4]);
features=features'

# features=convert(Array{Float64},iris[1:4]);
# features=copy(features')

result = kmeans(features, 4; maxiter=50, display=:iter )

MethodError: no method matching kmeans(::LinearAlgebra.Adjoint{Union{Missing, Float64},Array{Union{Missing, Float64},2}}, ::Int64; maxiter=50, display=:iter)
Closest candidates are:
  kmeans(!Matched::Array{T<:AbstractFloat,2}, ::Int64; weights, init, maxiter, tol, display, distance) where T<:AbstractFloat at C:\Users\chatura\.julia\packages\Clustering\jd204\src\kmeans.jl:51

Stacktrace:
 [1] top-level scope at In[11]:6

kdyrhage · October 3, 2018, 12:49pm

As you can see, the adjoint features' returns a LinearAlgebra.Adjoint{Union{Missing, Float64},Array{Union{Missing, Float64},2}}, which kmeans doesn’t know how to handle. As @y4lu mentioned above, you can use copy (or collect) to turn it back into an Array. The following works:

using RDatasets
using Clustering
iris = dataset("datasets", "iris")
features = collect(convert(Array, iris[[1,3,4]])')
result = kmeans(features, 3)
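One caveat, assuming the CSV columns are typed Union{Missing, Float64} as in the 12:00pm error: copy or collect removes the Adjoint wrapper but keeps the element type, and Union{Missing, Float64} is not a subtype of AbstractFloat, so the copied matrix would still not match the kmeans signature. That is presumably why the commented-out variant with convert(Array{Float64}, ...) worked, and why the RDatasets version above (whose columns presumably are plain Float64) works. A minimal Base-only sketch, with a few hand-typed illustrative values standing in for the CSV data:

```julia
# Illustrative values only, standing in for CSV columns whose element type
# allows missing (observations in rows, features in columns).
M = Union{Missing,Float64}[5.1 1.4 0.2;
                           4.9 1.4 0.2;
                           7.0 4.7 1.4;
                           6.3 6.0 2.5]

println(eltype(M))                          # Union{Missing, Float64}
println(eltype(M) <: AbstractFloat)         # false

# copy/collect drop the Adjoint wrapper but keep the element type,
# so this is still a matrix whose elements may be missing.
C = collect(M')
println(eltype(C) <: AbstractFloat)         # false

# Narrow the element type first (this throws if a value really is missing),
# then swap dimensions so that each column is one observation.
X = permutedims(convert(Matrix{Float64}, M))
println(typeof(X))
println(size(X))                            # (3, 4)
```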

bkamins · October 3, 2018, 1:11pm

Use Matrix(df) instead of Array(df), since a conversion from a DataFrame to a Matrix is defined.

EDIT: so you should write Matrix(iris[[1,3,4]]); you will also not get the warning.
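To connect this with kmeans: Matrix{T} is only an alias for Array{T,2}, so the result of Matrix(...) already has the two-dimensional Array container type, but it still holds one observation per row, while kmeans expects one observation per column (which is why the working code from the Clustering docs uses permutedims). The sketch below uses a hypothetical helper, to_features, that is not part of DataFrames or Clustering; in this thread it would be applied to the result of Matrix(iris[[1,3,4]]). Converting to Float64 also drops a Union{Missing, Float64} element type, and throws if a value is actually missing.

```julia
# Matrix{T} is just an alias for Array{T,2}.
println(Matrix{Float64} === Array{Float64,2})   # true

# Hypothetical helper (not part of DataFrames or Clustering): turn an
# observations-in-rows matrix into a features-by-observations Matrix{Float64},
# i.e. one column per observation.
to_features(rows::AbstractMatrix) = permutedims(Matrix{Float64}(rows))

rows = [5.1 1.4 0.2;
        4.9 1.4 0.2]             # illustrative: 2 observations x 3 features
F = to_features(rows)
println(size(F))                 # (3, 2)
println(F isa Matrix{Float64})   # true
```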
