Clustering.jl Silhouettes Distances

The docs state:

Note that s_i ≤ 1, and that s_i is close to 1 when the i-th point lies well within its own cluster. This property allows using mean(silhouettes(assignments, counts, X)) as a measure of clustering quality. Higher values indicate better separation of clusters w.r.t. point distances.
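For reference, the quoted property follows from the standard silhouette definition, which is built entirely from pairwise distances: s_i = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from point i to the other members of its own cluster and b_i is the smallest mean distance from point i to the members of any other cluster. The sketch below is a hand-rolled version for illustration only (`manual_silhouettes` is a hypothetical name, not part of Clustering.jl). It assumes an n×n distance matrix `D`, labels in 1..k, and s_i = 0 for points in singleton clusters, and it compares itself against `silhouettes` on random data with fixed, arbitrary labels. It favors clarity over speed.

```julia
using Clustering, Distances, Statistics

# Hypothetical helper: silhouette values from labels `a` (1..k) and an n×n distance matrix `D`.
function manual_silhouettes(a::AbstractVector{<:Integer}, D::AbstractMatrix{<:Real})
    n = length(a)
    size(D) == (n, n) || throw(DimensionMismatch("D must be $(n)×$(n), got $(size(D))"))
    k = maximum(a)
    sizes = zeros(Int, k)
    for l in a
        sizes[l] += 1                      # number of points in each cluster
    end
    s = zeros(n)
    for i in 1:n
        sums = zeros(k)                    # total distance from point i to each cluster
        for j in 1:n
            sums[a[j]] += D[i, j]
        end
        own = a[i]
        sizes[own] == 1 && continue        # singleton cluster: s_i stays 0
        ai = sums[own] / (sizes[own] - 1)  # divisor excludes point i itself (D[i, i] is ~0)
        bi = minimum(sums[l] / sizes[l] for l in 1:k if l != own && sizes[l] > 0)
        s[i] = (bi - ai) / max(ai, bi)
    end
    return s
end

X = rand(5, 300)                          # 5 features × 300 points (points are columns)
labels = repeat(1:3, outer=100)           # arbitrary fixed labels, no singleton clusters
D = pairwise(Euclidean(), X, dims=2)      # 300×300 distances, not the 5×300 data matrix
cnts = [count(==(l), labels) for l in 1:3]

s_lib = silhouettes(labels, cnts, D)
s_manual = manual_silhouettes(labels, D)
overall = mean(s_manual)
```

Because a_i and b_i are averages over entries of `D`, the last argument has to be the n×n distance matrix; a d×n data matrix has the wrong shape.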

Elsewhere in the documentation, X is used to denote the data matrix. However, the following produces an error:

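using Clustering
using Statistics  # provides mean

X = rand(5, 1000)  # data matrix: 5 features × 1000 points
R = kmeans(X, 20, maxiter=200, display=:iter)

a = assignments(R)  # cluster index of each point
c = counts(R)       # number of points in each cluster

mean(silhouettes(a, c, X))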

julia> mean(silhouettes(a, c, X))
ERROR: DimensionMismatch("The size of a distance matrix ((5, 1000)) doesn't match the length of assignment vector (1000).")
Stacktrace:
 [1] silhouettes(::Array{Int64,1}, ::Array{Int64,1}, ::Array{Float64,2}) at C:\Users\mthel\.julia\packages\Clustering\Dlx92\src\silhouette.jl:55      
 [2] top-level scope at REPL[35]:1

Is this a bug or is X supposed to be the pairwise distance matrix (as the docs for the silhouette function indicate)? If the latter is true, is there a recommended way to compute the pairwise distance matrix?

Passing a pairwise distance matrix instead seems to work fine, but I'd suggest updating the docs to make this clearer. I'm also still interested in feedback on a "recommended" way to compute that matrix:

using Clustering
using Distances
using Statistics

X = rand(5, 1000)
R = kmeans(X, 30, maxiter=200, display=:iter)

a = assignments(R)
c = counts(R)
M = R.centers  # cluster centers (not needed for silhouettes)
distances = pairwise(SqEuclidean(), X, dims=2)  # 1000×1000 distances between the columns (points) of X

julia> mean(silhouettes(a, c, distances))
0.2992654354073209
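One way to package this is a small wrapper that takes the data matrix and builds the distance matrix internally. This is a sketch, not a Clustering.jl API: `mean_silhouette` is a hypothetical name, and it assumes points are stored as columns of `X` (hence `dims=2`). It defaults to plain `Euclidean()` distances, which is the more common choice for silhouettes; the snippet above uses `SqEuclidean()`, which gives a different score.

```julia
using Clustering, Distances, Statistics

# Hypothetical helper: mean silhouette of a clustering result `R`,
# computed from the d×n data matrix `X` (one point per column).
function mean_silhouette(R, X::AbstractMatrix; metric=Euclidean())
    D = pairwise(metric, X, dims=2)  # n×n distances between columns of X
    return mean(silhouettes(assignments(R), counts(R), D))
end

X = rand(5, 1000)
R = kmeans(X, 30; maxiter=200)

score = mean_silhouette(R, X)                           # plain Euclidean distances
score_sq = mean_silhouette(R, X; metric=SqEuclidean())  # same metric as the snippet above
```

Keep in mind that the distance matrix holds n² Float64 values: about 8 MB for n = 1000, but about 80 GB for n = 100_000, so for large data sets you may need to evaluate the score on a subsample.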

@mthelm85 When I run this, I get:

ERROR: MethodError: objects of type Vector{Float64} are not callable
Use square brackets [] for indexing an Array.
Stacktrace:
 [1] top-level scope
   @ c:\Users\Shayan\Desktop\AUT\Thesis\MyWork\Thesis.jl:261

Line 261 is: mean(silhouettes(a, c, distances))

Please don't resurrect old threads; instead, open a new topic with an MWE, as described in this post.

It's likely that you re-bound mean (or silhouettes) to some other value somewhere in your code (judging by the error, a Vector{Float64}). For example:

julia> using Statistics

julia> mean = 5
5

julia> mean(rand(10))
ERROR: MethodError: objects of type Int64 are not callable

Matt’s code from the post above works just fine for me in a new session.
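The simplest fix is to rename the variable. As an illustration of the same clash, here is a minimal sketch using a local variable (`shadowing_demo` is a hypothetical name): inside the function, `mean` refers to a number, while the module-qualified name `Statistics.mean` still reaches the function. The same qualified form (for example `Clustering.silhouettes`) can be used when renaming is inconvenient.

```julia
using Statistics

# The local variable `mean` hides Statistics.mean inside this function,
# the same kind of clash as rebinding `mean` or `silhouettes` at top level.
function shadowing_demo(x)
    mean = sum(x) / length(x)          # `mean` is now a Float64, not the function
    # Calling mean(x) here would throw:
    #   MethodError: objects of type Float64 are not callable
    return mean, Statistics.mean(x)    # the module-qualified name still works
end

local_value, library_value = shadowing_demo([1.0, 2.0, 3.0, 4.0])
```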

OK, so should I open a new topic, since the shape of my data is different?

I don't think the error has anything to do with the shape of your data, but yes, do open another topic with an MWE. That means:

  1. Create a temp environment
  2. Add the packages you need
  3. Create some dummy data
  4. Run the code that produces the error you’re seeing

Then copy-paste the code from steps 3 and 4 into your new question.

Thanks for this great explanation. What is a temp env, by the way, and how do I create one?

Could you also take a look at my question? I'd like to understand that first; then I can open a new topic for the latest issue.

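In the REPL, press ] to enter Pkg mode, then run activate --temp. This activates a fresh, throwaway environment in a temporary directory, so packages you add there don't affect your main environment.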
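If you prefer to script the MWE setup, the Pkg API has an equivalent keyword. The sketch below covers steps 1 and 2 from the list above; the package names are the ones used in this thread, and the commented placeholder marks where the dummy data and failing code from steps 3 and 4 would go.

```julia
using Pkg

# Step 1: equivalent of `] activate --temp`, a throwaway project in a temporary directory.
Pkg.activate(; temp=true)

# Step 2: add only the packages the MWE needs.
Pkg.add(["Clustering", "Distances"])

# Steps 3 and 4 go here: dummy data (e.g. X = rand(5, 100)) and the code that errors.

# To return to your default environment afterwards:
# Pkg.activate()
```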


Thanks, that solved the problem! I had a variable named silhouettes, which shadowed the Clustering.jl function. Thanks a lot.