Note that s_i ≤ 1, and that s_i is close to 1 when the i-th point lies well within its own cluster. This property allows using mean(silhouettes(assignments, counts, X)) as a measure of clustering quality. Higher values indicate better separation of clusters w.r.t. point distances.
Elsewhere in the documentation, X is used to denote the data matrix. However, the following produces an error:
using Clustering
using Statistics  # provides mean
X = rand(5, 1000)
R = kmeans(X, 20, maxiter=200, display=:iter)
a = assignments(R)
c = counts(R)
mean(silhouettes(a, c, X))
julia> mean(silhouettes(a, c, X))
ERROR: DimensionMismatch("The size of a distance matrix ((5, 1000)) doesn't match the length of assignment vector (1000).")
Stacktrace:
[1] silhouettes(::Array{Int64,1}, ::Array{Int64,1}, ::Array{Float64,2}) at C:\Users\mthel\.julia\packages\Clustering\Dlx92\src\silhouette.jl:55
[2] top-level scope at REPL[35]:1
Is this a bug or is X supposed to be the pairwise distance matrix (as the docs for the silhouette function indicate)? If the latter is true, is there a recommended way to compute the pairwise distance matrix?
The following seems to work fine, but I would suggest updating the docs to make this clearer. I’m also still interested in feedback regarding a “recommended” way to do this:
using Clustering
using Distances
using Statistics
X = rand(5, 1000)
R = kmeans(X, 30, maxiter=200, display=:iter)
a = assignments(R)
c = counts(R)
M = R.centers
distances = pairwise(SqEuclidean(), X, dims=2)  # dims=2: each column of X is one point, giving a 1000×1000 matrix
julia> mean(silhouettes(a, c, distances))
0.2992654354073209
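To make the shape requirement explicit, here is a minimal sketch of the same workflow with a check on the distance matrix. It assumes points are stored one per column, as kmeans expects, and it keeps SqEuclidean only because the snippet above uses it; any other Distances.jl metric could be substituted, and the resulting silhouette values will depend on that choice.

```julia
using Clustering
using Distances
using Statistics

X = rand(5, 1000)                 # 5 features × 1000 points, one point per column
R = kmeans(X, 20; maxiter=200)

# dims=2 tells Distances.jl that observations are columns,
# so D is n×n (here 1000×1000), not 5×5.
D = pairwise(SqEuclidean(), X, dims=2)
n = size(X, 2)
@assert size(D) == (n, n)

# silhouettes expects the n×n pairwise distance matrix, not the raw data.
s = silhouettes(assignments(R), counts(R), D)
println(mean(s))
```

Passing X itself fails because its size (5, 1000) does not match the 1000 assignments, which is exactly the DimensionMismatch reported above.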
ERROR: MethodError: objects of type Vector{Float64} are not callable
Use square brackets [] for indexing an Array.
Stacktrace:
[1] top-level scope
@ c:\Users\Shayan\Desktop\AUT\Thesis\MyWork\Thesis.jl:261
Line 261 is: mean(silhouettes(a, c, distances))
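The message says that something being called on that line is a Vector{Float64}, and the only names called there are mean and silhouettes. One possible cause (an assumption, since the rest of the script is not shown) is that one of those names was bound to a vector earlier in the script, for example by writing something like mean = ... The sketch below reproduces the same kind of error inside a function; shadowed_mean_demo is a hypothetical helper used only for illustration.

```julia
using Statistics

# Hypothetical helper, for illustration only: inside this function the name
# `mean` is rebound to a vector, which hides Statistics.mean.
function shadowed_mean_demo(xs)
    mean = [0.1, 0.2]        # local variable with the same name as the function
    try
        return mean(xs)      # attempts to "call" a Vector{Float64}
    catch err
        return err           # the MethodError raised by that call
    end
end

err = shadowed_mean_demo([1.0, 2.0, 3.0])
println(typeof(err))

# The module-qualified name is unaffected by the local shadowing.
println(Statistics.mean([1.0, 2.0, 3.0]))
```

If this is what happened, renaming the offending variable (or calling Statistics.mean and Clustering.silhouettes with their module prefixes) should avoid the error.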