A question about the "K" of kmeans

What methods can assist in determining the number of classes in a cluster? And which package should I use?
thanks!!!

KMeans is an unsupervised method, which means that your question is ill posed. In particular, the fact that you talk about “classes” suggests that you may be confusing clustering and classification.

For a clustering algorithm, you can assess the “quality” of the clustering using a number of metrics, a common one being the silhouette score. However, you should see clustering as a way to encode your data rather than a way to extract a classification rule from it.
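To make the silhouette idea concrete, here is a minimal sketch in plain Python (standard library only) that runs k-means for several values of k and compares the mean silhouette score. Everything in it is an assumption for illustration: the helper names (`kmeans`, `silhouette_score`, `make_blobs`), the synthetic three-blob data, and the deterministic farthest-point initialization, which is a simplification and not what a real package does. In practice you would use the k-means and silhouette routines from a clustering package rather than hand-rolled ones.

```python
import math
import random

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization
    (a simplification for this sketch, not a library's default)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

def silhouette_score(points, labels):
    """Mean silhouette over all points; assumes at least two non-empty clusters."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    total = 0.0
    for i, l in enumerate(labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other points in the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        total += (b - a) / max(a, b)
    return total / len(points)

def make_blobs(seed=0):
    """Hypothetical test data: three well-separated 2D blobs of 20 points each."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blob_centers for _ in range(20)]

points = make_blobs()
scores = {k: silhouette_score(points, kmeans(points, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("best k by silhouette:", best_k)
```

A higher mean silhouette means points sit closer to their own cluster than to the nearest other one. On data this cleanly separated the score singles out one k, but on real data the curve is often flat or ambiguous, so treat it as one indication rather than an answer.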

In terms of packages, you might want to look at Clustering.jl or ParallelKMeans.jl.


If you google your question, you may find some blog posts and tutorials (which are likely not related to Julia but may nevertheless be of interest to you) that provide ideas and approaches for selecting K.

The most common way is to choose k manually by visually inspecting the data. Another method is the elbow method, which involves plotting the minimized cost of the algorithm over a range of k values. The cost should decrease as you increase k: initially it decreases quickly, and then from a specific k onward it decreases more slowly. That specific k can be a good candidate for the number of clusters. The elbow method is not always useful, because you do not always get a clear “elbow” in your cost curve.
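As a rough illustration of the elbow method, the sketch below (plain Python, standard library only) computes the k-means cost for k = 1 to 6 on synthetic data and prints it instead of plotting. The helper names (`kmeans`, `total_cost`), the three-blob data, and the farthest-point initialization are all assumptions made for this example. The "largest second difference" rule used to pick the elbow automatically is also just one hypothetical heuristic; usually you would simply look at the plotted curve.

```python
import math
import random

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization
    (a simplification for this sketch, not a library's default)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

def total_cost(points, labels, centers):
    """k-means cost: sum of squared distances from each point to its center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

# Hypothetical test data: three well-separated 2D blobs of 20 points each.
rng = random.Random(0)
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)] for _ in range(20)]

ks = list(range(1, 7))
costs = {k: total_cost(points, *kmeans(points, k)) for k in ks}
for k in ks:
    print(k, round(costs[k], 2))

# Assumed heuristic: the elbow is where the curve bends most sharply, i.e.
# where the discrete second difference of the cost is largest.
curvature = {k: costs[k - 1] - 2 * costs[k] + costs[k + 1] for k in ks[1:-1]}
elbow_k = max(curvature, key=curvature.get)
print("elbow candidate:", elbow_k)
```

On this toy data the cost drops steeply up to the true number of blobs and then flattens, so the bend is easy to find. When the clusters overlap, the curve tends to decay smoothly and no single k stands out, which is exactly the limitation mentioned above.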
