A common technique for finding the optimal value of K in K-means clustering is the elbow method. You try different values of K, plot the average Euclidean distance from the points to their cluster centers against K, and look for the location on the graph that resembles an “elbow”.
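To make the procedure concrete, here is a minimal sketch of how such an elbow curve could be computed. It is plain standard-library Python, used purely for illustration; the same arithmetic carries over to Julia. The function names (`kmeans`, `mean_distance`, `elbow_curve`), the toy data, and the choice to keep the best of a few random restarts are all assumptions made for this example, not something taken from a particular package.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on a list of coordinate tuples; returns the centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k randomly chosen data points
    for _ in range(iters):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def mean_distance(points, centers):
    """Average Euclidean distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)

def elbow_curve(points, ks, restarts=5):
    """For each K, keep the lowest mean distance found over a few restarts."""
    return [
        min(mean_distance(points, kmeans(points, k, seed=s)) for s in range(restarts))
        for k in ks
    ]

# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
ks = list(range(1, 5))
curve = elbow_curve(points, ks)
for k, d in zip(ks, curve):
    print(k, round(d, 3))
```

On this toy data the curve drops sharply from K=1 to K=2 and changes little afterwards, which is the shape the elbow method looks for.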
In Python there are some quick packages to implement this, and in Julia I’ve reached a point (no pun intended) where I have the elbow graphs all implemented. Here is an image of one.
How would you recommend I extract the X-axis value at the elbow of this graph (in this case X=2) as the optimal number for K?
Thank you! Do you know of any other packages / functions within plots to do these calculations? I’m having a difficult time getting GMT up and running. I totally get the formula, though, and it makes a lot of sense; I just want to know if there are any simpler implementations.
Does anyone have good references for standard algorithms for this problem? It comes up a lot also when using PCA/SVD to identify a subspace in noisy data. I haven’t needed to automatically determine the dimensionality from the data (I often know it a priori) but it would be nice to have a good place to start looking.
Finding the point furthest from the line made with the first and last point misses the purpose of the elbow method. I would warn against using this for cases other than when you are just trying to find the elbow point of two straight lines.
As per my limited understanding, the elbow method is about identifying the last number which leads to a significant (per subjective judgement) reduction in the sum of squared distances to the mean points. As you can see here, the author identifies 3 as the elbow point, not 2, which is at a sharper bend and furthest away from the above-mentioned line.
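One way to turn the “last significant reduction” reading into code is sketched below in plain Python. The function name `last_significant_k`, the `min_drop` threshold, and the choice to measure each drop as a fraction of the first value are all assumptions for illustration; the threshold stands in for the subjective judgement mentioned above, and the numbers are made up.

```python
def last_significant_k(ks, sse, min_drop=0.1):
    """Return the last K whose addition still reduced the cost noticeably.

    A drop counts as significant when it is at least `min_drop` of the cost
    at the first K. The scan stops at the first K that falls below that.
    """
    if len(ks) != len(sse) or not ks:
        raise ValueError("ks and sse must be non-empty and of equal length")
    total = sse[0]
    best = ks[0]
    for i in range(1, len(ks)):
        drop = (sse[i - 1] - sse[i]) / total
        if drop < min_drop:
            break  # first insignificant reduction: keep the previous K
        best = ks[i]
    return best

# Hypothetical sum-of-squared-distances values for K = 1..6.
ks = [1, 2, 3, 4, 5, 6]
sse = [100, 40, 25, 23, 22, 21]
print(last_significant_k(ks, sse))                # 3 with the default threshold
print(last_significant_k(ks, sse, min_drop=0.2))  # 2 with a stricter threshold
```

The answer moves with the threshold, which is exactly the subjective part of the method.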
@Julia1 I am curious which Python packages you have used for this, and how their decision algorithms judge which point is the elbow.
If you had plotted my example data you would have seen that the 3rd point is obviously the furthest point. But note that my response was only based on the geometric question posted by the OP, and for that it works pretty well.
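For reference, the geometric heuristic under discussion (pick the point furthest from the straight line through the first and last points of the curve) can be sketched as follows in plain Python. The function name `elbow_by_chord` and the sample numbers are hypothetical, and the axes are used as-is, so the result depends on how the two axes are scaled.

```python
import math

def elbow_by_chord(xs, ys):
    """Return the x value of the point furthest from the first-to-last chord."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    x1, y1 = xs[0], ys[0]
    x2, y2 = xs[-1], ys[-1]
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("first and last points coincide")

    def distance(x, y):
        # Perpendicular distance from (x, y) to the line through the endpoints.
        return abs(dy * (x - x1) - dx * (y - y1)) / length

    best = max(range(len(xs)), key=lambda i: distance(xs[i], ys[i]))
    return xs[best]

# Hypothetical elbow-curve values for K = 1..6.
ks = [1, 2, 3, 4, 5, 6]
cost = [100, 40, 25, 23, 22, 21]
print(elbow_by_chord(ks, cost))  # 2
```

This answers the purely geometric question of where the bend is sharpest relative to the chord; whether that K is the right number of clusters is a separate judgement.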
Apologies for the confusion, by ‘the author’ I was referring to the author of the article which I linked to.
Indeed, the code may work well for the present geometric question; I only wanted to caution against using this solution to determine the elbow point in the general case.