Julia has a k-medoids implementation in the JuliaStats/Clustering.jl package, but it uses a k-means-style algorithm (faster, but supposedly much worse result quality).
I don’t have experience doing k-medoids with anything other than Clustering.jl.
Can anyone comment on this assertion?
Do k-medoids implementations in R/Python produce “better quality” results?
I haven’t really dug into the Clustering.jl source code yet, so does anyone know whether it implements the standard PAM algorithm (described here)?
If I get some time I might do some simple comparisons between the Clustering.jl implementation and whatever I can cook up with JuMP.jl. I think I should just be able to compare silhouette scores between the two results.
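For the Clustering.jl half, something like this is what I have in mind (untested sketch on made-up data; assuming I’m reading the kmedoids/silhouettes API right, both work off a precomputed distance matrix):

```julia
using Clustering, Distances, Statistics

# toy data: three loose blobs, points as columns
X = hcat(randn(2, 50), randn(2, 50) .+ [5.0, 0.0], randn(2, 50) .+ [0.0, 5.0])

D = pairwise(Euclidean(), X, dims=2)  # full pairwise distance matrix

res = kmedoids(D, 3)                  # Clustering.jl's k-means-style k-medoids
sil = silhouettes(res, D)             # per-point silhouette scores in [-1, 1]

println("total cost:      ", res.totalcost)
println("mean silhouette: ", mean(sil))
```

The JuMP side would just need to spit out an assignments vector so I can feed it to the same silhouettes call.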
EDIT: they addressed this in the docs:
Note: The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). The K-means style algorithm converges in fewer iterations, but was shown to produce worse (10–20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).
I think the relevant algorithm is mentioned in the Algorithms section of the Wikipedia page. I am not that familiar with the other algorithms, but I know my k-means,
and the Voronoi iteration mentioned there is really similar to k-means in terms of its steps.
The basic difference is of course that k-means minimises squared distances (for which the closed-form solution is the mean), while Voronoi iteration minimises unsquared distances to the center.
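For illustration, here is roughly what the Voronoi iteration looks like written out (my own sketch against a precomputed distance matrix, not Clustering.jl’s actual code):

```julia
# Voronoi-iteration k-medoids sketch: the same two alternating steps as k-means,
# except the "center" update picks an actual data point instead of the mean.
# D is the full n×n distance matrix, medoids a vector of point indices.
function voronoi_kmedoids(D::AbstractMatrix, medoids::Vector{Int}; maxiter::Int=100)
    n = size(D, 1)
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # assignment step (identical to k-means): nearest medoid wins
        for i in 1:n
            assignments[i] = argmin([D[i, m] for m in medoids])
        end
        # update step: the new "center" is the cluster member minimising the
        # sum of (unsquared) distances to the other members -- no closed form,
        # so we search over the cluster
        newmedoids = copy(medoids)
        for k in eachindex(medoids)
            members = findall(==(k), assignments)
            isempty(members) && continue  # keep the old medoid for an empty cluster
            newmedoids[k] = members[argmin([sum(D[j, members]) for j in members])]
        end
        newmedoids == medoids && break    # converged
        medoids = newmedoids
    end
    return medoids, assignments
end
```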
edit: One reason there is no closed-form solution here, of course, is that the optimization problem involved (with unsquared distances) is non-smooth.
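Spelled out (standard formulations, just to make the contrast concrete): k-means minimises

$$\sum_{i=1}^n \min_{k} \lVert x_i - \mu_k \rVert^2,$$

where the per-cluster minimiser has the closed form $\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i$. K-medoids instead minimises

$$\sum_{i=1}^n \min_{k} d(x_i, m_k), \qquad m_k \in \{x_1, \dots, x_n\},$$

which is non-smooth in the centers and constrains them to data points, so the update step has to search rather than average.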
Maybe that answers your first part.
I haven’t checked or compared other implementations/algorithms, but PAM would be nice to have for sure.
So…who wants to help me write it?! In all seriousness, I’ve been playing around with it for the past couple of hours, but I could use some guidance from someone more experienced in this realm. I think I’ll start a new post specifically about PAM in Julia to get a discussion going on that particular topic.
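For reference, here is the skeleton I’ve been poking at for the SWAP phase (rough sketch, first-improvement variant; the BUILD phase for picking the initial medoids is left out):

```julia
# PAM SWAP phase sketch: try exchanging a medoid with a non-medoid and accept
# any swap that lowers the total cost; repeat until no swap improves it.
# D is the full n×n distance matrix, medoids a vector of point indices.
function pam_swap(D::AbstractMatrix, medoids::Vector{Int})
    medoids = copy(medoids)
    n = size(D, 1)
    totalcost(meds) = sum(minimum(D[i, m] for m in meds) for i in 1:n)
    cost = totalcost(medoids)
    improved = true
    while improved
        improved = false
        for k in eachindex(medoids), h in 1:n
            h in medoids && continue
            candidate = copy(medoids)
            candidate[k] = h          # swap medoid k for non-medoid point h
            c = totalcost(candidate)
            if c < cost
                medoids, cost, improved = candidate, c, true
            end
        end
    end
    return medoids, cost
end
```

This re-evaluates the whole objective for every candidate swap, which is wasteful; as I understand it, the real PAM computes the swap costs incrementally, which is where all the bookkeeping in the paper goes.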