I was reading the Wikipedia page on k-medoids clustering and in the Software section, it’s stated that:
- Julia contains a k-medoid implementation of the k-means style algorithm (faster, but much worse result quality) in the JuliaStats/Clustering.jl package.
I don’t have experience doing k-medoids with anything other than Clustering.jl.
- Can anyone comment on this assertion?
- Do k-medoids implementations in R/Python produce “better quality” results?
- I haven’t really dug into the source code at Clustering.jl yet, so does anyone know if it implements the standard PAM algorithm (described here)?
If I get some time I might do some simple comparisons between the Clustering.jl implementation and whatever I can cook up with JuMP.jl. I think I should just be able to compare silhouette scores between the two results.
EDIT: they addressed this in the docs:
Note The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).
If you don’t get a reply here, I think it is OK to ask in an issue at the package repo.
1 Like
I think in the Algorithms section of the Wikipedia page, the algorithm is mentioned. I am not that aware of the other algorithms, but I know my k-means,
so the mentioned Voronoi iteration is really similar to k-means in the sense of its steps.
The basic difference is of course that k-means minimises distances squared (which has as closed form solution the mean) and Voronoi iteration minimises distances to the center.
edit: One reason of course is that the optimization problem involved here (with distances) is non smooth.
Mayer that answers your first part.
I haven’t checked or compared other implementations/algorithms, but PAM would be nice to have for sure.
2 Likes
Found this, https://www.cs.umb.edu/cs738/pam1.pdf
Looks straight forward to implement.
3 Likes
Neat. Might be worth adding to Clustering.jl
.
An other imementation in Julia, altought pretty vanilla and not optimised:
https://github.com/sylvaticus/BetaML.jl/blob/master/src/Clustering.jl
(disclaimer, I am the author)
3 Likes
A parallel version is described here:
2 Likes
It appears that PAM hasn’t been implemented in any of the public packages in Julia. I’ve found a few resources that include pseudocode for PAM:
So…who wants to help me write it?! In all seriousness, I’ve been playing around with it for the past couple of hours but I could use some guidance from someone more experience in this realm. I think I’ll start a new post specifically about PAM in Julia to get a discussion going on that particular topic.
1 Like