K-Medoids in Julia - Results Quality

I was reading the Wikipedia page on k-medoids clustering and in the Software section, it’s stated that:

  • Julia contains a k-medoid implementation of the k-means style algorithm (faster, but much worse result quality) in the JuliaStats/Clustering.jl package.

I don’t have experience doing k-medoids with anything other than Clustering.jl.

  • Can anyone comment on this assertion?
  • Do k-medoids implementations in R/Python produce “better quality” results?
  • I haven’t really dug into the source code at Clustering.jl yet, so does anyone know if it implements the standard PAM algorithm (described here)?

If I get some time I might do some simple comparisons between the Clustering.jl implementation and whatever I can cook up with JuMP.jl. I think I should just be able to compare silhouette scores between the two results.

EDIT: they addressed this in the docs:

Note The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).

If you don’t get a reply here, I think it is OK to ask in an issue at the package repo.

1 Like

I think in the Algorithms section of the Wikipedia page, the algorithm is mentioned. I am not that aware of the other algorithms, but I know my k-means,
so the mentioned Voronoi iteration is really similar to k-means in the sense of its steps.
The basic difference is of course that k-means minimises distances squared (which has as closed form solution the mean) and Voronoi iteration minimises distances to the center.

edit: One reason of course is that the optimization problem involved here (with distances) is non smooth.

Mayer that answers your first part.
I haven’t checked or compared other implementations/algorithms, but PAM would be nice to have for sure.

2 Likes

Found this, https://www.cs.umb.edu/cs738/pam1.pdf
Looks straight forward to implement.

3 Likes

Neat. Might be worth adding to Clustering.jl.

An other imementation in Julia, altought pretty vanilla and not optimised:

https://github.com/sylvaticus/BetaML.jl/blob/master/src/Clustering.jl

(disclaimer, I am the author)

3 Likes

A parallel version is described here:

2 Likes

It appears that PAM hasn’t been implemented in any of the public packages in Julia. I’ve found a few resources that include pseudocode for PAM:

So…who wants to help me write it?! :laughing: In all seriousness, I’ve been playing around with it for the past couple of hours but I could use some guidance from someone more experience in this realm. I think I’ll start a new post specifically about PAM in Julia to get a discussion going on that particular topic.

1 Like