K-Medoids in Julia - Results Quality

mthelm85 · November 24, 2020, 2:33pm

I was reading the Wikipedia page on k-medoids clustering and in the Software section, it’s stated that:

Julia contains a k-medoid implementation of the k-means style algorithm (faster, but much worse result quality) in the JuliaStats/Clustering.jl package.

I don’t have experience doing k-medoids with anything other than Clustering.jl.

Can anyone comment on this assertion?
Do k-medoids implementations in R/Python produce “better quality” results?
I haven’t really dug into the source code at Clustering.jl yet, so does anyone know if it implements the standard PAM algorithm (described here)?

If I get some time I might do some simple comparisons between the Clustering.jl implementation and whatever I can cook up with JuMP.jl. I think I should just be able to compare silhouette scores between the two results.

EDIT: they addressed this in the docs:

Note The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).
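
The two quality measures mentioned in this post (silhouette scores and the "total cost" quoted from the docs) can both be computed from a pairwise distance matrix, whichever implementation produced the clustering. Below is a minimal sketch in plain Julia, not taken from Clustering.jl. The names `total_cost` and `mean_silhouette` are made up for illustration, and it assumes at least two clusters and no cluster made up entirely of duplicate points.

```julia
# Total cost of a k-medoids solution: each point's distance to its nearest medoid, summed.
# D is an n×n pairwise distance matrix; medoids holds the indices of the chosen points.
total_cost(D::AbstractMatrix, medoids::AbstractVector{<:Integer}) =
    sum(minimum(D[i, m] for m in medoids) for i in axes(D, 1))

# Mean silhouette width from D and integer cluster labels (hypothetical helper).
function mean_silhouette(D::AbstractMatrix, labels::AbstractVector{<:Integer})
    clusters = unique(labels)
    members = Dict(c => findall(==(c), labels) for c in clusters)
    mean_dist(i, idx) = sum(D[i, j] for j in idx) / length(idx)
    total = 0.0
    for i in eachindex(labels)
        own = members[labels[i]]
        length(own) == 1 && continue   # singleton cluster: silhouette taken as 0
        # a: mean distance to the other members of i's own cluster (D[i, i] == 0)
        a = sum(D[i, j] for j in own) / (length(own) - 1)
        # b: mean distance to the closest other cluster
        b = minimum(mean_dist(i, members[c]) for c in clusters if c != labels[i])
        total += (b - a) / max(a, b)
    end
    return total / length(labels)
end

# Toy data: two well-separated groups on a line.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [abs(a - b) for a in x, b in x]

total_cost(D, [2, 5])   # 4.0, medoids at the centre of each group
total_cost(D, [1, 4])   # 6.0, medoids at the edge of each group
mean_silhouette(D, [1, 1, 1, 2, 2, 2])
```

Running both implementations on the same `D` and comparing these two numbers would give a like-for-like comparison.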

Tamas_Papp · November 25, 2020, 9:02am

If you don’t get a reply here, I think it is OK to ask in an issue at the package repo.

kellertuer · November 25, 2020, 12:29pm

I think the algorithm is mentioned in the Algorithms section of the Wikipedia page. I am not that familiar with the other algorithms, but I know my k-means, so I can say that the Voronoi iteration mentioned there is really similar to k-means in terms of its steps.
The basic difference is of course that k-means minimises the sum of squared distances (whose closed-form minimiser is the mean), whereas Voronoi iteration minimises the sum of plain distances to the center, and in k-medoids that center has to be one of the data points.

edit: One reason of course is that the optimization problem involved here (with distances) is non-smooth.

Maybe that answers the first part of your question.
I haven’t checked or compared other implementations/algorithms, but PAM would be nice to have for sure.
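
To make the comparison with k-means concrete, here is a rough sketch of the Voronoi iteration (the "k-means style" alternation) on a precomputed distance matrix. It is an illustration only, not the Clustering.jl source, and the names `assign` and `voronoi_kmedoids` are made up for this example.

```julia
# Assignment step: every point goes to its nearest medoid (its Voronoi cell).
assign(D, medoids) = [argmin([D[i, m] for m in medoids]) for i in axes(D, 1)]

function voronoi_kmedoids(D::AbstractMatrix, init::AbstractVector{<:Integer};
                          maxiter::Integer = 100)
    medoids = collect(init)
    for _ in 1:maxiter
        labels = assign(D, medoids)
        new_medoids = copy(medoids)
        for c in eachindex(medoids)
            members = findall(==(c), labels)
            isempty(members) && continue   # empty cell: keep the old medoid
            # Update step: the member with the smallest summed distance to its cell.
            # k-means would take the mean here; k-medoids must pick an actual data point.
            costs = [sum(D[j, i] for j in members) for i in members]
            new_medoids[c] = members[argmin(costs)]
        end
        new_medoids == medoids && break    # converged: no medoid changed
        medoids = new_medoids
    end
    return medoids, assign(D, medoids)
end

# Toy data: two groups on a line, with a deliberately poor initialisation.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [abs(a - b) for a in x, b in x]
medoids, labels = voronoi_kmedoids(D, [1, 2])
```

Each medoid is only ever updated within its own cell, which is why the procedure converges quickly but can settle on a worse solution than a search that considers swapping a medoid with any point.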

anon92994695 · November 25, 2020, 5:18pm

Found this, https://www.cs.umb.edu/cs738/pam1.pdf
Looks straightforward to implement.

kellertuer · November 25, 2020, 6:37pm

Neat. Might be worth adding to Clustering.jl.

sylvaticus · November 25, 2020, 9:05pm

Another implementation in Julia, although pretty vanilla and not optimised:

https://github.com/sylvaticus/BetaML.jl/blob/master/src/Clustering.jl

(disclaimer, I am the author)

mthelm85 · November 25, 2020, 9:09pm

A parallel version is described here:

mthelm85 · November 27, 2020, 9:54pm

It appears that PAM hasn’t been implemented in any of the public packages in Julia. I’ve found a few resources that include pseudocode for PAM:

So…who wants to help me write it?! In all seriousness, I’ve been playing around with it for the past couple of hours but I could use some guidance from someone more experienced in this realm. I think I’ll start a new post specifically about PAM in Julia to get a discussion going on that particular topic.
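
As a possible starting point, here is a deliberately naive sketch of the two classic PAM phases (greedy BUILD, then best-improvement SWAP) on a precomputed distance matrix. It is an assumption-laden illustration rather than a reference implementation: the names `cost` and `pam` are made up, every candidate swap recomputes the full cost, and none of the speed-ups from Schubert & Rousseeuw (2019) are included.

```julia
# Total cost: each point's distance to its nearest medoid, summed.
cost(D, medoids) = sum(minimum(D[i, m] for m in medoids) for i in axes(D, 1))

function pam(D::AbstractMatrix, k::Integer; maxiter::Integer = 100)
    n = size(D, 1)
    # BUILD: start from the most central point, then greedily add the point
    # that lowers the total cost the most until there are k medoids.
    medoids = [argmin(vec(sum(D, dims = 1)))]
    while length(medoids) < k
        candidates = setdiff(1:n, medoids)
        costs = [cost(D, vcat(medoids, c)) for c in candidates]
        push!(medoids, candidates[argmin(costs)])
    end
    # SWAP: evaluate every (medoid, non-medoid) exchange and apply the best one,
    # repeating until no exchange lowers the total cost.
    best = cost(D, medoids)
    for _ in 1:maxiter
        best_swap, best_cost = nothing, best
        for pos in 1:k, h in setdiff(1:n, medoids)
            trial = copy(medoids)
            trial[pos] = h
            c = cost(D, trial)
            if c < best_cost
                best_swap, best_cost = (pos, h), c
            end
        end
        best_swap === nothing && break
        medoids[best_swap[1]] = best_swap[2]
        best = best_cost
    end
    return medoids, best
end

# Toy data: two groups on a line.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [abs(a - b) for a in x, b in x]
pam_medoids, pam_cost = pam(D, 2)
```

Because SWAP may exchange a medoid with any non-medoid rather than only with members of its own cluster, it searches a larger neighbourhood than the k-means style update, at a much higher cost per iteration in this naive form.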
