I haven’t been able to find a package that implements this algorithm. I’m going to try implementing it myself, but to be perfectly honest I’m not sure I have the know-how to get it over the finish line. Would anyone else be interested in working on this?
A couple of really good resources discuss the algorithm and even include pseudocode:
- Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
- A Parallel Architecture for the Partitioning Around Medoids (PAM) Algorithm for Scalable Multi-Core Processor Implementation with Applications in Healthcare
I fooled around with it for a couple of hours today and came up with this (mess) for the BUILD phase (I’m not sure that it’s right/works):
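using Distances

X = rand(3, 5)                          # 5 points in 3 dimensions, one per column
D = pairwise(Euclidean(), X, dims=2)    # 5×5 matrix of pairwise distances
k = 2                                   # number of medoids
N = size(D, 1)

# BUILD: greedily add the point that gives the lowest total deviation (TD)
m = Int[]
while length(m) < k
    td = fill(Inf, N)   # TD per candidate; Inf keeps existing medoids from being picked again
    for a in 1:N
        a in m && continue
        medoids = vcat(m, a)
        # TD = sum over all points of the distance to the nearest medoid
        td[a] = sum(minimum(D[i, j] for j in medoids) for i in 1:N)
    end
    push!(m, argmin(td))
end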
# SWAP phase
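For the SWAP phase, here is a minimal sketch of the naive PAM swap loop, not the optimized version from the FasterPAM paper: it recomputes the total deviation from scratch for every (medoid, non-medoid) pair, so it is only practical for small `N`. The names `total_deviation` and `pam_swap` are made up for illustration, and I'm assuming `D` is a symmetric distance matrix like the one above.

```julia
# Total deviation: sum of each point's distance to its nearest medoid.
total_deviation(D, medoids) =
    sum(minimum(D[i, j] for j in medoids) for i in axes(D, 1))

# Naive PAM SWAP (hypothetical helper): repeatedly apply the single best
# (medoid, non-medoid) exchange until no exchange lowers the total deviation.
function pam_swap(D::AbstractMatrix, medoids::Vector{Int})
    m = copy(medoids)
    best_td = total_deviation(D, m)
    while true
        best_swap = (0, 0)      # (position in m, candidate index); (0, 0) = none found
        td_after = best_td
        for pos in eachindex(m), cand in axes(D, 1)
            cand in m && continue          # skip points that are already medoids
            trial = copy(m)
            trial[pos] = cand
            td = total_deviation(D, trial)
            if td < td_after               # strict improvement only, so the loop terminates
                td_after = td
                best_swap = (pos, cand)
            end
        end
        best_swap == (0, 0) && break       # no improving swap: local optimum
        m[best_swap[1]] = best_swap[2]
        best_td = td_after
    end
    return m, best_td
end

# Usage: six points on a line forming two obvious groups,
# started from a deliberately poor pair of medoids.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D_line = [abs(a - b) for a in x, b in x]
final_medoids, final_td = pam_swap(D_line, [1, 2])
```

On this toy input the loop should end with one medoid in the middle of each group. The result of the BUILD phase above can be passed in the same way, e.g. `pam_swap(D, m)`.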
If anyone is interested in working on this, let me know as I’ll be doing the same. Also, if someone successfully implements it before I do, please let me know!