As the title suggests, I’m trying to cluster a set of points in space into a given number of clusters based on their proximity, however I also want the clusters to be of roughly equal size based on the “weight” of each location that ends up in a cluster.
I’ve knocked up the simplest possible algorithm to do this based on an idea I found on CrossValidated here, and it appears to converge relatively quickly, but I’m wondering whether this is a known problem with a better solution.
From a pure Julia perspective, there seem to quite a lot of allocations so if anyone’s got suggestions to get those down that’d be welcome.
using LinearAlgebra, Random
# Create mock data
loc = 10*rand(150, 2)
dist = zeros(150, 150)
for i ∈ 1:150
x1, y1 = loc[i, :]
for j ∈ i:150
x2, y2 = loc[j, :]
dist[i, j] = √((x2 - x1)^2 + (y2 - y1)^2)
end
end
w = 5*rand(150)
w̄ = sum(w)/length(w)
# Objective function
function obj(ass, dist, w, w̄, cs; λ = 50.0)
# Cumulative distance between all points in cluster
ϵ = sum(sum(@view dist[ass .== i, ass .== i]) for i ∈ 1:cs)
# Deviation from average weight by cluster
σ = sum(sum(@view w[ass .== i]) for i ∈ 1:cs) .- w̄
# Objective is total distance within all clusters and scale of variation of weight across
sum(ϵ) + λ*norm(σ)
end
# Minimization algorithm
function get_clusters(dist, w, w̄, clusters::Integer)
ass = shuffle(
repeat(collect(1:clusters), Int(round(length(w)/clusters, RoundUp)))
)[1:length(w)]
ϵ = obj(ass, dist, w, w̄, clusters)
for t ∈ 1:5 # Seems to converge within 5 iterations in practice
for loc1 in 1:length(w)
for loc2 in loc1:length(w)
# perform swap
a1 = ass[loc1]; a2 = ass[loc2]
ass[loc1] = a2; ass[loc2] = a1
# Calculate assignment objective value
ϵ′ = obj(ass, dist, w, w̄, clusters)
# Keep swap if objective went down
if ϵ′ < ϵ
ϵ = ϵ′
else
ass[loc1] = a1; ass[loc2] = a2
end
end
end
@show ϵ
end
return ass
end
# Try it out with 5 clusters
clusters = 5
# Get new assignment into clusters
ass_new = get_clusters(dist_mat, w, w̄, clusters)