Help setting up a problem with JuMP and/or Clustering.jl

Let’s say I want to divide all of the counties within a state into 4 groups and I want all of the counties in a group to be geographically contiguous. Furthermore, when I’m done, I want the total population of each of the 4 groups to be as close as possible.

I was thinking I could possibly use Clustering.jl to do the initial spatial clustering based on the lat/lon of each county, and then use JuMP to minimize the variance in population between groups, but I don’t think that’s going to work. I’m now thinking it may be possible to do the whole thing with JuMP.jl, but I can’t figure out how to set the problem up.

Do the following objectives make sense?

  1. Minimize the mean (or some other statistic) of the norms of the pairwise Haversine distance matrices for each of the 4 groups (this should lead to groups having counties that are close together) and

  2. Minimize the standard deviation in the total populations of each of the 4 groups

If so, is this a multi-objective problem (requiring MultiJuMP.jl), or could I set it up as a single objective that minimizes the sum of (a) the mean of the norms of the pairwise Haversine distance matrices for each of the 4 groups and (b) the standard deviation in the total populations? Can one of the open-source solvers handle this?
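For what it’s worth, here is a minimal sketch of how a single-objective (weighted-sum) version might look in JuMP. Everything in it is an assumption on my part: the names `D` (symmetric distance matrix), `pop` (population vector), `K = 4`, the λ weight, and the choice of HiGHS as solver. It also swaps in a sum of within-group pairwise distances instead of the mean of matrix norms, and an absolute-deviation penalty instead of the standard deviation, because both keep the model linear once the product of assignment binaries is linearized. Contiguity is not enforced; the distance term only encourages compactness.

using JuMP, HiGHS

# Hypothetical inputs: D[i, j] = distance between counties i and j (symmetric),
# pop[i] = population of county i, K = number of groups.
function balanced_groups(D::Matrix{Float64}, pop::Vector{Float64}, K::Int; λ = 50.0)
    n = length(pop)
    target = sum(pop) / K

    model = Model(HiGHS.Optimizer)

    # x[i, k] = 1 if county i is assigned to group k
    @variable(model, x[1:n, 1:K], Bin)
    # y[i, j, k] stands in for the product x[i, k] * x[j, k] (linearized below)
    @variable(model, y[i = 1:n, j = i+1:n, k = 1:K] >= 0)
    # dev[k] = |population of group k - target|
    @variable(model, dev[1:K] >= 0)

    # Each county belongs to exactly one group
    @constraint(model, [i = 1:n], sum(x[i, k] for k = 1:K) == 1)

    # Forces y to 1 when both counties share a group; minimization pushes it to 0 otherwise
    @constraint(model, [i = 1:n, j = i+1:n, k = 1:K], y[i, j, k] >= x[i, k] + x[j, k] - 1)

    # Absolute deviation of each group's population from the target
    @constraint(model, [k = 1:K],   sum(pop[i] * x[i, k] for i = 1:n) - target <= dev[k])
    @constraint(model, [k = 1:K], -(sum(pop[i] * x[i, k] for i = 1:n) - target) <= dev[k])

    # Weighted sum: within-group distances + λ * population imbalance
    @objective(model, Min,
        sum(D[i, j] * y[i, j, k] for i = 1:n, j = i+1:n, k = 1:K) + λ * sum(dev))

    optimize!(model)
    return [findfirst(k -> value(x[i, k]) > 0.5, 1:K) for i = 1:n]
end

Note that the linearization creates on the order of n²·K/2 auxiliary variables, so this is only a sketch of the formulation, not necessarily the most scalable way to solve it, and contiguity would still need additional constraints on top of it.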

I did something like this before, based on some code I stole off of StackOverflow:

# Function to compute great circle distance
function get_dist_ll(lat1, lon1, lat2, lon2; R = 6371.0)
    f = π/180.0
    Δlat = (lat2 - lat1) * f
    Δlon = (lon2 - lon1) * f
    a = sin(Δlat/2.0)^2 + cos(lat1 * f) * cos(lat2 * f) * sin(Δlon/2.0)^2

    R * 2atan(√(a), √(1.0 - a))
end
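A quick sanity check of the function (the expected values follow directly from the formula; a quarter of the equator is π·R/2):

get_dist_ll(0.0, 0.0, 0.0, 0.0)   # 0.0 km — same point
get_dist_ll(0.0, 0.0, 0.0, 90.0)  # ≈ 10007.5 km — a quarter of the equator (π * 6371 / 2)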

# Compute all cross-distances (only the upper triangle is filled, which is enough
# because each within-cluster pair is then counted exactly once in the objective below)
dist_mat = zeros(size(df_t, 1), size(df_t, 1))

for i ∈ 1:size(df_t, 1)
    r1 = df_t[i, [:Latitude, :Longitude]]
    for j ∈ i:size(df_t, 1)
        r2 = df_t[j, [:Latitude, :Longitude]]
        dist_mat[i, j] = get_dist_ll(r1[1], r1[2], r2[1], r2[2])
    end
end
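As an aside, the same matrix could be built with Distances.jl (this is an assumption on my part, not part of the code above). `Haversine` takes (longitude, latitude) in degrees, and `pairwise` returns a full symmetric matrix rather than just the upper triangle, which doubles the distance term in the objective but only rescales it relative to λ:

using Distances

lonlat   = permutedims(Matrix(df_t[:, [:Longitude, :Latitude]]))  # 2 × n, one column per row of df_t
dist_mat = pairwise(Haversine(6371.0), lonlat, dims = 2)          # n × n symmetric, in km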

## CLUSTERING ANALYSIS

using LinearAlgebra: norm   # norm() in the objective
using Random: shuffle       # random initial assignment

# Objective function
function obj(ass, dist, w, w̄, cs; λ = 50.0)
    # Cumulative distance between all points in each cluster
    ϵ = sum(sum(@view dist[ass .== i, ass .== i]) for i ∈ 1:cs)
    # Deviation of each cluster's total weight from the average weight
    σ = [sum(@view w[ass .== i]) for i ∈ 1:cs] .- w̄

    # Objective is total distance within all clusters plus a scaled penalty
    # for the variation of weight across clusters
    ϵ + λ*norm(σ)
end
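As a toy check (made-up numbers) that the objective behaves as intended: four points, two clusters, unit weights; grouping the nearby points should score lower than splitting them:

dist_toy = [0.0 1.0 5.0 6.0;
            0.0 0.0 5.5 6.5;
            0.0 0.0 0.0 1.2;
            0.0 0.0 0.0 0.0]
w_toy = ones(4)

obj([1, 1, 2, 2], dist_toy, w_toy, sum(w_toy)/2, 2)  # 2.2  — tight clusters, balanced weights
obj([1, 2, 1, 2], dist_toy, w_toy, sum(w_toy)/2, 2)  # 11.5 — points far apart within each cluster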

# Minimization
function get_clusters(dist, w, w̄, clusters::Integer)

    n = length(w)
    ass = shuffle(repeat(collect(1:clusters), Int(round(n/clusters, RoundUp))))[1:n]

    ϵ = obj(ass, dist, w, w̄, clusters)

    for t ∈ 1:5 # Seems to converge within 5 iterations in practice
        for loc1 in 1:n
            for loc2 in loc1:n
                # perform swap
                a1 = ass[loc1]; a2 = ass[loc2]
                ass[loc1] = a2; ass[loc2] = a1

                # Calculate assignment objective value
                ϵ′ = obj(ass, dist, w, w̄, clusters)

                # Keep swap if objective went down
                if ϵ′ < ϵ
                    ϵ = ϵ′
                else
                    ass[loc1] = a1; ass[loc2] = a2
                end
            end
        end
    end

    return ass
end

# Choose number of desired clusters (here, roughly 8.5 hotels per cluster)
clusters = Int(round(nrow(df_t)/8.5, RoundUp))

# Weight is the number of beds for now
w = Float64.(df_t[:, :beds])
w̄ = sum(w)/clusters

# Assign hotels to clusters
ass_new = get_clusters(dist_mat, w, w̄, clusters)
df_t[!, :Cluster] = ass_new

Here λ governs the tradeoff between the total within-cluster distance and the variation of total weight across clusters, which makes it a single objective function.


Aside from the chosen abbreviation for the word assignment, this is absolutely fantastic :grinning:

Haha, I do vaguely remember renaming that variable after running into repeated errors when writing the function; apparently I never changed it back…

:rofl:

This is really brilliant, I appreciate it.

I’ve refined this question here: Which JuMP.jl solver for this problem?

I still would like to figure out how to solve this within the JuMP.jl framework.