Help setting up a problem with JuMP and/or Clustering.jl

mthelm85 · July 17, 2020, 5:45pm

Let’s say I want to divide all of the counties within a state into 4 groups and I want all of the counties in a group to be geographically contiguous. Furthermore, when I’m done, I want the total population of each of the 4 groups to be as close as possible.

I was thinking I could possibly use Clustering.jl to do the initial spatial clustering based on lat/lon of each county and then use JuMP to minimize the variance in population between groups but I don’t think that’s going to work. I’m now thinking it may be possible to simply use JuMP.jl to do this but I can’t figure out how to set the problem up.

Do the following objectives make sense?

Minimize the mean (or some other statistic) of the norms of the pairwise Haversine distance matrices for each of the 4 groups (this should lead to groups having counties that are close together) and
Minimize the standard deviation in the total populations of each of the 4 groups

If so, is this a multi-objective problem (requiring MultiJuMP.jl) or could I set it up to simply minimize the sum of the mean of the norms of the pairwise Haversine distance matrices for each of the 4 groups and the standard deviation in the total populations? Can one of the open-source solvers handle this?

nilshg · July 17, 2020, 5:56pm

I did something like this before, based on some code I stole off of StackOverflow:

# Function to compute great circle distance
function get_dist_ll(lat1, lon1, lat2, lon2; R = 6371.0)
    f = π/180.0
    Δlat = (lat2 - lat1) * f
    Δlon = (lon2 - lon1) * f
    a = sin(Δlat/2.0)^2 + cos(lat1 * f) * cos(lat2 * f) * sin(Δlon/2.0)^2

    R * 2atan(√(a), √(1.0 - a))
end

# Compute all cross-distances
dist_mat = zeros(size(df_t, 1), size(df_t, 1))

for i ∈ 1:size(df_t, 1)
    r1 = df_t[i, [:Latitude, :Longitude]]
    for j ∈ i:size(df_t, 1)
        r2 = df_t[j, [:Latitude, :Longitude]]
        dist_mat[i, j] = get_dist_ll(r1[1], r1[2], r2[1], r2[2])
    end
end

## CLUSTERING ANALYSIS

# Objective function
function obj(ass, dist, w, w̄, cs; λ = 50.0)
    # Cumulative distance between all points in cluster
    ϵ = sum(sum(@view dist[ass .== i, ass .== i]) for i ∈ 1:cs)
    # Deviation from average weight by cluster
    σ = sum(sum(@view w[ass .== i]) for i ∈ 1:cs) .- w̄

    # Objective is total distance within all clusters and scale of variation of weight across
    sum(ϵ) + λ*norm(σ)
end

# Minimization
function get_clusters(dist, w, w̄, clusters::Integer)

    n = length(w)
    ass = shuffle(repeat(collect(1:clusters), Int(round(n/clusters, RoundUp))))[1:n]

    ϵ = obj(ass, dist, w, w̄, clusters)

    for t ∈ 1:5 # Seems to converge within 5 iterations in practice
        for loc1 in 1:n
            for loc2 in loc1:n
                # perform swap
                a1 = ass[loc1]; a2 = ass[loc2]
                ass[loc1] = a2; ass[loc2] = a1

                # Calculate assignment objective value
                ϵ′ = obj(ass, dist, w, w̄, clusters)

                # Keep swap if objective went down
                if ϵ′ < ϵ
                    ϵ = ϵ′
                else
                    ass[loc1] = a1; ass[loc2] = a2
                end
            end
        end
    end

    return ass
end

# Choose number of desired clusters
clusters = Int(round(nrow(df_t)/8.5, RoundUp))

# Weight is the number of beds for now
w = Float64.(df_t[:, :beds)])
w̄ = sum(w)/clusters

# Assign hotels to clusters
ass_new = get_clusters(dist_mat, w, w̄, clusters)
df_t[!, :Cluster] = ass_new

Here λ governs the tradeoff between total distance and variance within cluster to make it a single objective function.

mthelm85 · July 17, 2020, 6:12pm

Aside from the chosen abbreviation for the word assignment, this is absolutely fantastic

nilshg · July 17, 2020, 6:15pm

Haha, I do vaguely remember renaming that variable after running into repeated errors when writing the function, never changed it back apparently…

mthelm85 · July 17, 2020, 6:15pm

This is really brilliant, I appreciate it.

mthelm85 · July 19, 2020, 8:22pm

I’ve refined this question here: Which JuMP.jl solver for this problem?

I still would like to figure out how to solve this within the JuMP.jl framework.

Topic		Replies	Views
Which JuMP.jl solver for this problem? Optimization (Mathematical) jump , optimization	19	2814	July 27, 2020
JuMP Optimization Help Optimization (Mathematical) jump	5	1123	May 4, 2020
Solving multiple similar problems in JuMP Optimization (Mathematical) question , jump	8	609	September 19, 2023
Nested optimisation problem in JuMP Optimization (Mathematical) jump	6	644	January 12, 2024
JuMP kmeans with constraints Optimization (Mathematical) question , jump , optimization	4	304	November 8, 2022

Help setting up a problem with JuMP and/or Clustering.jl

Related topics