Hello, I am currently working on a problem in which I need to calculate multiple convex hulls for large datasets, as the following image shows:
Two things to note:
 I don’t need adjacency information, simply knowing the vertices is enough
 Obviously this example has two dimensions, but I’m working with machine learning datasets with higher dimensions
Originally, I was using the Ndimensional quick hull algorithm implemented in LazySets.jl, but recently I’ve come to try tackling this problem through linear programming.
For reference, here is my solution using the Quick Hull algorithm:
using Polyhedra, LazySets
function calculate_convex_hulls_qh(X)
N, D = size(X)
Xpoints = [X[i,:] for i in 1:N]
Xindices = [i for i in eachindex(Xpoints)]
n_hulls = 1
Xhullindices = Vector{Vector{Int64}}()
println("Calculating first hull...")
# first convex hull
hull = LazySets.convex_hull(Xpoints)
n_hulls += 1
# index mapping the points
hull_indices = [Xindices[findfirst(i > Xpoints[i] == hull[hi], Xindices)] for hi in eachindex(hull)]
push!(Xhullindices, hull_indices)
# remove already added indices from index list, create new list with remaining points
# original list of point is needed for the index mapping
Xindices = [i for i in Xindices if !(i in hull_indices)]
X_remaining = Xpoints[Xindices]
# while there are more remaining points than D
# at least D+1 points are necessary for the hulls
while length(Xindices) > D
# create another hull
println("Calculating hull $(n_hulls)...")
hull = LazySets.convex_hull(X_remaining)
# index map
hull_indices = [Xindices[findfirst(i > Xpoints[i] == hull[hi], Xindices)] for hi in eachindex(hull)]
push!(Xhullindices, hull_indices)
# remove the points from the remaining set and index set
Xindices = [i for i in Xindices if !(i in hull_indices)]
X_remaining = Xpoints[Xindices]
n_hulls += 1
end
# if there are any remaining points add them to the last set
if length(Xindices) > 0
println("Adding remaining points...")
append!(Xhullindices[end], Xindices)
end
# return indices of each hull and indices which are not vertices (inside last set)
return Xhullindices, Xindices
end
I’ve tried two LP solutions so far:
Solution 1
The first one is simply going through every point and seeing if it can be formed as the convex combination of every point in the set:
min_{\alpha \geq 0} \; \alpha_i
s.t.
\Sigma \alpha = 1
X^T\alpha = x_i
When checking the point x_i, I try to minimize its own weight \alpha_i in the convex combination. This way, if the objective value is above zero it must be a vertex.
Then, I simply go through every point, update the right hand side of the constraint with x_i and mark the vertices. This way, I have every vertex of the first hull.
After that, I fix the contributions \alpha_i of every found vertex at zero, check the inner points again, mark the vertices of the second hull, etc. until I have every hull.
Below is my implementation for this solution:
using JuMP, GLPK
function calculate_convex_hulls_lp_v1(X)
N, D = size(X)
# building initial model
model = JuMP.Model(GLPK.Optimizer)
set_silent(model)
# list of vector of indices for each hull (outer > inner)
hulls_idx_vector = Vector{Vector{Int64}}()
free_points = ones(N) # indices free = 1; indices that are in a hull = 0
# general convex combination variables
@variable(model, α[i=1:N] .>= 0)
@constraint(model, sum(α) == 1)
Xtp = permutedims(X)
@constraint(model, cc, Xtp*α .== view(X,1,:))
# until we can't make more hulls
n_hulls = 1
first_free_point = 1
while true
# going through points to make a hull
hull_indices = Vector{Int64}()
println("Hull $(n_hulls)...")
for i in first_free_point:N
# skipping points already in a hull
if (free_points[i] == 0) continue end
# set point constraint
set_normalized_rhs.(cc,view(X,i,:))
@objective(model, Min, α[i])
# solving point
JuMP.optimize!(model)
if (objective_value(model) > 0)
# point is vertex
push!(hull_indices, i)
free_points[i] = 0
end
end
# quit if there are too few points in this hull (no progress)
if length(hull_indices) <= D
println("No more hulls can be formed")
free_points[hull_indices] .= 1
break
end
# storing hull
push!(hulls_idx_vector, hull_indices)
n_hulls += 1
# updating first free point
first_free_point = findfirst(==(1), free_points)
# locking points in hull to 0 (can't be used)
fix.(α[hull_indices], 0; force=true)
# quit if there are no points left (it's over)
if isnothing(first_free_point) println("All point are in hulls"); break end
end
# create list of non vertices and add to last hull
non_vertex_points = [i for i in 1:N if free_points[i] == 1]
append!(hulls_idx_vector[end], non_vertex_points)
return hulls_idx_vector, non_vertex_points
end
Solution 2
The second solution is based on the same convex combination problem, but it attemps to deal with less variables by going through the points in the following way:
 Start with every convex combination variable \alpha fixed at 0
 Unfix the variable \alpha_i for the point i you’re going to check
 Solve the optimization problem
 If the objective value is above zero, mark the point i as ‘vertex candidate’
 If it is zero, fix the variable \alpha_i back to 0
This way we have a group of ‘vertex candidates’. Basically, we have definitely marked every vertex, but also some extra points that aren’t vertices (the first point that is checked, for example, definitely goes in). We can just go through the candidates again and recheck them to find the true vertices (only leaving unfixed all other vertex candidates).
Below is my implementation for this algorithm:
using JuMP, GLPK
function calculate_convex_hulls_lp_v2(X)
N, D = size(X)
# ranking points (to go through them in a 'smarter' way, outermost points are looked at first)
Xavg = vec(mean(X, dims=1))
Xdistances = [norm(X[i, :]  Xavg) for i in 1:size(X, 1)]
Xranked = sortperm(Xdistances;rev=true)
# building initial model
model = JuMP.Model(GLPK.Optimizer)
set_silent(model)
# list of vector of indices for each hull (outer > inner)
hulls_idx_vector = Vector{Vector{Int64}}()
free_points = ones(N) # indices free = 1; indices that are in a hull = 0
# general convex combination variables
# start with all of them locked at 0
@variable(model, α[i=1:N] .== 0)
@constraint(model, sum(α) == 1)
Xtp = permutedims(X)
@constraint(model, cc, Xtp*α .== view(X,1,:))
# until we can't make more hulls
n_hulls = 1
first_free_ranked_point = 1
while true
println("Hull $(n_hulls)...")
# first pass = going through points to get candidates
vertex_candidates = Vector{Int64}()
for i_rank in first_free_ranked_point:N
# skipping points already in a hull
i = Xranked[i_rank]
if (free_points[i] == 0) continue end
# set point constraint and objective
set_normalized_rhs.(cc,view(X,i,:))
@objective(model, Min, α[i])
# solving point
JuMP.optimize!(model)
if (!is_solved_and_feasible(model))
# point is vertex candidate
# free it up for use
push!(vertex_candidates, i)
unfix(α[i])
set_lower_bound(α[i], 0)
end
end
# second pass = go through candidates to find true vertices
hull_indices = Vector{Int64}()
for i in vertex_candidates
# set point constraint and objective
set_normalized_rhs.(cc,view(X,i,:))
@objective(model, Min, α[i])
# solving point
JuMP.optimize!(model)
if (objective_value(model) > 0)
# point is vertex
push!(hull_indices, i)
free_points[i] = 0
else
# point is not vertex (fix it)
fix(α[i], 0; force=true)
end
end
# quit if there are too few points in this hull (no progress)
if length(hull_indices) <= D
println("No more hulls can be formed")
free_points[hull_indices] .= 1
break
end
# storing hull
push!(hulls_idx_vector, hull_indices)
n_hulls += 1
# updating first free point
first_free_ranked_point = findfirst(i > free_points[i] == 1, Xranked)
# locking points in hull back to 0 (can't be used)
fix.(α[hull_indices], 0; force=true)
# quit if there are no points left (it's over)
if isnothing(first_free_ranked_point) println("All point are in hulls"); break end
end
# create list of non vertices and add to last hull
non_vertex_points = [i for i in 1:N if free_points[i] == 1]
append!(hulls_idx_vector[end], non_vertex_points)
return hulls_idx_vector, non_vertex_points
end
The question
Basically, I’m looking into ways of further optimizing this problem, since it can take quite a while to compute in large datasets.
I have benchmarked these solutions using the dataset UCI Online News Popularity and here’s some things I’ve already noted:

The second solution tends to win when there are a lot of points and not that many dimensions, but it gets really slow when the number of dimensions increases. Some reference times (N points, D dimensions):

N=2000, D=5: First solution took 4 s, second took 2.6 s (QuickHull = 3.1 s)

N=4000, D=5: First solution took 23.7 s, second took 13.2 s (QuickHull = 16.8 s)

N=2000, D=30: First solution took 15.4 s, second took 28.1 s (QuickHull = 20.5 seconds)

N=4000, D=30: First solution took 70.3 s, second took 114.9 s (QuickHull = 88.1 s)


GLPK beats HiGHS pretty consistently. This is curious to me since from what I’ve read around HiGHS is widely considered to be the better linear solver, so I wonder if some of what I’m using can be changed to get better results out of it. For reference, here are the results using HiGHS for the first solution:

N=2000, D=5: 9.4 s (GLPK = 4 s)

N=4000, D=5: 44.4 s (GLPK = 23.7 s)

N=2000, D=30: 17.7 s (GLPK = 15.4)

N=4000, D=30: 78.4 s (GLPK = 70.3 s)

While it certainly wasn’t that much of a precise benchmark and I’d take the exact times with a grain of salt, these patterns are something I’ve noticed while running this test quite a few times.
Anyway, I appreciate any suggestions of how to tune these current solutions (or any new solution that I could check out). Thanks!