Using a Neural Network to predict the parameters of an optimization Problem

If I understand your problem correctly, you have a multi-objective optimization problem parameterized by \theta, of the form:

\min_{y\in \mathcal{Y}} \sum_i \theta_i f_i(y)

and you want to learn to predict a \theta whose optimal solution minimizes the variance of the objective values (f_i(y))_i.
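If that is correct, the end-to-end pipeline built below can be summarized as follows (this is just my notation: \varphi_w is the neural network, x the instance features, and y^\star(\theta) the solution returned by the optimizer):

\theta = \varphi_w(x), \qquad y^\star(\theta) \in \arg\min_{y\in \mathcal{Y}} \sum_i \theta_i f_i(y), \qquad \text{cost}(\theta) = \mathrm{std}\big((f_i(y^\star(\theta)))_i\big)

and the network weights w are learned by gradient descent on this cost.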

Below is a minimal example of how you could do it using InferOpt.

Building the pipeline

First, import some packages:

using Flux
using HiGHS
using InferOpt
using JuMP
using LinearAlgebra: dot
using ProgressMeter
using Random
using Statistics: mean, std
using UnicodePlots

We define (and solve with JuMP) the following bi-objective knapsack problem:

function knapsack_maximizer(θ::AbstractVector; instance)
    (; w, v₁, v₂, W) = instance
    n = length(w)

    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:n], Bin)

    @objective(
        model,
        Max,
        θ[1] * sum(v₁[i] * x[i] for i in 1:n) + θ[2] * sum(v₂[i] * x[i] for i in 1:n)
    )

    @constraint(model, sum(w[i] * x[i] for i in 1:n) <= W)

    optimize!(model)
    return value.(x)
end

Here is a very simple instance generator for the multi-objective knapsack problem:

function generate_instance(n, W)
    w = ones(n)
    v₁ = rand(1:5, n)
    v₂ = rand(6:10, n)
    return (; w, v₁, v₂, W)
end

We can now define the cost metric we want to minimize (here the standard deviation between the two objective values). You could replace it with any other function you prefer.

function cost(solution; instance)
    (; v₁, v₂) = instance
    return std([dot(v₁, solution), dot(v₂, solution)])
end
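As a quick sanity check (toy numbers, just to illustrate the metric): a solution whose total value is the same under v₁ and v₂ has zero cost:

toy_instance = (; w=ones(3), v₁=[1, 2, 3], v₂=[3, 2, 1], W=2.0)
cost([1.0, 0.0, 1.0]; instance=toy_instance)  # std([4.0, 4.0]) == 0.0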

In order to predict the weights \theta with a neural network, we need a way to build features from an instance. For this we'll use the following very simple features (they would probably need to be richer to achieve better results):

function features(instance)
    (; v₁, v₂) = instance
    return Float32[mean(v₁), mean(v₂)]
end

Now, we can define a neural network using Flux. We use a linear model (followed by a softplus, which keeps the predicted \theta positive) for the example, but any differentiable model can be used instead:

Random.seed!(67)
ml_model = Chain(Dense(2 => 2; bias=false), softplus)
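For instance (not needed for this small demo), a slightly deeper network would plug into the rest of the pipeline in exactly the same way:

ml_model_deep = Chain(Dense(2 => 16, relu), Dense(16 => 2), softplus)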

We now have all the tools needed to build the complete pipeline:

julia> instance = generate_instance(5, 2)
(w = [1.0, 1.0, 1.0, 1.0, 1.0], v₁ = [1, 4, 3, 2, 1], v₂ = [7, 6, 10, 6, 6], W = 2)

julia> x = features(instance)
2-element Vector{Float32}:
 2.2
 7.0

julia> θ = ml_model(x)
2-element Vector{Float32}:
  8.577633
 10.12358

julia> sol = knapsack_maximizer(θ; instance)
5-element Vector{Float64}:
 -0.0
  1.0
  1.0
  0.0
  0.0

Making the pipeline differentiable

As you mentioned, the gradient of the cost function w.r.t. \theta cannot be computed, because the optimization problem is not differentiable w.r.t. \theta:

julia> Flux.jacobian(θ -> knapsack_maximizer(θ; instance), θ)
ERROR: Compiling Tuple{typeof(MathOptInterface.add_variable), MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{HiGHS.Optimizer}, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}}}: try/catch is not supported.
Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations

This is where InferOpt comes into play. It provides several options to wrap your optimizer and make it differentiable. For instance, a common wrapper is PerturbedAdditive, which perturbs the input \theta:

co_layer = PerturbedAdditive(knapsack_maximizer; ε=10.0, nb_samples=10)
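Roughly speaking (see the InferOpt documentation for the exact definition), this layer replaces the maximizer output by a Monte-Carlo average over Gaussian perturbations of \theta, where \varepsilon is the ε keyword and M is nb_samples:

\hat{y}_\varepsilon(\theta) = \mathbb{E}_{Z \sim \mathcal{N}(0, I)}\left[y^\star(\theta + \varepsilon Z)\right] \approx \frac{1}{M} \sum_{m=1}^{M} y^\star(\theta + \varepsilon Z_m)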

You can call it in the same way as your optimizer:

julia> co_layer(θ; instance)
5-element Vector{Float64}:
 0.5
 0.5
 0.8999999999999999
 0.0
 0.1

It outputs a continuous (averaged) solution, and is now differentiable:

julia> Flux.jacobian(θ -> co_layer(θ; instance), θ)[1]
5×2 Matrix{Float64}:
 -0.0421899    0.0444311
  0.0371633   -0.0197778
 -0.00502667   0.0246533
  0.0          0.0
  0.0          0.0

InferOpt also provides loss functions to use on top of these combinatorial layers. In the case where you don't have any labeled solutions (as here), you can use a Pushforward as your loss:

loss = Pushforward(co_layer, cost)
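As a quick sketch (reusing the θ and instance from above): the Pushforward composes the perturbed layer with the cost, so it returns a scalar that Zygote can differentiate:

loss(θ; instance)                            # smoothed estimate of the cost
Flux.gradient(θ -> loss(θ; instance), θ)[1]  # gradient w.r.t. θ is now defined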

Learning the pipeline

We can now learn the machine learning model weights as usual with Flux.

# Build train and test datasets
X_train = [generate_instance(100, 20) for _ in 1:10];
X_test = [generate_instance(100, 20) for _ in 1:100];

# Function to evaluate the average cost performance of a model on a given dataset
function evaluate_model(model, X)
    total_cost = 0.0
    for instance in X
        sol = knapsack_maximizer(model(features(instance)); instance)
        total_cost += cost(sol; instance)
    end
    return total_cost / length(X)
end
# Training loop
opt = Flux.Adam()
losses = Float64[]
test_costs = Float64[]
@showprogress for epoch in 1:100
    l = 0.0
    for instance in X_train
        x = features(instance)
        ps = Flux.params(ml_model)
        grads = gradient(ps) do
            θ = ml_model(x)
            loss_value = loss(θ; instance)
            l += loss_value
            return loss_value
        end
        Flux.update!(opt, ps, grads)
    end
    push!(losses, l)
    push!(test_costs, evaluate_model(ml_model, X_test))
end;
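After training, the exact same pipeline can be reused with the learned weights, e.g. on a test instance (just recombining the functions defined above):

test_instance = X_test[1]
θ_learned = ml_model(features(test_instance))
sol_learned = knapsack_maximizer(θ_learned; instance=test_instance)
cost(sol_learned; instance=test_instance)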

We can plot the training loss and test cost curves and see that we learned something:

julia> lineplot(losses)
       ┌────────────────────────────────────────┐ 
   800 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⢸⢰⡆⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⡸⡟⣧⡇⠀⠀⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠁⠀⣿⢱⣄⣤⡿⣤⠀⢀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⡿⠘⠛⠀⠀⠉⣷⠟⣴⡰⣿⡀⠀⠀⠀⠀⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠃⠀⠀⠀⠀⠀⠋⠀⣿⠃⡟⢿⣧⢀⠀⢠⣧⠀⢸⠀⢸⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡟⠀⠁⠘⢻⠻⣀⣾⣿⣴⡟⡄⡜⣶⢀⡇⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠈⢿⣿⠟⠃⣧⡇⠻⣼⢱⣷⢠⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⡿⠀⠀⠀⠀⠀⡿⢸⣿⢸⡇⢠⣆⡄⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠋⠇⢱⣼⡿⢸⣠⡠⣴⢣⠀⢠⡄⡀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⡇⠀⠁⠀⣿⢸⢀⡎⡇⣿│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠁⠀⠀⠀⡟⠈⣾⠀⣿⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠁⠀⠏⠀⠟⠀│ 
   400 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       └────────────────────────────────────────┘ 
       ⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100⠀ 

julia> lineplot(test_costs)
      ┌────────────────────────────────────────┐ 
   80 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠉⠉⠉⠙⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣇⣀⣀⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢱⣀⣀⣀⣀⣀⣀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
   40 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      └────────────────────────────────────────┘ 
      ⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100

This is only a minimal demo example; the features, the model, and the hyperparameters (ε, nb_samples, number of epochs) can all be tuned to achieve better results.
