Using a Neural Network to predict the parameters of an optimization Problem

If I understand your problem correctly, you have a multi-objective optimization problem parameterized by \theta, of the form:

\min_{y\in \mathcal{Y}} \sum_i \theta_i f_i(y)

and you want to learn to predict a \theta whose optimal solution minimizes the variance of the objective values (f_i(y))_i.
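If that is correct, the end-to-end pipeline built below can be summarized as follows (this is just my notation: \varphi_w is the neural network, x the instance features, and y^\star(\theta) the solution returned by the optimizer):

\theta = \varphi_w(x), \qquad y^\star(\theta) \in \arg\min_{y\in \mathcal{Y}} \sum_i \theta_i f_i(y), \qquad \text{cost}(\theta) = \mathrm{std}\big((f_i(y^\star(\theta)))_i\big)

and the network weights w are learned by gradient descent on this cost.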

Below is a minimal example of how you could do it using InferOpt.

Building the pipeline

First, import some packages:

using Flux
using HiGHS
using InferOpt
using JuMP
using LinearAlgebra: dot
using ProgressMeter
using Random
using Statistics: mean, std
using UnicodePlots

We define (and solve with JuMP) the following bi-objective knapsack problem:

function knapsack_maximizer(θ::AbstractVector; instance)
    (; w, v₁, v₂, W) = instance
    n = length(w)

    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:n], Bin)

    @objective(
        model,
        Max,
        θ[1] * sum(v₁[i] * x[i] for i in 1:n) + θ[2] * sum(v₂[i] * x[i] for i in 1:n)
    )

    @constraint(model, sum(w[i] * x[i] for i in 1:n) <= W)

    optimize!(model)
    return value.(x)
end

Here is a very simple instance generator for the multi-objective knapsack problem:

function generate_instance(n, W)
    w = ones(n)
    v₁ = rand(1:5, n)
    v₂ = rand(6:10, n)
    return (; w, v₁, v₂, W)
end

We can now define the cost metric we want to minimize (here the standard deviation between the two objective values). You could replace it with any other function you prefer.

function cost(solution; instance)
    (; v₁, v₂) = instance
    return std([dot(v₁, solution), dot(v₂, solution)])
end
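As a quick sanity check (toy numbers, just to illustrate the metric): a solution whose total value is the same under v₁ and v₂ has zero cost:

toy_instance = (; w=ones(3), v₁=[1, 2, 3], v₂=[3, 2, 1], W=2.0)
cost([1.0, 0.0, 1.0]; instance=toy_instance)  # std([4.0, 4.0]) == 0.0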

In order to predict the weights \theta with a neural network, we need a way to build features from an instance. For this we'll use the following very simple features (they would probably need to be richer to achieve better results):

function features(instance)
    (; v₁, v₂) = instance
    return Float32[mean(v₁), mean(v₂)]
end

Now, we can define a neural network using Flux. We use a linear model (followed by a softplus, which keeps the predicted \theta positive) for the example, but any differentiable model can be used instead:

Random.seed!(67)
ml_model = Chain(Dense(2 => 2; bias=false), softplus)
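For instance (not needed for this small demo), a slightly deeper network would plug into the rest of the pipeline in exactly the same way:

ml_model_deep = Chain(Dense(2 => 16, relu), Dense(16 => 2), softplus)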

We now have all the tools needed to build the complete pipeline:

julia> instance = generate_instance(5, 2)
(w = [1.0, 1.0, 1.0, 1.0, 1.0], v₁ = [1, 4, 3, 2, 1], v₂ = [7, 6, 10, 6, 6], W = 2)

julia> x = features(instance)
2-element Vector{Float32}:
 2.2
 7.0

julia> θ = ml_model(x)
2-element Vector{Float32}:
  8.577633
 10.12358

julia> sol = knapsack_maximizer(θ; instance)
5-element Vector{Float64}:
 -0.0
  1.0
  1.0
  0.0
  0.0

Making the pipeline differentiable

As you mentioned, the gradient of the cost function w.r.t. \theta cannot be computed, because the optimization problem is not differentiable w.r.t. \theta:

julia> Flux.jacobian(θ -> knapsack_maximizer(θ; instance), θ)
ERROR: Compiling Tuple{typeof(MathOptInterface.add_variable), MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{HiGHS.Optimizer}, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}}}: try/catch is not supported.
Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations

This is where InferOpt comes into play. It provides several options to wrap your optimizer and make it differentiable. For instance, a common wrapper is PerturbedAdditive, which perturbs the input \theta:

co_layer = PerturbedAdditive(knapsack_maximizer; ε=10.0, nb_samples=10)
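Roughly speaking (see the InferOpt documentation for the exact definition), this layer replaces the maximizer output by a Monte-Carlo average over Gaussian perturbations of \theta, where \varepsilon is the ε keyword and M is nb_samples:

\hat{y}_\varepsilon(\theta) = \mathbb{E}_{Z \sim \mathcal{N}(0, I)}\left[y^\star(\theta + \varepsilon Z)\right] \approx \frac{1}{M} \sum_{m=1}^{M} y^\star(\theta + \varepsilon Z_m)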

You can call it in the same way as your optimizer:

julia> co_layer(θ; instance)
5-element Vector{Float64}:
 0.5
 0.5
 0.8999999999999999
 0.0
 0.1

It outputs a continuous (averaged) solution, and is now differentiable:

julia> Flux.jacobian(θ -> co_layer(θ; instance), θ)[1]
5×2 Matrix{Float64}:
 -0.0421899    0.0444311
  0.0371633   -0.0197778
 -0.00502667   0.0246533
  0.0          0.0
  0.0          0.0

InferOpt also provides loss functions to use on top of these combinatorial layers. In the case where you don't have any labeled solutions (as here), you can use a Pushforward as your loss:

loss = Pushforward(co_layer, cost)
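As a quick sketch (reusing the θ and instance from above): the Pushforward composes the perturbed layer with the cost, so it returns a scalar that Zygote can differentiate:

loss(θ; instance)                            # smoothed estimate of the cost
Flux.gradient(θ -> loss(θ; instance), θ)[1]  # gradient w.r.t. θ is now defined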

Learning the pipeline

We can now learn the machine learning model weights as usual with Flux.

# Build train and test datasets
X_train = [generate_instance(100, 20) for _ in 1:10];
X_test = [generate_instance(100, 20) for _ in 1:100];

# Function to evaluate the average cost performance of a model on a given dataset
function evaluate_model(model, X)
    total_cost = 0.0
    for instance in X
        sol = knapsack_maximizer(model(features(instance)); instance)
        total_cost += cost(sol; instance)
    end
    return total_cost / length(X)
end
# Training loop
opt = Flux.Adam()
losses = Float64[]
test_costs = Float64[]
@showprogress for epoch in 1:100
    l = 0.0
    for instance in X_train
        x = features(instance)
        ps = Flux.params(ml_model)
        grads = gradient(ps) do
            θ = ml_model(x)
            loss_value = loss(θ; instance)
            l += loss_value
            return loss_value
        end
        Flux.update!(opt, ps, grads)
    end
    push!(losses, l)
    push!(test_costs, evaluate_model(ml_model, X_test))
end;
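After training, the exact same pipeline can be reused with the learned weights, e.g. on a test instance (just recombining the functions defined above):

test_instance = X_test[1]
θ_learned = ml_model(features(test_instance))
sol_learned = knapsack_maximizer(θ_learned; instance=test_instance)
cost(sol_learned; instance=test_instance)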

We can plot the training loss and test cost curves and see that we learned something:

julia> lineplot(losses)
       ┌────────────────────────────────────────┐ 
   800 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⢸⢰⡆⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⡸⡟⣧⡇⠀⠀⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠁⠀⣿⢱⣄⣤⡿⣤⠀⢀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⡿⠘⠛⠀⠀⠉⣷⠟⣴⡰⣿⡀⠀⠀⠀⠀⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠃⠀⠀⠀⠀⠀⠋⠀⣿⠃⡟⢿⣧⢀⠀⢠⣧⠀⢸⠀⢸⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡟⠀⠁⠘⢻⠻⣀⣾⣿⣴⡟⡄⡜⣶⢀⡇⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠈⢿⣿⠟⠃⣧⡇⠻⣼⢱⣷⢠⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⡿⠀⠀⠀⠀⠀⡿⢸⣿⢸⡇⢠⣆⡄⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠋⠇⢱⣼⡿⢸⣠⡠⣴⢣⠀⢠⡄⡀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⡇⠀⠁⠀⣿⢸⢀⡎⡇⣿│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠁⠀⠀⠀⡟⠈⣾⠀⣿⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠁⠀⠏⠀⠟⠀│ 
   400 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       └────────────────────────────────────────┘ 
       ⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100⠀ 

julia> lineplot(test_costs)
      ┌────────────────────────────────────────┐ 
   80 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠉⠉⠉⠙⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣇⣀⣀⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢱⣀⣀⣀⣀⣀⣀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
   40 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      └────────────────────────────────────────┘ 
      ⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100

This is only a minimal demo example; the features, the model, and the hyperparameters (ε, nb_samples, number of epochs) can all be tuned to achieve better results.
