Using a Neural Network to predict the parameters of a optimization Problem

Mkloster · November 14, 2023, 1:49pm

Hello,

i’m trying to use a neural network to predict the best parameters for a optimization problem. For this first my modell should predict some parameters based of some information i have about my optimization problem. Then i use this information to solve my optimization problem. The results from my optimization problem can tell me now how good the prediction was.
I’m using Flux for the neural network and JuMP for my optimization problem.
The problem is how can i calculate the gradient and the loss function for such a problem, because i dont have any labels to compare the solution again ?

predict  = model(x)
results = solveOptimizationProblem(predict)

Thank you all for your time and help

BatyLeo · November 14, 2023, 2:19pm

Hello !

What your optimization problem looks like? Is it continuous or combinatorial?

InferOpt.jl might be useful depending on your needs, especially if your optimization problem is combinatorial.

Mkloster · November 14, 2023, 3:33pm

Hello,

I’m working with MILP Problems. I think InferOpt.jl is not the right tool for this.

Mkloster · November 14, 2023, 3:41pm

In the past i used diffrent metaheuristics for this task. Its like tuning the parameters of a blackbox.

gdalle · November 14, 2023, 6:04pm

On the contrary, InferOpt was designed to propagate derivatives through MILPs (or more precisely problems that can be formulated as MILPs, even if you don’t tackle them with a MILP solver). Disclaimer: @BatyLeo and I are the core developers

Can you tell us a bit more about:

what the neural network outputs
what the MILP looks like

Mkloster · November 15, 2023, 8:50am

I have a multicriteria MILP problem and i want to find the best weights for my problem so that all my objectives are balanced. For this i measure the variance between all the objective values and i want the variance to be 0.
To find these weights i want to use a neural network.

BatyLeo · November 15, 2023, 4:54pm

If I understand well your problem, you have a multi-objective optimization problem of parameterized by \theta, and of the form :

\min_{y\in \mathcal{Y}} \sum_i \theta_i f_i(y)

and you want learn to predict \theta that minimizes the variance of (f_i(y))_i.

Here is below a minimal example of how you could do it using InferOpt.

Building the pipeline

First, import some packages:

using Flux
using HiGHS
using InferOpt
using JuMP
using LinearAlgebra: dot
using ProgressMeter
using Random
using Statistics: mean, std
using UnicodePlots

We define (and solve with JuMP) the following optimization problem with two objectives:

function knapsack_maximizer(θ::AbstractVector; instance)
    (; w, v₁, v₂, W) = instance
    n = length(w)

    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:n], Bin)

    @objective(
        model,
        Max,
        θ[1] * sum(v₁[i] * x[i] for i in 1:n) + θ[2] * sum(v₂[i] * x[i] for i in 1:n)
    )

    @constraint(model, sum(w[i] * x[i] for i in 1:n) <= W)

    optimize!(model)
    return value.(x)
end

Very simple instance generator for the multi objective knapsack problem:

function generate_instance(n, W)
    w = ones(n)
    v₁ = rand(1:5, n)
    v₂ = rand(6:10, n)
    return (; w, v₁, v₂, W)
end

We can now define our cost metric we want to minimize (here the standard deviation between objectives). You could replace it with any function you like more.

function cost(solution; instance)
    (; v₁, v₂) = instance
    return std([dot(v₁, solution), dot(v₂, solution)])
end

In order to predict weights \theta with a neural network, we need a way to build features from an instance. For this we’ll use the following simple features (they probably need to be more complex in order to achieve better results):

function features(instance)
    (; v₁, v₂) = instance
    return Float32[mean(v₁), mean(v₂)]
end

Now, we can define a neural network using Flux. We use a linear model for the example, but any differentiable model can be used instead:

Random.seed!(67)
ml_model = Chain(Dense(2 => 2; bias=false), softplus)

We now have all tools needed for building the complete pipeline:

julia> instance = generate_instance(5, 2)
(w = [1.0, 1.0, 1.0, 1.0, 1.0], v₁ = [1, 4, 3, 2, 1], v₂ = [7, 6, 10, 6, 6], W = 2)

julia> x = features(instance)
2-element Vector{Float32}:
 2.2
 7.0

julia> θ = ml_model(x)
2-element Vector{Float32}:
  8.577633
 10.12358

julia> sol = knapsack_maximizer(θ; instance)
5-element Vector{Float64}:
 -0.0
  1.0
  1.0
  0.0
  0.0

Making the pipeline differentiable

As you mentioned, gradient of the cost function w.r.t. cannot be computed because the optimization problem is not differentiable w.r.t. \theta:

julia> Flux.jacobian(θ -> knapsack_maximizer(θ; instance), θ)
ERROR: Compiling Tuple{typeof(MathOptInterface.add_variable), MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{HiGHS.Optimizer}, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}}}: try/catch is not supported.
Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations

This is where InferOpt enters the game. It provides options to wrap you optimizer and make it differentiable. For instance, a common wrapper is the PerturbedAdditive one, which perturbs the input \theta:

co_layer = PerturbedAdditive(knapsack_maximizer; ε=10.0, nb_samples=10)

You can apply it in a similar manner as your optimizer:

julia> co_layer(θ; instance)
5-element Vector{Float64}:
 0.5
 0.5
 0.8999999999999999
 0.0
 0.1

it outputs a continous solution, and is now differentiable:

julia> Flux.jacobian(θ -> co_layer(θ; instance), θ)[1]
5×2 Matrix{Float64}:
 -0.0421899    0.0444311
  0.0371633   -0.0197778
 -0.00502667   0.0246533
  0.0          0.0
  0.0          0.0

InferOpt also provides some loss functions to use with the combinatorial layers. In the case you don’t have any labeled solutions, you need to use a Pushforward as your loss:

loss = Pushforward(co_layer, cost)

Learning the pipeline

We can now learn the machine learning model weights as usual with Flux.

# Build train and test datasets
X_train = [generate_instance(100, 20) for _ in 1:10];
X_test = [generate_instance(100, 20) for _ in 1:100];

# Function to evaluate the average cost performance of a model on a given dataset
function evaluate_model(model, X)
    total_cost = 0.0
    for instance in X
        sol = knapsack_maximizer(model(features(instance)); instance)
        total_cost += cost(sol; instance)
    end
    return total_cost / length(X)
end

# Training loop
opt = Flux.Adam()
losses = Float64[]
test_costs = Float64[]
@showprogress for epoch in 1:100
    l = 0.0
    for instance in X_train
        x = features(instance)
        ps = Flux.params(ml_model)
        grads = gradient(ps) do
            θ = ml_model(x)
            loss_value = loss(ml_model(x); instance)
            l += loss_value
            return loss_value
        end
        Flux.update!(opt, ps, grads)
    end
    push!(losses, l)
    push!(test_costs, evaluate_model(ml_model, X_test))
end;

We can plot and visualize the training loss and test costs evolution and see we learned something :

julia> lineplot(losses)
       ┌────────────────────────────────────────┐ 
   800 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⢸⢰⡆⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⡸⡟⣧⡇⠀⠀⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠁⠀⣿⢱⣄⣤⡿⣤⠀⢀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⡿⠘⠛⠀⠀⠉⣷⠟⣴⡰⣿⡀⠀⠀⠀⠀⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠃⠀⠀⠀⠀⠀⠋⠀⣿⠃⡟⢿⣧⢀⠀⢠⣧⠀⢸⠀⢸⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡟⠀⠁⠘⢻⠻⣀⣾⣿⣴⡟⡄⡜⣶⢀⡇⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠈⢿⣿⠟⠃⣧⡇⠻⣼⢱⣷⢠⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⡿⠀⠀⠀⠀⠀⡿⢸⣿⢸⡇⢠⣆⡄⠀⠀⠀⠀⠀⠀⠀⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠋⠇⢱⣼⡿⢸⣠⡠⣴⢣⠀⢠⡄⡀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⡇⠀⠁⠀⣿⢸⢀⡎⡇⣿│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠁⠀⠀⠀⡟⠈⣾⠀⣿⠀│ 
       │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠁⠀⠏⠀⠟⠀│ 
   400 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
       └────────────────────────────────────────┘ 
       ⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100⠀ 

julia> lineplot(test_costs)
      ┌────────────────────────────────────────┐ 
   80 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠉⠉⠉⠙⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣇⣀⣀⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢱⣀⣀⣀⣀⣀⣀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
   40 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      └────────────────────────────────────────┘ 
      ⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100

This is only a minimal demo example, and can be finetuned to achieve better results.

Topic		Replies	Views
Flux differentiation error Machine Learning zygote	19	1684	November 19, 2020
Maliar, Maliar, and Winant using Flux.jl (I just want to write a custom objective) Machine Learning question , flux , zygote	8	681	January 19, 2024
Parameters of the neural network not updating after training in a Neural ODE problem New to Julia sciml , reversediff , differentialequation	13	377	February 16, 2025
Parameter Estimation of a 3 node thermal network New to Julia	9	529	December 17, 2023
Question and error about optimization for Nueral ode Optimization (Mathematical) question	1	80	March 2, 2025

Using a Neural Network to predict the parameters of a optimization Problem

Building the pipeline

Making the pipeline differentiable

Learning the pipeline

Related topics