If I understand your problem correctly, you have a multi-objective optimization problem parameterized by \theta, of the form:
\min_{y\in \mathcal{Y}} \sum_i \theta_i f_i(y)
and you want to learn to predict a \theta for which the resulting solution minimizes the variance of (f_i(y))_i.
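In other words, writing \varphi_\omega for the statistical model with learnable parameters \omega (my notation, just to make the goal explicit), the learning problem is roughly:
\min_\omega \mathrm{Var}\big(f_1(y^\star(\theta)), \dots, f_k(y^\star(\theta))\big) \quad \text{with } \theta = \varphi_\omega(x) \text{ and } y^\star(\theta) = \arg\min_{y \in \mathcal{Y}} \sum_i \theta_i f_i(y)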
Below is a minimal example of how you could do it using InferOpt.
Building the pipeline
First, import some packages:
using Flux
using HiGHS
using InferOpt
using JuMP
using LinearAlgebra: dot
using ProgressMeter
using Random
using Statistics: mean, std
using UnicodePlots
We define (and solve with JuMP) the following optimization problem with two objectives:
function knapsack_maximizer(θ::AbstractVector; instance)
    (; w, v₁, v₂, W) = instance
    n = length(w)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:n], Bin)
    @objective(
        model,
        Max,
        θ[1] * sum(v₁[i] * x[i] for i in 1:n) + θ[2] * sum(v₂[i] * x[i] for i in 1:n)
    )
    @constraint(model, sum(w[i] * x[i] for i in 1:n) <= W)
    optimize!(model)
    return value.(x)
end
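In the notation from above, the two objectives are f_1(x) = \sum_i v^1_i x_i and f_2(x) = \sum_i v^2_i x_i, and the maximizer solves the weighted-sum scalarization (here a maximization rather than a minimization):
\max_{x \in \{0,1\}^n} \; \theta_1 \sum_i v^1_i x_i + \theta_2 \sum_i v^2_i x_i \quad \text{s.t.} \quad \sum_i w_i x_i \le W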
Here is a very simple instance generator for the multi-objective knapsack problem:
function generate_instance(n, W)
    w = ones(n)
    v₁ = rand(1:5, n)
    v₂ = rand(6:10, n)
    return (; w, v₁, v₂, W)
end
We can now define the cost metric we want to minimize (here the standard deviation between the two objective values). You could replace it with any other function you prefer.
function cost(solution; instance)
    (; v₁, v₂) = instance
    return std([dot(v₁, solution), dot(v₂, solution)])
end
In order to predict the weights \theta with a neural network, we need a way to build features from an instance. Here we use the following simple features (they would probably need to be more complex to achieve better results):
function features(instance)
    (; v₁, v₂) = instance
    return Float32[mean(v₁), mean(v₂)]
end
Now, we can define a neural network using Flux. For this example we use a single linear layer followed by a softplus activation (which keeps the predicted weights positive), but any differentiable model can be used instead:
Random.seed!(67)
ml_model = Chain(Dense(2 => 2; bias=false), softplus)
We now have all the tools needed to build the complete pipeline:
julia> instance = generate_instance(5, 2)
(w = [1.0, 1.0, 1.0, 1.0, 1.0], v₁ = [1, 4, 3, 2, 1], v₂ = [7, 6, 10, 6, 6], W = 2)
julia> x = features(instance)
2-element Vector{Float32}:
2.2
7.0
julia> θ = ml_model(x)
2-element Vector{Float32}:
8.577633
10.12358
julia> sol = knapsack_maximizer(θ; instance)
5-element Vector{Float64}:
-0.0
1.0
1.0
0.0
0.0
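As a quick sanity check on the cost defined earlier (computed by hand): this solution selects items 2 and 3, so the two objective values are dot(v₁, sol) = 7 and dot(v₂, sol) = 16, and cost(sol; instance) = std([7, 16]) ≈ 6.36.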
Making the pipeline differentiable
As you mentioned, the gradient of the cost function w.r.t. \theta cannot be computed, because the optimization problem is not differentiable w.r.t. \theta:
julia> Flux.jacobian(θ -> knapsack_maximizer(θ; instance), θ)
ERROR: Compiling Tuple{typeof(MathOptInterface.add_variable), MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{HiGHS.Optimizer}, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}}}: try/catch is not supported.
Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations
This is where InferOpt comes into play. It provides wrappers that make your optimizer differentiable. For instance, a common wrapper is the PerturbedAdditive one, which perturbs the input \theta:
co_layer = PerturbedAdditive(knapsack_maximizer; ε=10.0, nb_samples=10)
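Under the hood, the forward pass roughly averages the maximizer's solutions over randomly perturbed copies of \theta. Here is a minimal sketch of the idea (not InferOpt's actual implementation, just to illustrate what the layer computes):
function perturbed_maximizer_sketch(θ; instance, ε=10.0, nb_samples=10)
    # Solve the knapsack for θ + ε * Z with Z standard Gaussian noise, then average the solutions
    solutions = [knapsack_maximizer(θ .+ ε .* randn(length(θ)); instance) for _ in 1:nb_samples]
    return mean(solutions)
end
This averaging is what produces the continuous solution you see below.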
You can apply it in the same way as your optimizer:
julia> co_layer(θ; instance)
5-element Vector{Float64}:
0.5
0.5
0.8999999999999999
0.0
0.1
It outputs a continuous solution, and it is now differentiable:
julia> Flux.jacobian(θ -> co_layer(θ; instance), θ)[1]
5×2 Matrix{Float64}:
-0.0421899 0.0444311
0.0371633 -0.0197778
-0.00502667 0.0246533
0.0 0.0
0.0 0.0
InferOpt also provides loss functions to use with the combinatorial layers. If you don't have any labeled solutions, you need to use a Pushforward as your loss:
loss = Pushforward(co_layer, cost)
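Before training, you can check that the loss is now a scalar function of \theta and that Zygote can differentiate through it (I omit the outputs, since they depend on the random perturbation samples):
loss_value = loss(θ; instance)                        # scalar: cost of the perturbed solution
grad_θ = Flux.gradient(θ -> loss(θ; instance), θ)[1]  # gradient of the loss w.r.t. θ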
Learning the pipeline
We can now learn the machine learning model weights as usual with Flux.
# Build train and test datasets
X_train = [generate_instance(100, 20) for _ in 1:10];
X_test = [generate_instance(100, 20) for _ in 1:100];
# Function to evaluate the average cost performance of a model on a given dataset
function evaluate_model(model, X)
    total_cost = 0.0
    for instance in X
        sol = knapsack_maximizer(model(features(instance)); instance)
        total_cost += cost(sol; instance)
    end
    return total_cost / length(X)
end
# Training loop
opt = Flux.Adam()
losses = Float64[]
test_costs = Float64[]
@showprogress for epoch in 1:100
    l = 0.0
    for instance in X_train
        x = features(instance)
        ps = Flux.params(ml_model)
        grads = gradient(ps) do
            θ = ml_model(x)
            loss_value = loss(θ; instance)
            l += loss_value
            return loss_value
        end
        Flux.update!(opt, ps, grads)
    end
    push!(losses, l)
    push!(test_costs, evaluate_model(ml_model, X_test))
end;
We can plot the evolution of the training loss and the test cost, and see that the model learned something:
julia> lineplot(losses)
┌────────────────────────────────────────┐
800 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⢸⢰⡆⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⡸⡟⣧⡇⠀⠀⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠁⠀⣿⢱⣄⣤⡿⣤⠀⢀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⡿⠘⠛⠀⠀⠉⣷⠟⣴⡰⣿⡀⠀⠀⠀⠀⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠃⠀⠀⠀⠀⠀⠋⠀⣿⠃⡟⢿⣧⢀⠀⢠⣧⠀⢸⠀⢸⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡟⠀⠁⠘⢻⠻⣀⣾⣿⣴⡟⡄⡜⣶⢀⡇⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠈⢿⣿⠟⠃⣧⡇⠻⣼⢱⣷⢠⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⡿⠀⠀⠀⠀⠀⡿⢸⣿⢸⡇⢠⣆⡄⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠋⠇⢱⣼⡿⢸⣠⡠⣴⢣⠀⢠⡄⡀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⡇⠀⠁⠀⣿⢸⢀⡎⡇⣿│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠁⠀⠀⠀⡟⠈⣾⠀⣿⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠁⠀⠏⠀⠟⠀│
400 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
└────────────────────────────────────────┘
⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100⠀
julia> lineplot(test_costs)
┌────────────────────────────────────────┐
80 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠉⠉⠉⠙⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣇⣀⣀⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢱⣀⣀⣀⣀⣀⣀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
40 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
└────────────────────────────────────────┘
⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100
This is only a minimal demo example; it can be fine-tuned to achieve better results.