If I understand your problem correctly, you have a multi-objective optimization problem parameterized by \theta, of the form:

\min_{y\in \mathcal{Y}} \sum_i \theta_i f_i(y)

and you want to learn to predict the \theta that minimizes the variance of the objective values (f_i(y))_i.

Below is a minimal example of how you could do this using InferOpt.

## Building the pipeline

First, import some packages:

```
using Flux
using HiGHS
using InferOpt
using JuMP
using LinearAlgebra: dot
using ProgressMeter
using Random
using Statistics: mean, std
using UnicodePlots
```

We define (and solve with JuMP) the following knapsack problem, whose two objectives are weighted by \theta:

```
function knapsack_maximizer(θ::AbstractVector; instance)
    (; w, v₁, v₂, W) = instance
    n = length(w)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:n], Bin)
    # Maximize the θ-weighted combination of the two objectives
    @objective(
        model,
        Max,
        θ[1] * sum(v₁[i] * x[i] for i in 1:n) + θ[2] * sum(v₂[i] * x[i] for i in 1:n)
    )
    @constraint(model, sum(w[i] * x[i] for i in 1:n) <= W)
    optimize!(model)
    return value.(x)
end
```

Here is a very simple instance generator for the multi-objective knapsack problem:

```
function generate_instance(n, W)
    w = ones(n)          # unit weights
    v₁ = rand(1:5, n)    # item values for the first objective
    v₂ = rand(6:10, n)   # item values for the second objective
    return (; w, v₁, v₂, W)
end
```

We can now define the cost metric we want to minimize (here, the standard deviation of the two objective values). You can replace it with any function you prefer.

```
function cost(solution; instance)
    (; v₁, v₂) = instance
    return std([dot(v₁, solution), dot(v₂, solution)])
end
```
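
For instance, here is a sketch of a hypothetical alternative metric (not used in the rest of this example) that penalizes the absolute gap between the two objective values instead:

```
# Hypothetical alternative metric: absolute gap between the two objectives
function gap_cost(solution; instance)
    (; v₁, v₂) = instance
    return abs(dot(v₁, solution) - dot(v₂, solution))
end
```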

In order to predict the weights \theta with a neural network, we need a way to build features from an instance. Here we use the following simple features (they would probably need to be richer to achieve better results):

```
function features(instance)
    (; v₁, v₂) = instance
    return Float32[mean(v₁), mean(v₂)]
end
```

Now we can define a neural network using Flux. For this example we use a single linear layer followed by a softplus activation (which keeps the predicted \theta positive), but any differentiable model can be used instead:

```
Random.seed!(67)
ml_model = Chain(Dense(2 => 2; bias=false), softplus)
```

We now have all the tools needed to build the complete pipeline:

```
julia> instance = generate_instance(5, 2)
(w = [1.0, 1.0, 1.0, 1.0, 1.0], v₁ = [1, 4, 3, 2, 1], v₂ = [7, 6, 10, 6, 6], W = 2)

julia> x = features(instance)
2-element Vector{Float32}:
 2.2
 7.0

julia> θ = ml_model(x)
2-element Vector{Float32}:
  8.577633
 10.12358

julia> sol = knapsack_maximizer(θ; instance)
5-element Vector{Float64}:
 -0.0
  1.0
  1.0
  0.0
  0.0
```
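
As a sanity check, we can evaluate how unbalanced this (untrained) solution is: the selected items have objective values dot(v₁, sol) = 7 and dot(v₂, sol) = 16, so the cost is std([7, 16]) = 9/√2 ≈ 6.36.

```
c = cost(sol; instance)  # std([7.0, 16.0]) = 9/√2 ≈ 6.36
```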

## Making the pipeline differentiable

As you mentioned, the gradient of the cost function w.r.t. \theta cannot be computed directly, because the optimization problem is not differentiable w.r.t. \theta:

```
julia> Flux.jacobian(θ -> knapsack_maximizer(θ; instance), θ)
ERROR: Compiling Tuple{typeof(MathOptInterface.add_variable), MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{HiGHS.Optimizer}, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}}}: try/catch is not supported.
Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations
```

This is where InferOpt enters the game. It provides several wrappers that turn your optimizer into a differentiable layer. A common one is `PerturbedAdditive`, which perturbs the input \theta with additive Gaussian noise:

```
# ε sets the scale of the perturbation, nb_samples the number of Monte-Carlo samples
co_layer = PerturbedAdditive(knapsack_maximizer; ε=10.0, nb_samples=10)
```
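
Conceptually, `PerturbedAdditive` replaces the maximizer with a smoothed version that averages solutions over Gaussian perturbations of \theta. A rough sketch of the idea (not InferOpt's actual implementation):

```
# Monte-Carlo sketch of the smoothing performed by PerturbedAdditive:
# average the solutions obtained for perturbed inputs θ + ε * Z, with Z ~ N(0, I)
ε, nb_samples = 10.0, 10
y_smooth = mean(knapsack_maximizer(θ .+ ε .* randn(length(θ)); instance) for _ in 1:nb_samples)
```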

You can apply it in the same way as your original optimizer:

```
julia> co_layer(θ; instance)
5-element Vector{Float64}:
 0.5
 0.5
 0.8999999999999999
 0.0
 0.1
```

It outputs a continuous solution (an average of perturbed solutions), and is now differentiable:

```
julia> Flux.jacobian(θ -> co_layer(θ; instance), θ)[1]
5×2 Matrix{Float64}:
 -0.0421899    0.0444311
  0.0371633   -0.0197778
 -0.00502667   0.0246533
  0.0          0.0
  0.0          0.0
```

InferOpt also provides loss functions to use with these combinatorial layers. Since you don't have any labeled solutions, you can use a `Pushforward` as your loss, which composes the perturbed layer with your cost function:

```
loss = Pushforward(co_layer, cost)
```
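
The resulting loss takes \theta directly as input, and can be evaluated and differentiated like any other Flux layer. A quick sketch:

```
# Approximate expected cost of the perturbed solutions, and its gradient w.r.t. θ
l = loss(θ; instance)
g = Flux.gradient(θ -> loss(θ; instance), θ)[1]
```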

## Learning the pipeline

We can now train the machine learning model as usual with Flux.

```
# Build train and test datasets
X_train = [generate_instance(100, 20) for _ in 1:10];
X_test = [generate_instance(100, 20) for _ in 1:100];

# Function to evaluate the average cost performance of a model on a given dataset
function evaluate_model(model, X)
    total_cost = 0.0
    for instance in X
        sol = knapsack_maximizer(model(features(instance)); instance)
        total_cost += cost(sol; instance)
    end
    return total_cost / length(X)
end
```
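
Before training, it can be useful to record the untrained model's test performance as a baseline (a small usage sketch):

```
# Baseline: average test cost of the untrained model
initial_test_cost = evaluate_model(ml_model, X_test)
```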

```
# Training loop
opt = Flux.Adam()
losses = Float64[]
test_costs = Float64[]
ps = Flux.params(ml_model)
@showprogress for epoch in 1:100
    l = 0.0
    for instance in X_train
        x = features(instance)
        grads = gradient(ps) do
            θ = ml_model(x)
            loss_value = loss(θ; instance)  # reuse θ instead of a second forward pass
            l += loss_value
            return loss_value
        end
        Flux.update!(opt, ps, grads)
    end
    push!(losses, l)
    push!(test_costs, evaluate_model(ml_model, X_test))
end;
```

We can plot the evolution of the training loss and the test cost, and see that the model indeed learned something:

```
julia> lineplot(losses)
┌────────────────────────────────────────┐
800 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⢸⢰⡆⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⡸⡟⣧⡇⠀⠀⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠁⠀⣿⢱⣄⣤⡿⣤⠀⢀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⡿⠘⠛⠀⠀⠉⣷⠟⣴⡰⣿⡀⠀⠀⠀⠀⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠃⠀⠀⠀⠀⠀⠋⠀⣿⠃⡟⢿⣧⢀⠀⢠⣧⠀⢸⠀⢸⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡟⠀⠁⠘⢻⠻⣀⣾⣿⣴⡟⡄⡜⣶⢀⡇⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠈⢿⣿⠟⠃⣧⡇⠻⣼⢱⣷⢠⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⡿⠀⠀⠀⠀⠀⡿⢸⣿⢸⡇⢠⣆⡄⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠋⠇⢱⣼⡿⢸⣠⡠⣴⢣⠀⢠⡄⡀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⡇⠀⠁⠀⣿⢸⢀⡎⡇⣿│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠁⠀⠀⠀⡟⠈⣾⠀⣿⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠁⠀⠏⠀⠟⠀│
400 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
└────────────────────────────────────────┘
⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100⠀
julia> lineplot(test_costs)
┌────────────────────────────────────────┐
80 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠉⠉⠉⠙⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⢹⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣇⣀⣀⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢱⣀⣀⣀⣀⣀⣀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
│⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
40 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
└────────────────────────────────────────┘
⠀0⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀100
```

This is only a minimal demo, and it can be fine-tuned to achieve better results.