I am building an optimisation problem that involves choosing, for each product in a portfolio, which of two levers (or both) to intervene with so as to alter its characteristics, with the aim of hitting a target average “score”, defined as the volume-weighted average of the products' existing scores. I attach code at the bottom in case it's easier to parse than my explanation below.
Roughly, the input data is one row per product, with the total volume, quantity, and revenue associated with that product. The interventions either alter the quantity sold of a product or alter its score directly. I put constraints on the total number of interventions of each type and on their magnitude in order to explore feasibility (in reality these interventions have costs that are only approximate, so we run over a grid to explore the uncertainty in what scale of intervention would be possible). There is also a portfolio-level constraint on total revenue (which is affected by the quantity shifts). The objective is a volume-weighted average of the product scores and is therefore non-linear; the formula is roughly:
$$
\mathrm{SWA}_{\text{new}} \;=\; \frac{\sum_i v_i \,(1 + \delta_i)\,(s_i + \Delta s_i)}{\sum_i v_i \,(1 + \delta_i)}
$$
where $v_i$ is the product's volume, $s_i$ its existing score (NPM[i] in the code), $\delta_i$ is the quantity shift (as a proportion), and $\Delta s_i$ is a discrete improvement on the score (NPM_η * ΔNPM_step[i] in the code, zero for products that cannot be reformulated).
In general the score can only be increased in discrete steps, up to a maximum of around 3-5 steps (so I need either a set of binaries or an integer variable to represent it), whilst the quantity shift is continuous but bounded within a +/- range. I then use binary indicator variables to flag, and link to, whether an intervention is applied to a product; the sum of these indicators is constrained to be at most some proportion of the products, set by a parameter of the function (sketched below).
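Concretely, in the notation of the code below ($\delta_i$ = δ[i], indicators $I^{\delta}_i$ = I_δ[i] and $I^{\Delta}_i$ = I_Δ[i], integer step count $\Delta^{\text{step}}_i$ = ΔNPM_step[i], per-product cap $K_i$ = K[i]), the linking and cardinality constraints are roughly:

$$
\begin{aligned}
-\delta_{\max}\, I^{\delta}_i \;\le\; \delta_i \;&\le\; \delta_{\max}\, I^{\delta}_i, \\
0 \;\le\; \Delta^{\text{step}}_i \;&\le\; K_i\, I^{\Delta}_i, \qquad \Delta^{\text{step}}_i \in \mathbb{Z}_{\ge 0}, \\
\sum_i I^{\delta}_i \;\le\; \lfloor \rho_{\delta}\, n \rfloor, \qquad
\sum_{i \in \mathcal{R}} I^{\Delta}_i \;&\le\; \lfloor \rho_{\Delta\mathrm{NPM}}\, |\mathcal{R}| \rfloor,
\end{aligned}
$$

where $\mathcal{R}$ is the set of reformulatable products.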
There are thousands of products in general, so I end up with thousands of variables and constraints. I initially wanted to use Alpine.jl to solve this, but realised it doesn't support division by an expression containing decision variables (which my objective's denominator requires), so I have ended up trying Juniper.jl (though I'd rather solve to global optimality).
I have found that beyond ~2000 products the model becomes incredibly slow, and I would appreciate any advice on improving speed and, potentially, on making it solvable globally.
More generally, I'd ideally like to tweak the “efficiency” of the model or introduce some stochasticity, but for now it would suffice to just solve the problem on the full datasets I have (thousands of products).
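For concreteness, this is roughly the kind of run where the slowdown shows up, using the synthetic data generator attached at the bottom (the portfolio sizes and the +2.0 target offset here are illustrative):

# Scaling check: solve the same model on growing synthetic portfolios
for n in (500, 1000, 2000, 4000)
    df_n = generate_synthetic_data(n)
    t = @elapsed res = optimise_npm(df_n; target = swa_npm(df_n) + 2.0)
    println("n = $n: $(round(t; digits=1)) s, status = $(res.status)")
end

The full model code follows: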
using DataFrames
using JuMP
using HiGHS
using Ipopt
using Alpine
using Pavito
using Juniper
using AmplNLWriter
using AWSS3
using Parquet2
using MathOptInterface
using Couenne_jll
const MOI = MathOptInterface
const highs = optimizer_with_attributes(HiGHS.Optimizer, "log_to_console" => true)
const ipopt = optimizer_with_attributes(Ipopt.Optimizer, "sb" => "yes")
const pavito = optimizer_with_attributes(Pavito.Optimizer, "mip_solver" => highs, "cont_solver" => ipopt)
NPM_max::Int = 100
NPM_η::Int = 2
function optimise_npm(
df::DataFrame;
ρ_δ::Float64=0.1,
δ_max::Float64=0.05,
ρ_ΔNPM::Float64=0.1,
ΔNPM_max::Int64=4,
R_inc::Float64=0.001,
R_dec::Union{Float64,Nothing}=0.001,
Q_shift::Union{Float64,Nothing}=nothing,
target::Float64=69.0,
rel_gap::Float64=0.01,
)
# 1. Extract & scale input data
v = Float64.(df.total_volume)
spend = Float64.(df.total_spend)
q = Float64.(df.total_quantity)
NPM = Float64.(df.npm_score)
product_codes = df.product_code
n = length(product_codes)
reformulatable = findall(df.possible_to_reformulate .== 1)
non_reformulatable = setdiff(1:n, reformulatable)
# Maximum number of NPM improvement steps available per product (0 if it cannot be reformulated)
K = [i in reformulatable ? round(Int, min(ΔNPM_max / NPM_η, (NPM_max - NPM[i]) / NPM_η)) : 0 for i in 1:n]
# Scale inputs to improve numerical stability
v_scaled = v ./ maximum(v)
spend_scaled = spend ./ maximum(spend)
q_scaled = q ./ maximum(q)
# 2. Instantiate Alpine MINLP model
# model = Model(
# optimizer_with_attributes(
# Alpine.Optimizer,
# "nlp_solver" => ipopt,
# "mip_solver" => highs,
# "minlp_solver" => pavito,
# "rel_gap" => rel_gap,
# # disable OBBT/presolve that creates quadratic constraints
# "presolve_bt" => false,
# "recognize_convex" => false,
# )
# )
# 2. Instantiate Couenne MINLP model
# model = Model(() -> AmplNLWriter.Optimizer(Couenne_jll.amplexe))
# # 2. Instantiate Pavito MINLP model
# model = Model(optimizer_with_attributes(
# Pavito.Optimizer,
# "mip_solver" => highs,
# "cont_solver" => ipopt,
# "rel_gap" => rel_gap,
# ))
# # 2. Instantiate Juniper MINLP model
model = Model(optimizer_with_attributes(
Juniper.Optimizer,
"nl_solver" => ipopt,
"mip_solver" => highs,
"mip_gap" => rel_gap,
))
# 3. Variables
@variable(model, I_δ[1:n], Bin)
@variable(model, -δ_max <= δ[1:n] <= δ_max)
@variable(model, I_Δ[i in reformulatable], Bin)
@variable(model, 0 <= ΔNPM_step[i in reformulatable] <= K[i], Int)
# 4. Constraints
# Limit total number of sales shifts
@constraint(model, sum(I_δ) <= floor(ρ_δ * n))
# Link δ and I_δ
@constraint(model, [i = 1:n], δ[i] <= δ_max * I_δ[i])
@constraint(model, [i = 1:n], δ[i] >= -δ_max * I_δ[i])
# Limit total number of reformulated products
@constraint(model, sum(I_Δ) <= floor(ρ_ΔNPM * length(reformulatable)))
# Link ΔNPM_step and I_Δ
@constraint(model, [i in reformulatable], ΔNPM_step[i] <= K[i] * I_Δ[i])
# Limit the net revenue impact to be within R_inc and (optionally) R_dec
total_revenue_change = @expression(model, sum(spend_scaled[i] * δ[i] for i in 1:n))
@constraint(model, total_revenue_change <= R_inc * sum(spend_scaled))
if R_dec !== nothing
@constraint(model, total_revenue_change >= -R_dec * sum(spend_scaled))
end
# Limit the net quantity impact to be within Q_shift
if Q_shift !== nothing
total_quantity_change = @expression(model, sum(q_scaled[i] * δ[i] for i in 1:n))
@constraint(model, total_quantity_change <= Q_shift * sum(q_scaled))
@constraint(model, total_quantity_change >= -Q_shift * sum(q_scaled))
end
# 5. Set start values
set_start_value.(δ, 0.0)
set_start_value.(I_δ, 0)
set_start_value.(ΔNPM_step, 0)
set_start_value.(I_Δ, 0)
# 6. Objective: post-intervention volume-weighted average NPM (a ratio of affine expressions), built with @expression and explicit indexing
@expression(model, swa_new,
(
sum(v_scaled[i] * (1.0 + δ[i]) * NPM[i] for i in non_reformulatable)
+
sum(v_scaled[i] * (1.0 + δ[i]) * (NPM[i] + NPM_η * ΔNPM_step[i]) for i in reformulatable)
)
/
sum(v_scaled[i] * (1.0 + δ[i]) for i in 1:n)
)
@objective(model, Min, (swa_new - target)^2)
# 7. Solve
optimize!(model)
# 8. Extract solution & construct summary DataFrame
status = termination_status(model)
# Only extract values if the solver actually returned a feasible incumbent (e.g. TIME_LIMIT may not have one)
if status in [MOI.OPTIMAL, MOI.LOCALLY_SOLVED, MOI.TIME_LIMIT, MOI.OBJECTIVE_LIMIT] && has_values(model)
I_δ_sol = value.(I_δ)
δ_sol = [value(I_δ[i]) > 0.5 ? value(δ[i]) : 0.0 for i in 1:n]
I_Δ_sol = zeros(length(product_codes))
ΔNPM_sol = zeros(length(product_codes))
for i in reformulatable
I_Δ_sol[i] = value(I_Δ[i])
ΔNPM_sol[i] = I_Δ_sol[i] > 0.5 ? NPM_η * value(ΔNPM_step[i]) : 0.0
end
df_out = DataFrame(
product_code=product_codes,
I_δ=I_δ_sol,
δ=δ_sol,
I_Δ=I_Δ_sol,
ΔNPM=ΔNPM_sol,
)
else
@warn "Solver did not converge (status = $status). Returning NaN-filled DataFrame."
df_out = DataFrame(
product_code=product_codes,
I_δ=fill(NaN, length(product_codes)),
δ=fill(NaN, length(product_codes)),
I_Δ=fill(NaN, length(product_codes)),
ΔNPM=fill(NaN, length(product_codes))
)
end
return (
results=df_out,
# objective_value would throw if no solution is available, so guard it
objective=has_values(model) ? sqrt(objective_value(model)) : NaN,
status=status
)
end
Here is some code to generate synthetic inputs:
using Random
function generate_synthetic_data(n::Int64=1000; seed::Int64=42)
Random.seed!(seed)
product_code = [string("P", i) for i in 1:n]
unit_volume = rand(1:200, n) ./ 100
unit_spend = rand(50:500, n) ./ 100
total_quantity = rand(100:10000, n)
total_volume = unit_volume .* total_quantity
total_spend = unit_spend .* total_quantity
total_kcal = rand(10000.0:50000.0, n)
npm_score = 20 .+ (rand(10:40, n) .* 2)
possible_to_reformulate = rand(0:1, n)
return DataFrame(
(; product_code, unit_volume, unit_spend, total_quantity, total_volume, total_spend, total_kcal, npm_score, possible_to_reformulate)
)
end
df = generate_synthetic_data()
And a function to calculate the volume-weighted average NPM (swa_npm) of the data going in:
function swa_npm(df::DataFrame)
v = Float64.(df.total_volume)
NPM = Float64.(df.npm_score)
return sum(v .* NPM) / sum(v)
end
You should then make sure the target parameter passed to the model is higher than the baseline swa_npm of the synthetic input, for example as below.
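A minimal driver along these lines (the +2.0 offset and the default keyword values are just illustrative):

baseline = swa_npm(df)                          # volume-weighted average NPM of the synthetic data
res = optimise_npm(df; target = baseline + 2.0) # aim a couple of NPM points above the baseline
println(res.status, "  |SWA_new - target| = ", res.objective)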