Efficiently Retrieving Variable Values after Gurobi optimization with JuMP

Hello,

I’m solving a large number of LPs using Gurobi with JuMP, and I noticed that most of the computation time is spent retrieving variable values rather than optimizing the model. Is there a more efficient way to extract variable values after optimization?

Here’s a small example:

using JuMP
using Gurobi

function test(N)
    model = Model(Gurobi.Optimizer)
    set_silent(model)

    @variable(model, x[20:200]>=0);

    @objective(model, Min, sum(x))

    opt_time=0
    time_value=0
    for i in 1:N
        start=time()
        optimize!(model)
        opt_time+=time()-start
        start_value=time()
        a=value.(x)
        time_value+=time()-start_value
    end
    println("optimization time: ", opt_time)
    println("get solution time: ", time_value)
end

test(5000) gives the following output:
optimization time: 0.2440018653869629
get solution time: 1.5249981880187988

Is there a way to speed up the retrieval process?

Smart question. Maybe you can try the C API.

This may not be related to Gurobi, I tried it on my computer with HiGHS and retrieving variable values still takes up the major part of the time.

optimization time: 0.09199905395507812
get solution time: 0.5050008296966553

Hi @Mathis_3, welcome to the forum :smile:

The question is: should we expect that getting the solution time is faster?

You are solving the problem N times. But after solving the problem once, Gurobi recognizes that it has a stored solution that is already optimal, so the optimize!(model) calls for N > 1 are essentially free.

However, getting the value.(x) 5000 times is not free. At minimum, we have to create 5000 new vectors to store the solution, and there are 900,000 calls to the Gurobi C API to query the value of the variables. (JuMP doesn’t cache the value of the solution; we always query the solver.)

Do you have a more realistic example of your problem?

1 Like

Hi @odow,

To clarify, I am working within a stochastic programming framework where I solve a large number of LPs. I have a set of S scenarios, each requiring the solution of an LP with the same constraints but a different objective function, where the cost vector (T=24) varies by scenario. The number of variables also depends on a parameter N, typically ranging from 10 to 200.
Here is a small example:

using JuMP, Gurobi, Random

function test(N, T, S)

    cost=[rand(10:100, T) for s in 1:S] # Cost vector for scenario s
    model = Model(Gurobi.Optimizer)
    set_silent(model)
    @variable(model, x[1:N, 1:T]>=0);
    @variable(model, y[1:T]>=0);
    @constraint(model, [i in 1:N, t in 1:T-1], x[i,t+1]-x[i,t]==i-y[t])
    @objective(model, Min, 0)
    opt_time=0
    time_value=0
    for s in 1:S
        start=time()
        set_objective_coefficient(model, y, cost[s])
        optimize!(model)
        opt_time+=time()-start
        start_value=time()
        y_val=value.(y)
        x_val=value.(x)
        time_value+=time()-start_value
    end
    println("optimization time: ", opt_time)
    println("get solution time: ", time_value)

end

The output of test(50, 24, 50) is:

optimization time: 0.0709991455078125
get solution time: 0.20100092887878418

While the retrieval time seems short, this function is called a very large number of times, making it a bottleneck.

You can make things a bit faster by dropping some of the intermediate JuMP layers:

model = direct_model(Gurobi.Optimizer())

See Models · JuMP

But your example still has a problem. If you comment out the set_silent line, you’ll see that every solve is being completed in 0.0 seconds and 0 iterations because Gurobi is warm-starting the solve with an optimal basis from the previous iteration:

Gurobi Optimizer version 12.0.1 build v12.0.1rc0 (mac64[x86] - Darwin 24.1.0 24B83)

CPU model: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 1150 rows, 1224 columns and 3450 nonzeros
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+01, 9e+01]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 5e+01]
LP warm-start: use basis

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    0.0000000e+00   0.000000e+00   0.000000e+00      0s

Solved in 0 iterations and 0.00 seconds (0.00 work units)
Optimal objective  0.000000000e+00

User-callback calls 20, time in user-callback 0.00 sec

It’s not that value is taking a long time. It’s that optimize! is very fast!

What is your stochastic programming algorithm? Do you need to query every variable in every scenario in every iteration? It’s very possible to solve very large stochastic programs using JuMP and Gurobi: http://sddp.dev

I am using Benders’ decomposition (L-shaped method) to solve a stochastic 2-stage Unit Commitment problem, where we manage a set of N generators to meet energy demand over T=24 time steps. At each iteration, I solve S subproblems (where S is the number of scenarios) to determine the optimal cut to add to the master problem. To construct these cuts, I need to retrieve specific variable values from each scenario at every iteration.

I give you a more complex example:

function test(S)

T=24
N=10
Pmin=[32.0, 37.0, 48.0, 48.0, 47.0, 50.0, 62.0, 55.0, 85.0, 84.0]
Pmax=[118.0, 119.0, 105.0, 113.0, 102.0, 190.0, 208.0, 211.0, 282.0, 323.0]
delta_up=[42.003808, 31.325072, 20.632301, 27.869072, 22.204262, 51.694535, 67.064482, 64.139573, 89.618227, 101.140357]
delta_down=[32.599073, 36.390287, 21.831385, 27.091832, 27.417263, 61.702851, 49.715649, 69.178653, 78.751405, 99.597226]
Demand=[953.170832, 902.233658, 736.221518, 730.912573, 743.969707, 801.50719, 958.479777, 1170.120146, 1366.551107, 1427.675715, 1434.849965, 1453.913382, 1428.536626, 1313.892113, 1354.641852, 1366.120652, 1360.237768, 1322.644699, 1270.70313, 1287.921329, 1323.649093, 1255.637205, 1141.566632, 1034.95728]

uncertainty=[randn(T)/100 for s in 1:S]

# model = Model(Gurobi.Optimizer)
model=direct_model(Gurobi.Optimizer())
set_silent(model)

@variable(model, μₘᵢₙ[i in 1:N, t in 1:T]>=0)
@variable(model, μₘₐₓ[i in 1:N, t in 1:T]>=0)
@variable(model, μꜛ[i in 1:N, t in 1:T]>=0)
@variable(model, μꜜ[i in 1:N, t in 1:T]>=0)
@variable(model, ν[t in 1:T])

@constraint(model,  [i in 1:N, t in 1:T-1], μₘᵢₙ[i, t]-μₘₐₓ[i, t]-μꜛ[i,t]+μꜛ[i,t+1]+μꜜ[i,t]-μꜜ[i,t+1]+ν[t]==10)
@constraint(model,  [i in 1:N], -μꜜ[i, 1]+μꜛ[i,1]==0)
@constraint(model,  [i in 1:N], μₘᵢₙ[i, T]-μₘₐₓ[i, T]-μꜛ[i,T]+μꜜ[i,T]+ν[T]==10)

@constraint(model,  [t in 1:T], ν[t]<=70)
@constraint(model,  [t in 1:T], ν[t]>=-70)

@objective(model, Max, sum(μₘᵢₙ[i,t]*Pmin[i] - μₘₐₓ[i,t]*Pmax[i]-μꜛ[i,t]*delta_up[i]-μꜜ[i,t]*delta_down[i] for i in 1:N for t in 1:T))

opt_time=0
time_value=0
for s in 1:S
    start=time()
    set_objective_coefficient(model,ν, [Demand[t]*(1+uncertainty[s][t]) for t in 1:T])
    optimize!(model)
    opt_time+=time()-start

    start_value=time()
    μꜛₖ=JuMP.value.(μꜛ)
    μꜜₖ=JuMP.value.(μꜜ)
    νₖ=JuMP.value.(ν)
    time_value+=time()-start_value
end

println("optimization time: ", opt_time)
println("get solution time: ", time_value)

end

The output of test(200) is:

optimization time: 0.19800114631652832
get solution time: 0.18399882316589355

Since my Benders’ decomposition requires around 100 iterations to converge, retrieving variable values accounts for around 20 seconds (25% of the total runtime), which is quite significant.

To construct these cuts, I need to retrieve specific variable values from each scenario at every iteration

Are your state variables μꜛ, μꜜ, and ν? Why are you querying their value in the subproblem? Don’t you need the dual? Or have you formulated the dual directly?

Have you seen this tutorial: Benders decomposition · JuMP?

It’s much simpler when you formulate the second stage as the primal.

It might help if you could provide the actual code you are using for your Benders. (Or provide the first-stage problem as well, and I can show you how to implement in SDDP.jl)

If you are solving a problem via Benders in which querying the variable value is a bottleneck, then this really suggests that you should not be using Benders. Have you tried to solve the deterministic equivalent directly?

Here is my full implementation of Benders decomposition on a toy example:

using JuMP, Gurobi, Random

function test(S, Maxiter)

T=24
N=10
cost_1=[4.3169962, 4.5086696, 4.0460142, 4.9007914, 4.554301799999999, 6.3083598, 5.9217375, 6.4922892, 6.1188277, 5.4346844]
Pmin=[32.0, 37.0, 48.0, 48.0, 47.0, 50.0, 62.0, 55.0, 85.0, 84.0]
Pmax=[118.0, 119.0, 105.0, 113.0, 102.0, 190.0, 208.0, 211.0, 282.0, 323.0]
delta_up=[42.003808, 31.325072, 20.632301, 27.869072, 22.204262, 51.694535, 67.064482, 64.139573, 89.618227, 101.140357]
delta_down=[32.599073, 36.390287, 21.831385, 27.091832, 27.417263, 61.702851, 49.715649, 69.178653, 78.751405, 99.597226]
Demand=[953.170832, 902.233658, 736.221518, 730.912573, 743.969707, 801.50719, 958.479777, 1170.120146, 1366.551107, 1427.675715, 1434.849965, 1453.913382, 1428.536626, 1313.892113, 1354.641852, 1366.120652, 1360.237768, 1322.644699, 1270.70313, 1287.921329, 1323.649093, 1255.637205, 1141.566632, 1034.95728]
uncertainty=[randn(T)/100 for s in 1:S]

# master problem
master = Model(Gurobi.Optimizer)
set_silent(master)
@variable(master, is_on[i in 1:N, t in 1:T],Bin)
@variable(master, z[s in 1:S]>=0)
@objective(master, Min, sum(z[s] for s in 1:S)/S+sum(is_on[i,t]*cost_1[i] for i in 1:N for t in 1:T)) # second stage cost + first stage cost

# subproblem
subproblem = Model(Gurobi.Optimizer)
set_silent(subproblem)

@variable(subproblem, μₘᵢₙ[i in 1:N, t in 1:T]>=0)
@variable(subproblem, μₘₐₓ[i in 1:N, t in 1:T]>=0)
@variable(subproblem, ν[t in 1:T])
@constraint(subproblem,  [i in 1:N, t in 1:T-1], μₘᵢₙ[i, t]-μₘₐₓ[i, t]+ν[t]==10)
@constraint(subproblem,  [t in 1:T], ν[t]<=70)
@constraint(subproblem,  [t in 1:T], ν[t]>=-70)

LB=1
UB=1e9
k=0
while k<=Maxiter && 100*(UB-LB)/LB>1 # convergence at 1%
    optimize!(master)
    k+=1
    LB=objective_value(master)
    is_onₖ=JuMP.value.(is_on)
    z_val=JuMP.value.(z)


    @objective(subproblem, Max, sum(μₘᵢₙ[i,t]*Pmin[i]*is_onₖ[i,t] - μₘₐₓ[i,t]*Pmax[i]*is_onₖ[i,t] for i in 1:N for t in 1:T))

    mumin=[zeros(N,T) for s in 1:S]
    mumax=[zeros(N,T) for s in 1:S]
    nu=[zeros(T) for s in 1:S]
    cost=zeros(S)
    for s in 1:S
        set_objective_coefficient(subproblem,ν, [Demand[t]*(1+uncertainty[s][t]) for t in 1:T])
        optimize!(subproblem)
        mumin[s]=JuMP.value.(μₘᵢₙ)
        mumax[s]=JuMP.value.(μₘₐₓ)
        nu[s]=JuMP.value.(ν)
        cost[s]=objective_value(subproblem)
    end
    UB=min(UB,sum(cost)/S+sum(is_onₖ[i,t]*cost_1[i] for i in 1:N for t in 1:T))
    if sum(cost)/S>1.001*sum(z_val)/S
        @constraint(master, [s in 1:S], z[s]>= sum(mumin[s][i,t]*Pmin[i]*is_on[i,t] - mumax[s][i,t]*Pmax[i]*is_on[i,t] for i in 1:N for t in 1:T)+sum(nu[s][t]*Demand[t]*(1+uncertainty[s][t]) for t in 1:T))
    end
end

end

In this implementation, I solve the dual problem directly. The corresponding primal formulation is:

primal = Model(Gurobi.Optimizer)
set_silent(subproblem)
@variable(primal, power[i in 1:N, t in 1:T]>=0)
@variable(primal, power_shedding[t in 1:T]>=0)
@variable(primal, power_curtailment[t in 1:T]>=0)
@constraint(subproblem, [i in 1:N, t in 1:T-1], power[i, t]<=Pmax[i])
@constraint(subproblem, [i in 1:N, t in 1:T-1], power[i, t]>=Pmin[i])
@constraint(subproblem, [t in 1:T], sum(power[i,t] for i in 1:N)+power_shedding[t]+power_curtailment[t]==Demand[t])*(1+uncertainty[s][t])
@objective(primal, Min, sum(power[i,t]*10 for i in 1:N for t in 1:T)+sum(power_shedding[t]*70 for t in 1:T)+sum(power_curtailment[t]*70 for t in 1:T))

For this small example, solving the deterministic equivalent is actually more efficient. However, in larger, more realistic instances, Benders decomposition proves to be the better approach.

I haven’t explicitly formulated the primal to generate cuts using dual(constraint), but I suspect that querying dual() wouldn’t be significantly faster than using value().

If you run @profview test(20, 10), you’ll notice that about 25% of the total runtime is spent retrieving variable values, with some runtime dispatch.

I would follow the example in the JuMP documentation and solve the primal/primal:

using JuMP, Gurobi, Random

function test2(S, Maxiter)
    T = 24
    N = 10
    cost_1 = [4.3169962, 4.5086696, 4.0460142, 4.9007914, 4.554301799999999, 6.3083598, 5.9217375, 6.4922892, 6.1188277, 5.4346844]
    Pmin = [32.0, 37.0, 48.0, 48.0, 47.0, 50.0, 62.0, 55.0, 85.0, 84.0]
    Pmax = [118.0, 119.0, 105.0, 113.0, 102.0, 190.0, 208.0, 211.0, 282.0, 323.0]
    delta_up = [42.003808, 31.325072, 20.632301, 27.869072, 22.204262, 51.694535, 67.064482, 64.139573, 89.618227, 101.140357]
    delta_down = [32.599073, 36.390287, 21.831385, 27.091832, 27.417263, 61.702851, 49.715649, 69.178653, 78.751405, 99.597226]
    Demand = [953.170832, 902.233658, 736.221518, 730.912573, 743.969707, 801.50719, 958.479777, 1170.120146, 1366.551107, 1427.675715, 1434.849965, 1453.913382, 1428.536626, 1313.892113, 1354.641852, 1366.120652, 1360.237768, 1322.644699, 1270.70313, 1287.921329, 1323.649093, 1255.637205, 1141.566632, 1034.95728]
    uncertainty = [randn(T) / 100 for s in 1:S]
    master = direct_model(Gurobi.Optimizer())
    set_silent(master)
    @variable(master, is_on[i in 1:N, t in 1:T], Bin)
    @variable(master, z[1:S] >= 0)
    @objective(
        master,
        Min,
        sum(is_on[i,t] * cost_1[i] for i in 1:N for t in 1:T) + sum(z) / S,
    )
    subproblem = direct_model(Gurobi.Optimizer())
    set_silent(subproblem)
    @variables(subproblem, begin
        is_on_copy[i in 1:N, t in 1:T] == 0
        power[i in 1:N, t in 1:T] >= 0
        power_shedding[t in 1:T] >= 0
        power_curtailment[t in 1:T] >= 0
    end)
    is_on_copy_ref = FixRef.(is_on_copy)
    @constraints(subproblem, begin
        [i in 1:N, t in 1:T], power[i, t] <= Pmax[i] * is_on_copy[i,t]
        [i in 1:N, t in 1:T], power[i, t] >= Pmin[i] * is_on_copy[i,t]
        c_demand[t in 1:T],
            sum(power[:,t]) + power_shedding[t] + power_curtailment[t] == 0 # Demand[t] * (1 + uncertainty[s][t])
    end)
    @objective(
        subproblem,
        Min,
        10 * sum(power)+ 70 * sum(power_shedding) + 70 * sum(power_curtailment)
    )
    LB, UB = 1.0, 1e9
    for k in 1:Maxiter
        if (UB - LB) / LB < 0.01 # convergence at 1%
            break
        end
        optimize!(master)
        assert_is_solved_and_feasible(master)
        LB = objective_value(master)
        is_onₖ = JuMP.value.(is_on)
        fix.(is_on_copy, is_onₖ)
        ub_1, ub_2 = LB - sum(value.(z)) / S, 0.0
        for s in 1:S
            for t in 1:T
                set_normalized_rhs(c_demand[t], Demand[t] * (1 + uncertainty[s][t]))
            end
            optimize!(subproblem)
            assert_is_solved_and_feasible(subproblem; dual = true)
            @constraint(
                master,
                z[s] >= objective_value(subproblem) + sum(
                    dual(is_on_copy_ref[i, t]) * (is_on[i,t] - is_onₖ[i,t])
                    for i in 1:N, t in 1:T
                )
            )
            ub_2 += objective_value(subproblem)
        end
        UB = min(UB, ub_1 + ub_2 / S)
    end
    return LB, UB
end

@time test2(20, 10)

If you wanted to, you could replace dual with

function unsafe_dual(c::ConstraintRef)
    return MOI.get(backend(owner_model(c)), MOI.ConstraintDual(), index(c))
end

but I’m not sure that focusing on this call is a good use of time. This is only showing up as a contributor because the subproblems are so trivial to solve. If the subproblems took longer to solve it wouldn’t be a problem.

I would focus on improving the algorithmic aspects. You could try average cut instead of multi-cut, or batching some of your scenarios. You could also try the callback single-tree algorithm, which might reduce the number of subproblems that you need to solve.

FixRef is indeed a great invention which seems to be tailored for invoking dual(::ConstraintRef). But the naming style (CapitalStartedWithoutUnderline) seems to be distinct, why?

Sometimes we do manually write the dual program because it is more apt (e.g. the subproblem in Adaptive Robust Optimization).
I was dreaming that JuMP-dev could invent a new smart software tool to help us do this work automatically, are you interested, I can share you more information.

There is FixRef, LowerBoundRef, IntegerRef, etc. These are the only way to access the constraint references associated with a scalar variable.

They are named like they because they act like a type constructor. For example, it ConstraintRef not constraint_ref. See the JuMP style guide. Admittedly, constructors that are functions are a bit of a grey area. There is an argument for both FixRef or fix_ref. At the time we went with FixRef; it’s too late to change now.

I was dreaming that JuMP-dev could invent a new smart software tool to help us do this work automatically

See GitHub - jump-dev/Dualization.jl: Automatic dualization feature for MathOptInterface.jl

1 Like

I’m not sure that the fact that the subproblems are solved quickly really matters. Indeed, if they took longer to solve, using Benders decomposition wouldn’t make sense anymore.

To try to convince you, let me give you another example with a callback (N is the number of generators, you can take it in {10, 50, 100}, for example):

using JuMP, Gurobi, Random, UnitCommitment
Random.seed!(123)
function test_callback(S, N, timelimit)

instance = UnitCommitment.read_benchmark(
"or-lib/$N"*"_0_1_w",
)
Units=instance.scenarios[1].buses[1].thermal_units

T=24
cost_1=[unit.startup_categories[1].cost/10 for unit in Units]
Pmin=[Units[i].min_power for i in 1:N]
Pmax=[Units[i].max_power for i in 1:N]
Demand=instance.scenarios[1].buses[1].load
cost_2=[unit.cost_segments[1].cost[2]/10 for unit in Units]

uncertainty=[randn(T)/100 for s in 1:S]

# master problem
master = Model(Gurobi.Optimizer)
set_silent(master)
@variable(master, is_on[i in 1:N, t in 1:T],Bin)
@variable(master, z>=0)
@objective(master, Min, z+sum(is_on[i,t]*cost_1[i] for i in 1:N for t in 1:T)) # second stage cost + first stage cost

# subproblem
subproblem = Model(Gurobi.Optimizer)
set_silent(subproblem)

@variable(subproblem, μₘᵢₙ[i in 1:N, t in 1:T]>=0)
@variable(subproblem, μₘₐₓ[i in 1:N, t in 1:T]>=0)
@variable(subproblem, μꜛ[i in 1:N, t in 1:T]>=0)
@variable(subproblem, μꜜ[i in 1:N, t in 1:T]>=0)
@variable(subproblem, ν[t in 1:T])
@constraint(subproblem,  [i in 1:N, t in 1:T], μₘᵢₙ[i, t]-μₘₐₓ[i, t]+ν[t]==cost_2[i])
@constraint(subproblem,  [t in 1:T], ν[t]<=70)
@constraint(subproblem,  [t in 1:T], ν[t]>=-70)

opt_time=0
value_time=0
k=0
function my_callback_function(cb_data, cb_where::Cint)

    if cb_where==GRB_CB_MIPSOL
        k+=1

        Gurobi.load_callback_variable_primal(cb_data, cb_where)

        is_on_val=callback_value.(cb_data, is_on)
        z_val=callback_value.(cb_data, z)

        if time() - start_optimize+0.1<=timelimit
            @objective(subproblem, Max, sum(μₘᵢₙ[i,t]*Pmin[i][t]*is_on_val[i,t] - μₘₐₓ[i,t]*Pmax[i][t]*is_on_val[i,t] for i in 1:N for t in 1:T))

            mumin=[zeros(N,T) for s in 1:S]
            mumax=[zeros(N,T) for s in 1:S]
            nu=[zeros(T) for s in 1:S]
            cost=zeros(S)
            for s in 1:S
                set_objective_coefficient(subproblem,ν, [Demand[t]*(1+uncertainty[s][t]) for t in 1:T])
                start=time()
                optimize!(subproblem)
                opt_time+=time()-start
                start_value=time()
                mumin[s]=JuMP.value.(μₘᵢₙ)
                mumax[s]=JuMP.value.(μₘₐₓ)
                nu[s]=JuMP.value.(ν)
                value_time+=time()-start_value
                cost[s]=objective_value(subproblem)
            end
            if sum(cost)/S>1.001*z_val
                cstr=@build_constraint(z>= sum(mumin[s][i,t]*Pmin[i][t]*is_on[i,t] - mumax[s][i,t]*Pmax[i][t]*is_on[i,t] for i in 1:N for t in 1:T for s in 1:S)/S+sum(nu[s][t]*Demand[t]*(1+uncertainty[s][t]) for t in 1:T for s in 1:S)/S)
                MOI.submit(master, MOI.LazyConstraint(cb_data), cstr)
            end
        end
    end
    return
end

set_optimizer_attribute(master, "LazyConstraints", 1)    
set_optimizer_attribute(master, "TimeLimit", timelimit)
set_optimizer_attribute(master, "Threads", 1)
MOI.set(master, Gurobi.CallbackFunction(), my_callback_function)
start_optimize = time()
optimize!(master)
computation_time = time() - start_optimize
println("Iterations: ", k)
println("Total running time: ", computation_time)
println("Time to solve the subproblems :", opt_time)
println("Time to get values: ", value_time)
return (k, opt_time, value_time)

end

Here are the outputs with 20 scenarios and a time limit of 20 seconds:
test(20, 10, 20)
Iterations: 202
Total running time: 20.003000020980835
Time to solve the subproblems: 3.5030081272125244
Time to get values: 4.381990194320679

test(20, 50, 20)
Iterations: 77
Total running time: 20.051999807357788
Time to solve the subproblems: 5.139000654220581
Time to get values: 8.789000988006592

test(20, 100, 20)
Iterations: 46
Total running time: 20.01900005340576
Time to solve the subproblems: 5.190997838973999
Time to get values: 10.134002208709717

As you can see, querying the optimal values takes a lot of time and significantly slows down Benders.

I won’t go into details because it’s a bit long, but I really need the values of the dual variables mumin, mumax and nu since I use them to deduce efficient cuts (better than the classic ones in my code above and your “Primal” formulation) that accelerate convergence.

Can you offer the theory (a paper or a tutorial)?
Since the subproblem is continuous, I contend that Benders’ is already efficient and takes some beating.

Actually I knew this package Dualization.jl.
But my dream is different, being something like a code generator (e.g. ChatGPT).
If a segment of JuMP code of the primal formulation is input into it, then it can print the JuMP code of the dual formulation in julia REPL, then I can copy it to my text editor. (Sounds a bit advanced, but it is bound to be way more flexible—thus usable)
For example, I input the following code

function stage2_problem_primal(z, d)
    JuMP.@variable(st2, x[i = 1:3, j = 1:3] >= 0)
    JuMP.@constraint(st2, DI[i = 1:3], z[i]         >= sum(x[i, :]))
    JuMP.@constraint(st2, DJ[j = 1:3], sum(x[:, j]) >= d[j]        )
    JuMP.@objective(st2, Min, sm(C_x, x))
end

Then an oracle gives me the following code at julia REPL

function stage2_problem_dual(z, d)
    JuMP.@variable(Dst2, DI[i = 1:3] >= 0)
    JuMP.@variable(Dst2, DJ[j = 1:3] >= 0)
    JuMP.@constraint(Dst2, x[i = 1:3, j = 1:3], C_x[i, j] + DI[i] - DJ[j] >= 0)
    JuMP.@objective(Dst2, Max, sm(DJ, d) - sm(DI, z))
end

The sm function is dot, see here.
As you can see there are strong correspondence among the two:

  1. The primal is Min, then the dual is Max.
  2. The primal’s variables become the dual’s constraint, whereas the primal’s constraint become the dual’s variable
  3. If we do not write “<=” constraint, and do not use “<= 0” variables in the primal side, then the same appearance will be on the dual side.