Conditional Constraint Construction Based on Input Data Presence/Absence of Set Combinations in Julia JUMP

Hello everyone. I am currently transitioning from GAMS optimisation to Julia optimisation (in VScode) and trying to figure out the “optimal/fastest” way of constructing conditional constraints based on the presence (or absence) of input data per possible set combinations, which will result at the fastest possible evaluation by Julia. An example:

The following sets are utilisised in the definition of key constraint/variables:

#Utilised packages
using JuMP
using HiGHS
using DataFrames
using XLSX

#Model declaration
Model_m = Model(HiGHS.Optimizer)

#Declare sets. For simplicity, I'll just give you their size and not the actual elements
Technology  = collect(1:8)
Country  = collect(1:1)
Region  = collect(1:1)
Area  = collect(1:2)
Scenario  = collect(1:5)
Year  = collect(1:4)
Season  = collect(1:52)
Time  = collect(1:168)
Fuel  = collect(1:11)
Direction = ["Input", "Output"]

Assume that I have a spreadsheet with all required parameters, named “InputData.xlsm”.

There, a parameter “CarrierMix” exists, in a form of a 16x9 table, which I push into a Dataframe to further handle in the optimisation process (15x9 dataframe, with the first row being the header, present in row 20 of the excel file.

See screenshot below just for the visual explanation:

#Declare parameters by importing data from excel file into a dataframe
CarrierMix = DataFrame(XLSX.readtable("InputData.xlsm", "CarrierMix"; first_row=20)...)

which results in the following dataframe (see screenshot for clarification):

As you will see in the constraint formation below, the first 2 columns of the dataframe are practically identifiers of 2 sets (Direction, Fuel), and the rest of the column headers correspond to identifiers of another set (Technology). Note that in the Dataframe, the first 2 columns don’t include all possible combinations of the sets Direction (2 elements) and Fuel (11 elements) (the resulting dataframe is 15 x 9, while all possible combinations would lead to 22 x 9).

Based on this dataframe, I aim to construct a constraint for each combination of the first 2 columns which includes a number in the other columns, while do nothing for the rest.

See below:

#Declare variables
@variable(Model_m, 0<=vTotalFuelUse[tech in 1:length(Technology), c in 1:length(Country), r in 1:length(Region), a in 1:length(Area), sc in 1:length(Scenario), y in 1:length(Year), s in 1:length(Season), t in 1:length(Time)]);

@variable(Model_m, 0<=vSpecificFuelUse[tech in 1:length(Technology), c in 1:length(Country), r in 1:length(Region), a in 1:length(Area), f in 1:length(Fuel), sc in 1:length(Scenario), y in 1:length(Year), s in 1:length(Season), t in 1:length(Time)]);


#Declare constraints
@constraint(Model_m, FuelMix[tech in 1:length(Technology), c in 1:length(Country), r in 1:length(Region), a in 1:length(Area), f in 1:length(Fuel), sc in 1:length(Scenario), y in 1:length(Year), s in 1:length(Season), t in 1:length(Time)], 
                                vSpecificFuelUse[tech, c, r, a, f, sc, y, s, t]
                                ==
                                if !isempty(filter(row -> row.Direction == "Input" && row.Fuel == f, CarrierMix))
                                    if !ismissing(filter(row -> row.Direction == "Input" && row.Fuel == f, CarrierMix)[!,Technology[tech]][1]) && !isempty(filter(row -> row.Direction == "Input" && row.Fuel == f, CarrierMix)[!,Technology[tech]][1])
                                        filter(row -> row.Direction == "Input" && row.Fuel == f, CarrierMix)[!,Technology[tech]][1]
                                        *
                                        vTotalFuelUse[tech, c, r, a, sc, y, s, t]
                                    end
                                else
                                    0    
                                end
            );

Ultimately I what I do is that I handle the dataframe as a 2D mapping, and I want for each combination of sets which has a corresponding number in the dataframe, to grab that number and multiply it with the variable “vTotalFuelUse”, while ignoring the constraint for whatever combination doesn’t exist or doesn’t have data (which would result in an error).

It seems that it takes an extremely lengthy time for Julia to evaluate this constraint when I hit Shift + Enter on it, so I assume I am making this declaration too complicated, or there is a fastest workaround (similar to the dollar operator ($) in GAMS).

For the reference, I also attempted to simply use the following format of error avoidance, but I got no faster luck:

@constraint(Model_m, FuelMix[tech in 1:length(Technology), c in 1:length(Country), r in 1:length(Region), a in 1:length(Area), f in 1:length(Fuel), sc in 1:length(Scenario), y in 1:length(Year), s in 1:length(Season), t in 1:length(Time)], 
                                vSpecificFuelUse[tech, c, r, a, f, sc, y, s, t]
                                ==
                                try
                                    filter(row -> row.Direction == "Input" && row.Fuel == f, CarrierMix)[!,Technology[tech]][1]
                                catch
                                    0
                                end
                                *
                                vTotalFuelUse[tech, c, r, a, sc, y, s, t]
            );

Any suggestions would be greatly appreciated.

Thanks in advance.

1 Like

This question has come up a few times recently (Suggestions for documentation improvements · Issue #2348 · jump-dev/JuMP.jl · GitHub), so I definitely need to write some guidance on transitioning from GAMS to JuMP.

Can you provide a reproducible example of what you currently have, as well as the input data file? I’ll have a go to see how I would write your model.

Transitioning from GAMS to JuMP is a little complicated, because of the way we differ in handling data.

For example, GAMS lets you create “dense” sets and variables, and then index them using for-if loops. This means a direct translation of your GAMS code would try to look like:

model = Model()
@variable(model, x[A, B])
@constraint(model, [a in A], sum(x[a, b] for b in B if something(a, b)) == 1)

If there are very few values of a and b for which something(a, b) is true (that is, x is sparse), then this approach is inefficient because it requires a loop over every element in B for every element in A, just to throw away most of the work because something(a, b) == false.

The for-if approach works in GAMS because behind the scenes, GAMS translates this into a much more efficient format using relational algebra. But the for-if approach doesn’t work in JuMP because we don’t do the translation.

One approach is to only create variables for which the result is non-zero:

a_to_b = Dict(a => [] for a in A)
for a in A, b in B
    if something(a, b)
        push!(a_to_b[a], b)
    end
end
model = Model()
@variable(model, x[a in A, b in a_to_b[a]])
@constraint(model, [a in A], sum(x[a, b] for b in a_to_b[a]) == 1)

The other approach is to change your data structure to something like a DataFrame:

import DataFrames
df = DataFrames.DataFrame([(a = a, b = b) for a in A, b in B if something(a, b)])
model = Model()
df.x = @variable(model, x[1:size(df, 1)])
for gdf in DataFrames.groupby(df, [:a])
    @constraint(model, sum(gdf.x) == 1)
end

Here are some other links:

1 Like

Thank you for the quick response and material!

Of course I can provide a working copy. You can find it below:
Link to Code and Input Data on Dropbox

or

Link to Code and Input Data on Github

I had a skim through available topics before posting based on key words that came to my mind, but it didn’t seem like I could find any problem that I could relate on the first looks. I’ll also now go deeper in the supporting links you suggested.

I managed in the meantime to create a similar but quicker constraint evaluation, which for the current dataset takes approximately 1 minute and 15 seconds to get evaluated on VScode on my end. Practically I created 2 for loops looking for combinations with existing data in the input file. If data is present, then the constraint formation takes place, otherwise the loop moves on. However, I’d expect that for a large model, this sort of duration per constraint can be quite a limiting factor. Especially if the counterfactual of a more “suitable/elegant” Julia style approach results to significant time savings.

In general, as I see the problem while transitioning from GAMS to Julia while carrying the same “logic”, is the way of loading and accessing data to the model. In GAMS I usually feed the data in a multi dimensional way through a GDX file, and then I can kind of easily access specific set combinations. In Julia, I found a “solution” via feeding my data into a dataframe and then simply working my way to “fishing out” the unique existing value matching the dataframe - filtering criteria that I apply, based on set indices which exist in the given dataframe structure (some identifier columns and some dataframe headers).

I appreciate/suspect that your second example with the groupby command could achieve the same result to my “lookup method” quite faster. That’s a good shout! The only problem which I could suspect emerging would be that due to feeding data to the model via excel and being subject to its sizing constraints (number of rows/columns), I will have to have set identifiers both on the x and y axis of the excel source and consequently in the generated dataframe, in case of parameters defined over multiple sets with multiple indices each. Then I will have to groupby in some 2D way I guess.

From the proposed suggestions, would you know by experience wether 1 of those is “superior/quicker” to the other? Or is it subject to the application basis?

1 Like

This is the way. A table in a GDX file is (nearly) the same as a Julia DataFrame.

Let me take a look at your code and come up with some ideas.

I’d write your model quite differently.

Let’s ignore the country/regionn/area/sceario/year/season/time sets for now, because they aren’t relevant to the input/output balance.

You could do something like:

df_carrier = DataFrames.stack(
    CarrierMix, 
    DataFrames.Not([:Direction, :Fuel]);
    variable_name = :Technology,
)
filter!(row -> row.value > 0, df_carrier)
sort!(df_carrier, [:Direction, :Technology])
model = Model()
@variable(model, x[Technology] >= 0)  # Quantity of each technology
@variable(model, inputs[Fuel])  # Quantity of each input fuel
@variable(model, outputs[Fuel])  # Quantity of each output fuel
for (index, df) in pairs(DataFrames.groupby(df_carrier, [:Direction, :Fuel]))
    if index.Direction == "Input"
        @constraint(
            model,
            inputs[index.Fuel] == sum(row.value * x[row.Technology] for row in eachrow(df))
        )
    else
        @constraint(
            model,
            outputs[index.Fuel] == sum(row.value * x[row.Technology] for row in eachrow(df))
        )
    end
end

julia> print(model)
Feasibility
Subject to
 -51 x[Electrolysis] - x[Electric_Storage] - 3 x[Hydrogen_Storage] - 0.17 x[Methanol_Synthesis] - 0.32 x[Ammonia_Synthesis] + inputs[Electricity] = 0
 -33.36111111111111 x[Hydrogen_Storage] - 6.672222222222221 x[Methanol_Synthesis] - 6.08 x[Ammonia_Synthesis] + inputs[Hydrogen] = 0
 -0.84 x[Ammonia_Synthesis] + inputs[Nitrogen] = 0
 -9 x[Electrolysis] + inputs[Water] = 0
 -1.46 x[Methanol_Synthesis] + inputs[CO2] = 0
 -x[Solar_PV] + inputs[Sole] = 0
 -x[Wind_Turbine] + inputs[Wind] = 0
 -5.249999999999999 x[Ammonia_Synthesis] + outputs[Ammonia] = 0
 -7.46 x[Electrolysis] - 2.07 x[Methanol_Synthesis] - 0.26 x[Ammonia_Synthesis] + outputs[Heat] = 0
 -0.95 x[Electric_Storage] - 0.95 x[Solar_PV] - 0.95 x[Wind_Turbine] + outputs[Electricity] = 0
 -33.36111111111111 x[Electrolysis] + outputs[Hydrogen] = 0
 -30.358611111111106 x[Hydrogen_Storage] + outputs[HydrogenCom] = 0
 -5.5361111111111105 x[Methanol_Synthesis] + outputs[Methanol] = 0
 x[Electrolysis] ≥ 0
 x[Electric_Storage] ≥ 0
 x[Hydrogen_Storage] ≥ 0
 x[Methanol_Synthesis] ≥ 0
 x[Solar_PV] ≥ 0
 x[Wind_Turbine] ≥ 0
 x[Ammonia_Synthesis] ≥ 0

My key suggestion would be to stop thinking of your data as 2-d tables with some values of 0 (or missing), and normalize it into some appropriate tabular form. For example, my df_carrier looks like:

julia> df_carrier
23×4 DataFrame
 Row │ Direction  Fuel         Technology          value   
     │ Any        Any          String              Any     
─────┼─────────────────────────────────────────────────────
   1 │ Input      Electricity  Ammonia_Synthesis   0.32
   2 │ Input      Hydrogen     Ammonia_Synthesis   6.08
   3 │ Input      Nitrogen     Ammonia_Synthesis   0.84
   4 │ Input      Electricity  Electric_Storage    1.0
   5 │ Input      Electricity  Electrolysis        51.0
   6 │ Input      Water        Electrolysis        9.0
   7 │ Input      Electricity  Hydrogen_Storage    3.0
   8 │ Input      Hydrogen     Hydrogen_Storage    33.3611
   9 │ Input      Electricity  Methanol_Synthesis  0.17
  10 │ Input      Hydrogen     Methanol_Synthesis  6.67222
  11 │ Input      CO2          Methanol_Synthesis  1.46
  12 │ Input      Sole         Solar_PV            1.0
  13 │ Input      Wind         Wind_Turbine        1.0
  14 │ Output     Ammonia      Ammonia_Synthesis   5.25
  15 │ Output     Heat         Ammonia_Synthesis   0.26
  16 │ Output     Electricity  Electric_Storage    0.95
  17 │ Output     Hydrogen     Electrolysis        33.3611
  18 │ Output     Heat         Electrolysis        7.46
  19 │ Output     HydrogenCom  Hydrogen_Storage    30.3586
  20 │ Output     Methanol     Methanol_Synthesis  5.53611
  21 │ Output     Heat         Methanol_Synthesis  2.07
  22 │ Output     Electricity  Solar_PV            0.95
  23 │ Output     Electricity  Wind_Turbine        0.95

this is a lot easier to work with than the matrix.

I think you hit the nail on the head with the solution above. You both got around the excel column/rows limitations that I faced (vs the familiar to me GDX approach) by reshaping the data into to a tabular form after feeding them into Julia, but you also simplified the dataframe to only the necessary entities, practically making all the “checks” that I was running redundant.

I’ll probably need to familiarise myself better with the way of handling data into dataframes, but you provided some nice food for thought. Practically the country/region/area/scenario/year/season/time sets will also need to be included as each technology may not operate to its max capacity at any given moment, and this constraint (sort of) will provide the relationship between individual fuel use and total fuel use in each moment. So I’ll need to think how to reconstruct it based on this skeleton, hoping to no issues. I’d expect that it shouldn’t be too much trouble to add them, even though some manipulations to the [index.] and [.row] functionalities will be needed.

Your help is much appreciated Oscar! Really valuable. Thank you.

1 Like

Practically the country/region/area/scenario/year/season/time sets will also need to be included as each technology may not operate to its max capacity at any given moment

The main thing to consider is whether these are dense or sparse, and whether the carrier values are constant or change based one the region etc.

Here’s one option to extend to countries:

model = Model()
@variable(model, x[Technology, Country] >= 0)  # Quantity of each technology
@variable(model, inputs[Fuel, Country])  # Quantity of each input fuel
@variable(model, outputs[Fuel, Country])  # Quantity of each output fuel
for (index, df) in pairs(DataFrames.groupby(df_carrier, [:Direction, :Fuel]))
    if index.Direction == "Input"
        @constraint(
            model,
            [c in country],
            inputs[index.Fuel, c] == sum(row.value * x[row.Technology, c] for row in eachrow(df))
        )
    else
        @constraint(
            model,
            [c in country],
            outputs[index.Fuel, c] == sum(row.value * x[row.Technology, c] for row in eachrow(df))
        )
    end
end

but feel free to start a new thread if you get stuck after working on this for a while.

1 Like