I guess one option is to start with the nested for loops. If they’re easy to write and understand, that’s all you need. It’ll be faster than PuLP. But you might still run into scaling issues. (Although perhaps you’re fine with the runtime.)
If the runtime becomes a problem, here’s another option:
using JuMP
import DataFrames

# One row per arc: where it starts, where it ends, and which coal group it carries.
df = DataFrames.DataFrame(
    origin_node = ["A", "A", "B", "C"],
    destination_node = ["C", "C", "C", "A"],
    coal_group = ["x", "y", "y", "y"],
)

model = Model()
# One non-negative flow variable per row, stored in the data frame as column `x`.
df.x = @variable(model, x[1:size(df, 1)] >= 0, base_name = "mass_flow")

locations = union(df.origin_node, df.destination_node)
# Flow balance per coal group: at every location, inflow equals outflow.
for (index, gdf) in pairs(DataFrames.groupby(df, :coal_group))
    @constraint(
        model,
        [l in locations],
        sum(r.x for r in eachrow(gdf) if r.destination_node == l) ==
        sum(r.x for r in eachrow(gdf) if r.origin_node == l),
    )
end
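
For comparison, here is a rough sketch of what the plain nested-for-loop version mentioned at the top might look like. The names `arcs`, `loop_model`, and `flow` are made up for illustration, and the data just mirrors the small example above. I'm also assuming you want to skip (group, node) pairs that have no arcs at all, rather than adding a trivial `0 == 0` constraint.

```julia
using JuMP

# Hypothetical arc list: (origin, destination, coal_group), same data as above.
arcs = [("A", "C", "x"), ("A", "C", "y"), ("B", "C", "y"), ("C", "A", "y")]
nodes = unique(vcat(getindex.(arcs, 1), getindex.(arcs, 2)))
groups = unique(getindex.(arcs, 3))

loop_model = Model()
@variable(loop_model, flow[1:length(arcs)] >= 0)

for g in groups, n in nodes
    inflow = AffExpr(0.0)
    outflow = AffExpr(0.0)
    touched = false  # does any arc of group g start or end at node n?
    for (i, (o, d, cg)) in enumerate(arcs)
        cg == g || continue
        if d == n
            add_to_expression!(inflow, flow[i])
            touched = true
        end
        if o == n
            add_to_expression!(outflow, flow[i])
            touched = true
        end
    end
    # Assumption: skip pairs with no arcs instead of adding an empty constraint.
    touched || continue
    @constraint(loop_model, inflow == outflow)
end
```

Every (group, node) pair scans the full arc list here, so the work grows with groups × nodes × arcs. That is the scaling issue to watch for; the `groupby` version only scans the rows of one group at a time.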