Autodiff for max-flow

Hi, I would like to use Enzyme.jl to get gradients for a max-flow result. Enzyme gives me an illegal type analysis error in this MWE:

using Enzyme
using Graphs, GraphsFlows

# Graph from GraphsFlows.jl demo
flow_graph = Graphs.DiGraph(8) # Create a flow graph
flow_edges = [
    (1,2,10),(1,3,5),(1,4,15),(2,3,4),(2,5,9),
    (2,6,15),(3,4,4),(3,6,8),(4,7,16),(5,6,15),
    (5,8,10),(6,7,15),(6,8,10),(7,3,6),(7,8,10)
]

capacity_matrix = zeros(8, 8)  # Create a capacity matrix

for e in flow_edges
    u, v, f = e
    Graphs.add_edge!(flow_graph, u, v)
    capacity_matrix[u,v] = f
end

# Some arbitrary function which depends on the max-flow results
foo(capacity_matrix) = maximum_flow(flow_graph, 1, 8, capacity_matrix, algorithm=DinicAlgorithm())[2][1, 2] # Run Dinic's algorithm
foo(capacity_matrix) # 10.0

gradient(Forward, foo, capacity_matrix)

Is this a lost cause because of the numerous type instabilities (JET.jl gives me 136 possible errors…)?

Hi @PaterPen!
Can you explain why you need derivatives for a max flow? Is your output the total cost of the flow, or its breakdown along each edge? Is your input only the capacity matrix, or also the edge costs?
Typically, in such cases, one wants to avoid differentiating through the optimization algorithm, and instead use problem knowledge to speed up the differentiation process. Examples of packages for this purpose are DiffOpt.jl and InferOpt.jl.
Nowadays, Enzyme.jl can handle type instabilities, but perhaps not that many. GraphsFlows.jl is an old package, so the code may not be ideal for autodiff. However, it's exactly my area of study, and I maintain both the JuliaGraphs and JuliaDiff ecosystems, so I'm the ideal person to help you!


Is there an error message you can share / post as an issue?

Separately, it may be useful to look at API reference · Enzyme.jl (though ideally this shouldn't happen, so please post an issue).


Thanks for your interest! This is a toy model in which the max-flow is solved for several snapshots.

In each of those snapshots, the capacities from the source/target node to the other nodes are modified according to a time series. However, the capacities from the source to the other nodes have free parameters which scale the time series. I.e., letting c_i(t) be the capacity from the source node to node i at time t, we have c_i(t) = x_i(t) \cdot \alpha_{\text{source},i}, where x_i(t) is a time series. The edges from the source represent energy generation, which can be scaled, whereas the edges to the target represent energy consumption (again given by some time series y_i(t)).

The edge capacities between the other nodes can be modified as well, but are fixed across all snapshots. Denote the capacity between nodes i and j by \alpha_{i,j}. Additionally, let F(t) be the flow matrix of our solved max-flow at time t; then (with some abuse of notation) we get the total \operatorname{costs}_{t_1,\dots,t_N} = f((\alpha_{i,j})_{i,j \in I}, F_{t_1},\dots,F_{t_N}).

Intuitively, we optimize over the free parameters: the objective combines "building costs", which result directly from the free parameters \alpha_{i,j}, and some loss function resulting from the net flow (net energy) at each edge.

I hope this admittedly sloppy explanation is nevertheless a bit insightful.

I will do so!

My spontaneous recommendation would be to model everything with a JuMP.jl program (either linear or convex) and then use DiffOpt.jl to differentiate it with respect to the parameters. But maybe @mbesancon would suggest something else.
You can draw inspiration from GraphsOptim.jl on how to model the flow in JuMP.
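For concreteness, here's a minimal sketch of the modeling pattern (my own illustration rather than the GraphsOptim.jl code; HiGHS as the LP solver and the dense capacity matrix from your MWE are assumptions):

using JuMP, HiGHS

function maxflow_lp(capacity_matrix, source, target)
    n = size(capacity_matrix, 1)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    # One flow variable per arc, bounded above by its capacity
    @variable(model, 0 <= f[i=1:n, j=1:n] <= capacity_matrix[i, j])
    # Flow conservation at every node except source and target
    @constraint(model, [v in setdiff(1:n, (source, target))],
        sum(f[u, v] for u in 1:n) == sum(f[v, w] for w in 1:n))
    # Maximize the net flow out of the source
    @objective(model, Max,
        sum(f[source, w] for w in 1:n) - sum(f[u, source] for u in 1:n))
    optimize!(model)
    return objective_value(model), value.(f)
end

maxflow_lp(capacity_matrix, 1, 8)  # should recover the same flow as Dinic's algorithm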


Thanks for the input! My initial idea was that a specialised max-flow solver might be significantly faster than an LP solver. But indeed, the general handling would be much more convenient with JuMP.jl.

Right now I can solve it with Nelder-Mead (and the results look okay), but I am not fully convinced that it does not get stuck in local minima. Therefore I thought about using gradient-based methods, and JuMP.jl with DiffOpt.jl might be the way to go. Thanks for this suggestion, I didn't know of the latter!


You’re right that a dedicated solver is faster, but:

  1. It’s less flexible, and doesn’t accommodate other terms you might need in the objective.
  2. Differentiating through an optimization solver is usually inefficient, even with Enzyme.

When you differentiate through the solver, you “piggyback” autodiff across every single iteration, which makes memory costs scale linearly with the number of iterations in reverse mode.
The clever thing to do is to exploit the optimality conditions and differentiate through them with the implicit function theorem (as explained in my recent autodiff tutorial). Formulating these optimality conditions is a pain in the butt, and DiffOpt.jl takes care of it for you.
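For reference, this is just the implicit function theorem applied to the optimality conditions: if c(y(x), x) = 0 characterizes the solution y(x), differentiating both sides gives

\frac{\partial c}{\partial y} \frac{\partial y}{\partial x} + \frac{\partial c}{\partial x} = 0 \quad \Longrightarrow \quad \frac{\partial y}{\partial x} = -\left( \frac{\partial c}{\partial y} \right)^{-1} \frac{\partial c}{\partial x}

so you only need the final solution and one linear solve, instead of unrolling every solver iteration.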

Note that it will not work if you differentiate with respect to the cost vector of an LP. In that case, the mapping from costs to solution becomes piecewise constant, and so the derivatives you get will be useless (zero almost everywhere). From what I understand, only the constraints vary here, so you’re fine, but if your cost varies too, hit me up again and we’ll talk about InferOpt.jl.
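To see why, consider a one-dimensional toy LP (my illustration): for \min_{0 \le x \le 1} c \, x, the solution is x^\star(c) = 1 if c < 0 and x^\star(c) = 0 if c > 0, so x^\star is piecewise constant in c and its derivative is zero wherever it is defined.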


Follow-up/rookie question: I am wondering, though, how to implement this efficiently with JuMP.jl, since the naive approach would be to update the max-flow capacities and solve the model for every snapshot.

But how do I get the derivatives then? I could of course determine them for every snapshot and use some custom implementation to combine them (which is tedious). Also, this linear programming approach with HiGHS.jl takes around 30x longer than the previous approach (Dinic’s algorithm).

Is there maybe an obvious way to aggregate the different snapshot models/solutions into one big JuMP.jl model?

I was thinking you could put everything inside one big model (all the flows in all the snapshots)?

I solve the snapshots in parallel (they are independent of each other given the parameters) and then have a cost function which depends on all solutions. That means I can easily replace the solution algorithm used in every snapshot with a JuMP model. But that is not really what I want, is it?

My bad, I had not realized the snapshots were independent. Then yes, you can solve them in parallel, each with its own JuMP model, and then sum the solutions. When you compute a gradient, DiffOpt.jl will backpropagate through every single model. It may not be very fast, but it will give you the right derivatives.
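As a rough sketch of that setup (assuming the maxflow_lp helper sketched earlier; snapshot_capacities and total_cost are hypothetical placeholders for your problem):

# Solve the independent snapshot models in parallel, e.g. with threads
flows = Vector{Matrix{Float64}}(undef, length(snapshot_capacities))
Threads.@threads for t in eachindex(snapshot_capacities)
    # Each snapshot gets its own JuMP model and its own flow matrix
    flows[t] = maxflow_lp(snapshot_capacities[t], 1, 8)[2]
end
loss = total_cost(flows)  # hypothetical cost over all snapshot solutions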

Differentiating through Dinic’s algorithm is another challenge entirely. If you can manage to write a simple optimality condition, we may be able to make it work with ImplicitDifferentiation.jl, but I don’t know how feasible that is.


Say our capacity-to-flow function is y(x). For implicit differentiation, I need the optimality of a flow to be expressed by a set of conditions c(y, x) = 0. In the case of Dinic’s algorithm, it comes down to expressing “there is no blocking flow” as “some vector that is a function of the capacities and the current flow equals 0”. If you can write c as a Julia function that is differentiable, I’ll help you with the rest.
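Concretely, the pattern would look roughly like this (the exact signatures depend on the ImplicitDifferentiation.jl version; solve_dinic and optimality_residual are hypothetical placeholders):

using ImplicitDifferentiation

# The forward pass runs the actual solver and is never differentiated itself
forward(x) = solve_dinic(x)  # hypothetical wrapper around Dinic's algorithm

# The conditions must return a vector that is zero exactly when y is an
# optimal flow for capacities x; this is the part that has to be differentiable
conditions(x, y) = optimality_residual(x, y)  # hypothetical, to be written

implicit = ImplicitFunction(forward, conditions)
# implicit(x) can then be differentiated with ForwardDiff, Zygote, Enzyme, ...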


As @gdalle mentioned, one option is the DiffOpt route, which will work.

I am not sure whether your function depends on the value of the maximum flow (the scalar amount of flow from source to sink) or on the value of the flow in the graph (the vector of flows associated with each edge).

If the former, you can do something much simpler:

You have a problem with a very particular structure here and can use LP duality to derive the derivative you want.

Namely, calling v(c) the optimal value with the given capacity vector c, LP sensitivity gives you \mathrm{d} v = y \cdot \mathrm{d} c, where y is the optimal dual solution.

In the case of max-flow, this dual solution can be obtained without an LP solver: it’s a minimum s-t cut corresponding to the flow you obtained. You can compute this cut easily from the flow solution; the cut is a subset of the saturated edges that separates s from t.

What’s the gradient of the max-flow then? 0 for any edge that is not in the minimum cut, 1 for the cut edges, since reducing the capacity of a cut edge directly reduces the maximum amount of flow. Note that the gradient is well-defined (the limit is the same when perturbing the capacity up or down) only if the minimum cut is unique; otherwise you can construct counterexamples with two paths from s to t, where one has two saturated edges and the other a single one.
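Here’s a minimal sketch of that computation, assuming capacity and flow are the dense matrices of a solved max-flow with nonnegative entries: a graph search from s in the residual graph identifies the source side of the cut, and the gradient is 1 exactly on the arcs crossing it.

function maxflow_value_gradient(capacity, flow, s)
    n = size(capacity, 1)
    # Find all nodes reachable from s in the residual graph
    reachable = falses(n)
    reachable[s] = true
    stack = [s]
    while !isempty(stack)
        u = pop!(stack)
        for v in 1:n
            # Residual capacity: unused forward capacity plus cancellable reverse flow
            if !reachable[v] && (capacity[u, v] - flow[u, v]) + flow[v, u] > 0
                reachable[v] = true
                push!(stack, v)
            end
        end
    end
    # Arcs from the reachable side to the rest form the minimum cut
    grad = zeros(n, n)
    for u in 1:n, v in 1:n
        if reachable[u] && !reachable[v] && capacity[u, v] > 0
            grad[u, v] = 1.0
        end
    end
    return grad
end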

Hope this helps!


Thanks for this idea! However, my function indeed depends on the latter (the flow on each edge).