Autodiff for max-flow

Hi, I would like to use Enzyme.jl to get gradients for a max-flow result. Enzyme gives me an illegal type analysis error in this MWE:

using Enzyme
using Graphs, GraphsFlows

# Graph from GraphsFlows.jl demo
flow_graph = Graphs.DiGraph(8) # Create a flow graph
flow_edges = [
    (1,2,10),(1,3,5),(1,4,15),(2,3,4),(2,5,9),
    (2,6,15),(3,4,4),(3,6,8),(4,7,16),(5,6,15),
    (5,8,10),(6,7,15),(6,8,10),(7,3,6),(7,8,10)
]

capacity_matrix = zeros(8, 8)  # Create a capacity matrix

for e in flow_edges
    u, v, f = e
    Graphs.add_edge!(flow_graph, u, v)
    capacity_matrix[u,v] = f
end

# Some arbitrary function which depends on the max-flow results
foo(capacity_matrix) = maximum_flow(flow_graph, 1, 8, capacity_matrix, algorithm=DinicAlgorithm())[2][1, 2] # Run Dinic's algorithm
foo(capacity_matrix) # 10.0

gradient(Forward, foo, capacity_matrix)

Is this a lost cause because of the numerous type instabilities (JET.jl gives me 136 possible errors…)?

Hi @PaterPen!
Can you explain why you need derivatives for a max flow? Is your output the total cost of the flow, or its breakdown along each edge? Is your input only the capacity matrix, or also the edge costs?
Typically, in such cases, one wants to avoid differentiating through the optimization algorithm, and instead use problem knowledge to speed up the differentiation process. Examples of packages for this purpose are DiffOpt.jl and InferOpt.jl.
Nowadays, Enzyme.jl can handle type instabilities, but perhaps not that many. GraphsFlows.jl is an old package, so the code may not be ideal for autodiff. However, it's exactly my area of study, and I maintain both the JuliaGraphs and JuliaDiff ecosystems, so I'm the ideal person to help you!


Is there an error message you can share / post as an issue?

Separately, it may be useful to look at API reference · Enzyme.jl (though ideally this shouldn't happen, so please post an issue).


Thanks for your interest! This is a toy model in which the max-flow is solved for several snapshots.

In each of those snapshots, the capacities from the source/target node to the other nodes are modified according to a time series. However, the capacities from the source to the other nodes have free parameters which scale the time series. I.e., letting c_i(t) be the capacity from the source node to node i at time t, we have c_i(t) = x_i(t) \cdot \alpha_{\text{source},i}, where x_i(t) is a time series. The edges from the source represent energy generation, which can be scaled, whereas the edges to the target represent energy consumption (again given by some time series y_i(t)).

The edge capacities between the other nodes can be modified as well, but are fixed across all snapshots. Denote the capacity between nodes i and j by \alpha_{i,j}. Additionally, let F(t) be the flow matrix of our solved max-flow at time t; then (with some abuse of notation) we get the total \operatorname{costs}_{t_1,\dots,t_N} = f((\alpha_{i,j})_{i,j \in I}, F_{t_1},\dots,F_{t_N}).

Intuitively, we optimize over the free parameters: the objective combines "building costs", which result directly from the free parameters \alpha_{i,j}, and some loss function resulting from the net flow (net energy) at each edge.

I hope this admittedly sloppy explanation is nevertheless a bit insightful.

I will do so!

My spontaneous recommendation would be to model everything with a JuMP.jl program (either linear or convex) and then use DiffOpt.jl to differentiate it with respect to the parameters. But maybe @mbesancon would suggest something else.
You can draw inspiration from GraphsOptim.jl on how to model the flow in JuMP.
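For concreteness, here's a minimal sketch of the modeling pattern (my own illustration rather than the GraphsOptim.jl code; HiGHS as the LP solver and the dense capacity matrix from your MWE are assumptions):

using JuMP, HiGHS

function maxflow_lp(capacity_matrix, source, target)
    n = size(capacity_matrix, 1)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    # One flow variable per arc, bounded above by its capacity
    @variable(model, 0 <= f[i=1:n, j=1:n] <= capacity_matrix[i, j])
    # Flow conservation at every node except source and target
    @constraint(model, [v in setdiff(1:n, (source, target))],
        sum(f[u, v] for u in 1:n) == sum(f[v, w] for w in 1:n))
    # Maximize the net flow out of the source
    @objective(model, Max,
        sum(f[source, w] for w in 1:n) - sum(f[u, source] for u in 1:n))
    optimize!(model)
    return objective_value(model), value.(f)
end

maxflow_lp(capacity_matrix, 1, 8)  # should recover the same flow as Dinic's algorithm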


Thanks for the input! My initial idea was that a specialised max-flow solver might be significantly faster than an LP solver. But indeed, the general handling would be much more convenient with JuMP.jl.

Right now I can solve it with Nelder-Mead (and the results look okay), but I am not fully convinced that it does not get stuck in local minima. Therefore I thought about using gradient-based methods, and JuMP.jl with DiffOpt.jl might be the way to go. Thanks for this suggestion, I didn't know of the latter!


You’re right that a dedicated solver is faster, but:

  1. It’s less flexible, and doesn’t accommodate other terms you might need in the objective.
  2. Differentiating through an optimization solver is usually inefficient, even with Enzyme.

When you differentiate through the solver, you “piggyback” autodiff across every single iteration, which makes memory costs scale linearly with the number of iterations in reverse mode.
The clever thing to do is to exploit the optimality conditions and differentiate through them with the implicit function theorem (as explained in my recent autodiff tutorial). Formulating these optimality conditions is a pain in the butt, and DiffOpt.jl takes care of it for you.
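For reference, this is just the implicit function theorem applied to the optimality conditions: if c(y(x), x) = 0 characterizes the solution y(x), differentiating both sides gives

\frac{\partial c}{\partial y} \frac{\partial y}{\partial x} + \frac{\partial c}{\partial x} = 0 \quad \Longrightarrow \quad \frac{\partial y}{\partial x} = -\left( \frac{\partial c}{\partial y} \right)^{-1} \frac{\partial c}{\partial x}

so you only need the final solution and one linear solve, instead of unrolling every solver iteration.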

Note that it will not work if you differentiate with respect to the cost vector of an LP. In that case, the mapping from costs to solution becomes piecewise constant, and so the derivatives you get will be useless (zero almost everywhere). From what I understand, only the constraints vary here, so you’re fine, but if your cost varies too, hit me up again and we’ll talk about InferOpt.jl.
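To see why, consider a one-dimensional toy LP (my illustration): for \min_{0 \le x \le 1} c \, x, the solution is x^\star(c) = 1 if c < 0 and x^\star(c) = 0 if c > 0, so x^\star is piecewise constant in c and its derivative is zero wherever it is defined.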


Follow-up/rookie question: I am wondering, though, how to implement this efficiently with JuMP.jl, since the naive approach would be to update the max-flow capacities and solve the model for every snapshot.

But how do I get the derivatives then? I could of course determine them for every snapshot and use some custom implementation to combine them (which is tedious). Also, this linear programming approach with HiGHS.jl takes around 30x longer than the previous approach (Dinic’s algorithm).

Is there maybe an obvious way to aggregate the different snapshot models/solutions into one big JuMP.jl model?

I was thinking you could put everything inside one big model (all the flows in all the snapshots)?

I solve the snapshots in parallel (they are independent of each other given the parameters) and then have a cost function which depends on all solutions. That means I can easily replace the solution algorithm used in every snapshot with a JuMP model. But that is not really what I want, is it?

My bad, I had not realized the snapshots were independent. Then yes, you can solve them in parallel, each with its own JuMP model, and then sum the solutions. When you compute a gradient, DiffOpt.jl will backpropagate through every single model. It may not be very fast, but it will give you the right derivatives.
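As a rough sketch of that setup (assuming the maxflow_lp helper sketched earlier; snapshot_capacities and total_cost are hypothetical placeholders for your problem):

# Solve the independent snapshot models in parallel, e.g. with threads
flows = Vector{Matrix{Float64}}(undef, length(snapshot_capacities))
Threads.@threads for t in eachindex(snapshot_capacities)
    # Each snapshot gets its own JuMP model and its own flow matrix
    flows[t] = maxflow_lp(snapshot_capacities[t], 1, 8)[2]
end
loss = total_cost(flows)  # hypothetical cost over all snapshot solutions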

Differentiating through Dinic’s algorithm is another challenge entirely. If you can manage to write a simple optimality condition, we may be able to make it work with ImplicitDifferentiation.jl, but I don’t know how feasible that is.


Say our capacity-to-flow function is y(x). For implicit differentiation, I need the optimality of a flow to be expressed by a set of conditions c(y, x) = 0. In the case of Dinic’s algorithm, it comes down to expressing “there is no blocking flow” as “some vector that is a function of the capacities and the current flow equals 0”. If you can write c as a Julia function that is differentiable, I’ll help you with the rest.
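Concretely, the pattern would look roughly like this (the exact signatures depend on the ImplicitDifferentiation.jl version; solve_dinic and optimality_residual are hypothetical placeholders):

using ImplicitDifferentiation

# The forward pass runs the actual solver and is never differentiated itself
forward(x) = solve_dinic(x)  # hypothetical wrapper around Dinic's algorithm

# The conditions must return a vector that is zero exactly when y is an
# optimal flow for capacities x; this is the part that has to be differentiable
conditions(x, y) = optimality_residual(x, y)  # hypothetical, to be written

implicit = ImplicitFunction(forward, conditions)
# implicit(x) can then be differentiated with ForwardDiff, Zygote, Enzyme, ...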


As @gdalle mentioned, one option is the DiffOpt route, which will work.

I am not sure whether your function depends on the value of the maximum flow (the scalar amount of flow from source to sink) or on the value of the flow in the graph (the vector of flows associated with each edge).

If the former, you can do something much simpler:

You have a problem with a very particular structure here and can use LP duality to derive the derivative you want.

Namely, calling v(c) the optimal value with the given capacity vector c, LP sensitivity gives you \mathrm{d} v = y \cdot \mathrm{d} c, where y is the optimal dual solution.

In the case of max-flow, this dual solution can be obtained without an LP solver: it’s a minimum s-t cut corresponding to the flow you obtained. You can compute this cut easily from the flow solution; the cut is a subset of the saturated edges that separates s from t.

What’s the gradient of the max-flow then? 0 for any edge that is not in the minimum cut, 1 for the cut edges, since reducing the capacity of a cut edge directly reduces the maximum amount of flow. Note that the gradient is well-defined (the limit is the same when perturbing the capacity up or down) only if the minimum cut is unique; otherwise you can construct counterexamples with two paths from s to t, where one has two saturated edges and the other a single one.
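Here’s a minimal sketch of that computation, assuming capacity and flow are the dense matrices of a solved max-flow with nonnegative entries: a graph search from s in the residual graph identifies the source side of the cut, and the gradient is 1 exactly on the arcs crossing it.

function maxflow_value_gradient(capacity, flow, s)
    n = size(capacity, 1)
    # Find all nodes reachable from s in the residual graph
    reachable = falses(n)
    reachable[s] = true
    stack = [s]
    while !isempty(stack)
        u = pop!(stack)
        for v in 1:n
            # Residual capacity: unused forward capacity plus cancellable reverse flow
            if !reachable[v] && (capacity[u, v] - flow[u, v]) + flow[v, u] > 0
                reachable[v] = true
                push!(stack, v)
            end
        end
    end
    # Arcs from the reachable side to the rest form the minimum cut
    grad = zeros(n, n)
    for u in 1:n, v in 1:n
        if reachable[u] && !reachable[v] && capacity[u, v] > 0
            grad[u, v] = 1.0
        end
    end
    return grad
end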

Hope this helps!


Thanks for this idea! However, my function indeed depends on the latter (the flow on each edge).