How to solve an infinite horizon problem under a Markovian policy graph in SDDP.jl?

I am dealing with a multi-stage mixed-integer stochastic global supply chain optimization problem and want to solve infinite-horizon problem instances under a Markovian policy graph. I am using the Markovian policy graph because the constraint matrix contains uncertain parameters (in a linear programming model).

Can the infinite horizon problem instance be solved under a Markovian Policy graph using SDDP.jl?

Can the infinite horizon problem instance be solved under a Markovian Policy graph using SDDP.jl?

Yes.

See Create a general policy graph · SDDP.jl for instructions on how to create an arbitrary graph. I would start with an SDDP.MarkovianGraph and then add the cyclic edges.
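For example, a minimal sketch of that idea, assuming a two-state Markov chain, a 12-stage cycle, and a 0.9 discount on the cyclic edges (all placeholder values, not from the original question):

import SDDP

T = 12
M = [0.2 0.8; 0.8 0.2]  # hypothetical Markov transition matrix
# Build the acyclic Markovian part; the first matrix is the root transition.
graph = SDDP.MarkovianGraph([t == 1 ? [0.5 0.5] : M for t in 1:T])
# Add cyclic edges from the last stage back to the first. The 0.9 factor is an
# assumed discount so that each pass around the cycle terminates with
# probability 0.1.
for i in 1:2, j in 1:2
    SDDP.add_edge(graph, (T, i) => (1, j), 0.9 * M[i, j])
end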

Here’s the code for the model in https://doi.org/10.1002/net.21932:

Got it. May I ask: can I specify several root nodes, or do I have to specify a dummy root node to accurately implement the formulation?

Additionally, can I dynamically create and specify decision node and edge names in a for loop, for example, without having to enumerate each of them one by one?

Found the solution to my second question in the comment above: use Symbol("decision_node_name") to specify the symbols of nodes and edges dynamically. Thanks.

No. There must be only a single root node. (The circle node in a policy graph diagram.)

Additionally, can I dynamically create and specify decision node and edge names in a for loop, for example, without having to enumerate each of them one by one?

Yes, see Create a general policy graph · SDDP.jl. The graph is a regular Julia data structure. You can use for-loops etc to build it as you would any other data structure.

You might, for example, build a graph like this:

import SDDP
T = 12                      # number of stages per cycle
M = [0.2 0.8; 0.8 0.2]      # Markov transition matrix
graph = SDDP.Graph((0, 0))  # (0, 0) is the root node
# Add a node for each (stage, Markov state) pair.
for t in 1:T, i in 1:2
    SDDP.add_node(graph, (t, i))
end
# Root node transitions into the first stage.
for i in 1:2
    SDDP.add_edge(graph, (0, 0) => (1, i), 0.5)
end
# Markov transitions between consecutive stages.
for t in 2:T, i in 1:2, j in 1:2
    SDDP.add_edge(graph, (t-1, i) => (t, j), M[i, j])
end
# Cyclic edges from the final stage back to stage 1 make the graph infinite horizon.
for i in 1:2, j in 1:2
    SDDP.add_edge(graph, (T, i) => (1, j), M[i, j])
end
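Once the graph is built, you pass it to SDDP.PolicyGraph; the node argument of the subproblem builder is then the (t, i) tuple. A minimal sketch with a placeholder state variable and stage objective (assuming HiGHS as the solver):

using SDDP, HiGHS

model = SDDP.PolicyGraph(
    graph;  # the graph built above
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, node
    t, i = node  # the stage and Markov state of this node
    @variable(sp, x >= 0, SDDP.State, initial_value = 0.0)
    @stageobjective(sp, x.out)
end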

Got it, thanks.

With an infinite planning horizon and termination upon cycles, how can I access the stage number within a particular episode?

Could you clarify the question? Do you mean when simulating? See the :node_index field.

See also Create a general policy graph · SDDP.jl
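For example, assuming model is an already-trained policy graph, a sketch like this prints the node visited at each step of one simulated replication:

import SDDP

simulations = SDDP.simulate(model, 10)  # 10 replications
for (step, data) in enumerate(simulations[1])
    # data[:node_index] is the name of the node visited at this step,
    # e.g. the (t, i) tuple from the graph above.
    println("step = ", step, ", node = ", data[:node_index])
end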

I meant during training simulations: I need to access the stage number within a particular forward pass while using the general policy graph with an infinite horizon.

This is not possible. The model assumes that you do not need to know how many steps you are away from the root node. If your model really depends on the number of steps from the root node, then it is not cyclic.

Right, got it.

Does that mean a model that depends on this information cannot be solved over an infinite horizon by SDDP.jl, or by the SDDiP algorithm in general?

P.S.
There are stochastic variables in the constraint matrix, leading to the use of the Markovian and general policy graphs.

What is the problem you are trying to solve? Can you provide a reproducible example? Have you formulated the problem as a policy graph? What are the state variables? What are the control variables? What are the node-wise independent random variables?

You can have node-wise independent noise in the constraint matrix: Add noise in the constraint matrix · SDDP.jl. Just make it so that the noise depends on the node index, and not the number of steps from the root node.
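A minimal sketch along the lines of that tutorial, with made-up coefficients and HiGHS as the solver:

using SDDP, HiGHS
import JuMP

model = SDDP.LinearPolicyGraph(
    stages = 3,
    sense = :Min,
    lower_bound = -20.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x, SDDP.State, initial_value = 0.0)
    @constraint(sp, capacity, 1.0 * x.out <= 1.0)
    # The sampled ω rewrites a coefficient in the constraint matrix. The
    # support here is the same at every node for simplicity, but it may differ
    # by node (for example, by Markov state); it just cannot depend on the
    # number of steps taken from the root node.
    SDDP.parameterize(sp, [0.2, 0.5, 1.0]) do ω
        JuMP.set_normalized_coefficient(capacity, x.out, ω)
    end
    @stageobjective(sp, -x.out)
end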

In my problem, the inventory quality deteriorates over time according to decay factors (which differ across time periods), but the physical amount of inventory does not deteriorate. The physical amount is used to calculate several costs in the objective. This means that the value of inventory from earlier time periods is discounted more than that of inventory from recent time periods.

This is where the time dependency comes from.

I will try my best to provide a reproducible example, but it will be difficult.

See Access variables from a previous stage · SDDP.jl

If you want to model quality decay, you must bucket your inventory into a small finite set of quality classes, model them in the state variable, and use constraints to model the transition of inventory between the quality buckets.

I used to have a tutorial on this: Update the perishable widgets example · Issue #591 · odow/SDDP.jl · GitHub, but I removed the perishable stuff when I wrote Example: the milk producer · SDDP.jl.
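For concreteness, here is a sketch of the bucketing idea (not the original model and not the milk-producer example): three quality classes, unsold stock ages one class per stage, and anything ageing out of the lowest class is discarded. All data is made up, and the uncertainty is omitted to keep the focus on the buckets.

using SDDP, HiGHS

Q, demand = 3, 5.0

model = SDDP.LinearPolicyGraph(
    stages = 12,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    # One state variable per quality bucket; bucket 1 is the freshest.
    @variable(sp, inventory[q = 1:Q] >= 0, SDDP.State, initial_value = 10.0)
    @variable(sp, buy >= 0)            # fresh stock, enters bucket 1
    @variable(sp, sell[q = 1:Q] >= 0)  # stock used to meet demand
    @constraint(sp, [q = 1:Q], sell[q] <= inventory[q].in)
    @constraint(sp, sum(sell) >= demand)
    # Transition: unsold stock ages one bucket; leftovers in bucket Q are discarded.
    @constraint(sp, inventory[1].out == buy)
    @constraint(sp, [q = 2:Q], inventory[q].out == inventory[q-1].in - sell[q-1])
    # Purchase cost plus a small holding cost on end-of-stage inventory.
    @stageobjective(sp, 2.0 * buy + 0.1 * sum(inventory[q].out for q in 1:Q))
end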

But in the infinite-horizon case, there could be infinitely many quality buckets (we might approximate by aggregating the quality buckets for supplies from far back in the planning horizon).

And if we model it that way, there will be a huge increase in the number of binary variables, significantly increasing the intractability.

Is there any other way to solve the infinite horizon case using SDDiP algorithm?

If you want to use SDDP.jl, you must formulate the model as a policy graph. That means you'll need to make some approximations. You probably don't have infinitely many quality buckets: after some time the product gets chucked out, or the quality doesn't deteriorate further.

And yes, there will be more variables. But it’s not increasing the intractability. Your problem is intractable. You cannot ever hope to solve it to global optimality. You need to make some simplifying assumptions.

Is there any other way to solve the infinite horizon case using SDDiP algorithm?

Nope. You must model information from previous stages in the state variable.

Your alternative is to come up with some custom heuristic policy (not using SDDP.jl) and evaluate that. Warren Powell has a large literature on this. Do some cost-function approximation type algorithm. (Of course, the downside to CFA is that Powell can’t tell you what the right cost function is. You need to use domain knowledge. The upside to SDDP.jl is that if you can model it as a policy graph, then we can find a primal feasible policy.)

Got it.