Markovian SDDP: is a finite and concrete scenario tree necessary?

(I have somewhat forgotten this material.)

I asked ChatGPT this question:

  • is it a prerequisite to restrict the number of uncertainty realizations in each stage to be finite, if I adopt a Markov-SDDP algorithm to tackle a multistage stochastic program?

and it answered:

Short answer: No — finiteness of the true uncertainty is not a mathematical prerequisite for Markovian SDDP. What you do need is a finite sampled support per iteration and a finite Markov state space for the dynamic program you are approximating.

I can’t believe its answer.

I think:

  • to carry out a Markov-SDDP algorithm on a computer, we need to construct a concrete, finite scenario tree beforehand (in the form of transition probabilities across a series of discrete uncertainty states). Otherwise, how could we store infinitely many cutting-plane surrogates?

Am I right?

The short answer seems pretty reasonable to me. In SDDP.jl we require a finite sample space chosen a priori, but this isn’t strictly necessary. If the uncertainty appears only in the right-hand side of linear constraints you could sample from the true continuous distribution on the backward pass and still obtain a valid lower bound on the policy. There are various open questions in the literature about whether the algorithm would converge etc, but that’s a different topic.

Let’s please keep this forum focused on Julia-, JuMP-, and in this case, SDDP.jl-related questions. I don’t think that theoretical details of the SDDP algorithm are on-topic.

First, it’s not only theoretical but also algorithmic, directly affecting how we organize Julia/JuMP/SDDP.jl code. Second, these are not details but basic concerns when we deal with real-world problems.


I don’t think you got my point.

For canonical SDDP (i.e., assuming stagewise independence), we have only one cost-to-go function at each stage, whereas for Markovian SDDP there are as many cost-to-go functions as there are points in the support of the Markov state, and each one must be associated with its own cutting-plane surrogate.

If we don’t pre-build a deterministic scenario tree that is used throughout the entire training, wouldn’t we require (possibly) infinitely many cutting-plane models? That’s where I find the bot’s answer questionable.
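
To make my concern concrete, here is the recursion I have in mind (my own schematic notation, with the constraints folded into a feasible set X_t): there is one cost-to-go function per pair of stage t and Markov state m_t, and each one needs its own collection of cuts,

  • V_t(x_{t-1}, m_t) = \mathbb{E}_{\xi_t \mid m_t}\left[ \min_{x_t \in X_t(x_{t-1}, \xi_t)} c_t(x_t, \xi_t) + \sum_{m'} P(m_{t+1} = m' \mid m_t)\, V_{t+1}(x_t, m') \right]

If the support of m_t is not a finite set fixed in advance, I don’t see how the corresponding family of cutting-plane models could be stored.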

You need a finite number of nodes in the graph, but within a node you could have a continuous sample space for the nodewise-independent random variable. There are two sources of uncertainty (the Markov chain driving transitions between the nodes, and the random variable within a node), and they are treated separately in the algorithm.
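
As a concrete illustration, here is a toy hydro-thermal sketch (made-up numbers, not anything from this thread) showing how the two sources of uncertainty are specified separately in SDDP.jl: the Markov chain enters through the transition_matrices argument, and the nodewise-independent random variable enters through SDDP.parameterize. The support of the latter is finite here because that is what the package currently assumes.

```julia
using SDDP, HiGHS

# A toy three-stage hydro-thermal model; the numbers are made up.
model = SDDP.MarkovianPolicyGraph(;
    # Source of uncertainty 1: a finite Markov chain driving the transitions
    # between the nodes of the policy graph.
    transition_matrices = Array{Float64,2}[
        [1.0]',                  # stage 1 has a single Markov state
        [0.75 0.25],             # transition from stage 1 to stage 2
        [0.75 0.25; 0.25 0.75],  # transition from stage 2 to stage 3
    ],
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, node
    t, markov_state = node
    @variable(sp, 0 <= volume <= 200, SDDP.State, initial_value = 100)
    @variable(sp, 0 <= hydro <= 100)
    @variable(sp, 0 <= thermal <= 100)
    @variable(sp, inflow)
    @constraint(sp, volume.out == volume.in - hydro + inflow)
    @constraint(sp, hydro + thermal == 100)
    # Source of uncertainty 2: the nodewise-independent random variable, with a
    # finite support that depends on the Markov state of the node.
    Ω = markov_state == 1 ? [0.0, 10.0, 20.0] : [20.0, 30.0, 40.0]
    SDDP.parameterize(sp, Ω) do ω
        JuMP.fix(inflow, ω)
    end
    @stageobjective(sp, 50.0 * thermal)
end
```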

In theory it’s also possible to dynamically add new nodes to the graph during the iterations, and you could construct an out-of-sample policy for an “off-chain” node on a forward pass by interpolating between some of the finite value functions. This is effectively what we do with the objective state and partially observable algorithms.

I don’t find it useful to talk about scenario trees in the context of SDDP.

But I saw you wrote

Assumption 2: finite random variables
The sample space is finite and discrete for each node


I think your “policy graph” modeling approach makes some sense. It’s clever, but it appears niche: I have only seen this approach in your SDDP.jl. The mainstream jargon is still “scenario tree” (e.g., Shapiro’s book, doi.org/10.1137/1.9780898718751).


I’ve just spent some time reading some typical references including

  • (SDDIP) Zou, J., Ahmed, S. & Sun, X.A. Stochastic dual dynamic integer programming. Math. Program. 175 , 461–502 (2019)
  • (SDDP) Pereira, M.V., Pinto, L.M.: Multi-stage stochastic optimization applied to energy planning. Math. Program. 52 (1–3), 359–375 (1991)

and a few other application articles.

My summary is:

  • All the algorithms are based on a (finite) scenario tree that is built in advance of training.

If I were drafting a paper describing the inadequacy of SDDP-like methods for multistage stochastic programming, I would definitely say:

  • The scenario tree (discretization) model, as the bedrock of these algorithms, is a brutally inaccurate way to represent the real (possibly continuous) stochastic process.
  • The eventual training results do not directly constitute executable policies, since the real-world realizations of uncertainty are not necessarily identical to one of the discretized states.

That’s why I said the question I raised in this topic is a “main concern”—your SDDP.jl also entails this discretization prerequisite.

The scenario tree (discretization) model, as the bedrock of these algorithms, is a brutally inaccurate way to represent the real (possibly continuous) stochastic process.

It’s a model. There are good models and bad models. It might have a large or a small impact on the quality of the final policy. Look up “SAA” (sample average approximation). People have thought about this before.

The eventual training results do not directly constitute executable policies, since the real-world realizations of uncertainty are not necessarily identical to one of the discretized states.

This is wrong. SDDP.jl gives you exactly an executable policy that you can evaluate “off-chain” or “out-of-sample” for realizations different from those used during training. The entire reason I don’t like to use “scenario tree” in the context of SDDP is that SDDP is not solving a finite scenario tree. It is building a value function (with pros and cons) that can be used as a decision rule for new realizations that were never seen before.

If the uncertainty appears only in the right-hand side, then the value function is a valid lower relaxation of the true function you would have built if you had trained on the full distribution. If the uncertainty appears in the constraint coefficients or in the objective, the value function is an approximation with no quality bounds.
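
For example, after training the toy model sketched above, you can extract the first-stage decision rule and evaluate it at a realization that was never part of the training support. This is a sketch; it assumes the noise is fixed into the right-hand side via JuMP.fix and that volume is the state variable, as in the toy model.

```julia
SDDP.train(model; iteration_limit = 50)

# The decision rule of the root node (stage 1, Markov state 1).
rule = SDDP.DecisionRule(model; node = (1, 1))

solution = SDDP.evaluate(
    rule;
    incoming_state = Dict(:volume => 100.0),
    noise = 13.7,  # a realization that was never in the training support
    controls_to_record = [:hydro, :thermal],
)
# `solution` holds the stage objective, the outgoing state, and the recorded
# controls for this particular realization.
```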


I’ve realized my assertion was wrong. But there is another important point you didn’t mention; I’ll point it out at the end of this post.


I’ve spent some time reading your doc, Introductory theory · SDDP.jl, and I noticed these two algorithmic options; the 2nd one is more important.

1. Cut Generation—use the disaggregated style?

One thing I noticed is that you make use of aggregated cuts (this is standard, as in the Pereira & Pinto 1991 SDDP paper). But if I were to implement it today, I would use disaggregated cuts, since the scenario tree is fixed throughout the training
(
The aggregated cut looks like

  • \theta \ge p_1 \cdot \mathrm{cut}_1 + p_2 \cdot \mathrm{cut}_2

The disaggregated cut looks like

  • \theta \ge p_1 \theta_1 + p_2 \theta_2 (with p_1, p_2 fixed by the scenario tree); \quad \theta_1 \ge \mathrm{cut}_1; \quad \theta_2 \ge \mathrm{cut}_2
)

2. Rolling Horizon: should it be adopted when we derive our policy on the primal side?

Another important thing that I don’t find mentioned in your doc (so I speculate that you haven’t implemented it) is wrapping your SDDP algorithm within a rolling-horizon framework. (Am I right?) Personally, I think it is necessary. Here is the reason:

The MSSP scheme (say, 24 stages) solves a 24-hour economic dispatch problem with stagewise adaptability taken into account. Theoretically, if the MSSP were solved to optimality, I would have an optimal policy that guides my decisions over the next 24 hours, so I wouldn’t need to fold the horizon, re-build another MSSP (23 stages) beginning at t = 2, and re-solve this new problem. But in practice there are two concerns:

  • the original 24-stage SP was not solved to optimality, due to a time limit or other reasons (e.g., subproblems not solved to optimality);
  • I discretized the distribution of future uncertain trajectories at t=1 with a relatively coarse resolution.

Therefore the policy returned by the original 24-stage SP built at t=1 is suboptimal, and I still have a practical motivation to fold the horizon and re-solve a shorter-horizon MSSP starting at t=2 (and to proceed through the remaining stages by the same reasoning).

Moreover, the inherent spirit of MSSP is its adaptability at every stage as uncertainty is gradually disclosed. Building new MSSPs with fewer stages (e.g., 23 stages at t=2, 22 stages at t=3) can fully exploit the information updated at t=2, t=3, so the posterior distributions can be updated accordingly: we can build more precise uncertainty models for the future stages.

If you find my “rolling horizon for MSSP” plausible, then we have a bigger question:

  • is 2SSP or MSSP preferable, when we are tackling multi-period decision problems under gradually disclosed uncertainty?

First, the core aim of both 2SSP and MSSP is to derive 1st-stage decisions here-and-now. Conceptually, the MSSP model is correct: it corresponds to the real-life circumstance in which we gradually adjust our decision/policy at each stage. The 2SSP model, by contrast, is not exact, because it lacks nonanticipativity (NA) constraints, and so it naturally leads to suboptimal 1st-stage decisions. In this sense, 2SSP is inferior.
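
In symbols (my own schematic notation, constraints suppressed): the MSSP nests one minimization and one conditional expectation per stage, whereas the 2SSP approximation lets the recourse see the whole future trajectory at once, i.e., it drops nonanticipativity within stages 2, …, T:

  • MSSP: \min_{x_1} c_1^\top x_1 + \mathbb{E}_{\xi_2}\big[ \min_{x_2} c_2^\top x_2 + \mathbb{E}_{\xi_3 \mid \xi_2}\big[ \min_{x_3} c_3^\top x_3 + \cdots \big] \big]
  • 2SSP: \min_{x_1} c_1^\top x_1 + \mathbb{E}_{\xi_2, \ldots, \xi_T}\big[ \min_{x_2, \ldots, x_T} \sum_{t=2}^{T} c_t^\top x_t \big]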

But when it comes to practical implementation, MSSP entails (perhaps much) heavier simplifications in the modeling of uncertainty. (And there are other reasons…) So the 1st-stage decision returned by an MSSP solution scheme is not necessarily better.

Finally, 2SSP and MSSP both need to be wrapped within a rolling horizon framework so that at each stage, only here-and-now decisions are really deployed.

From the standpoint of complexity/tractability, 2SSP is superior.

(Yes, I’m attempting to do some research on this. Are you interested in these stochastic programming topics?)

On cut aggregation

SDDP.jl implements both the aggregated (single cut) and disaggregated (multi-cut) formulations. See Improve computational performance · SDDP.jl
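
For example (a minimal sketch, assuming model has already been built as usual):

```julia
# Aggregated (single) cuts, the default:
SDDP.train(model; iteration_limit = 100, cut_type = SDDP.SINGLE_CUT)
# Disaggregated (multi) cuts:
SDDP.train(model; iteration_limit = 100, cut_type = SDDP.MULTI_CUT)
```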

They are two ends of a spectrum. In theory we could have partial aggregations as well. There are papers on that.

The value function in SDDP.jl is also slightly more complicated than the classical formulation. We mix single- and multi-cuts, as well as concave objective and belief states, and we also support convex risk measures for both the single- and multi-cut formulations. I don’t know if I’ve ever written out the full formulation in a paper.

On rolling horizon

This is a long topic.

You could easily wrap SDDP.jl in a larger rolling horizon algorithm. You could either build and train a new model from scratch each time-step, or you could re-train an existing model, starting from the new root state value. We don’t have great support for the latter, but it wouldn’t take much to add. (You can currently hack it by defining a new forward sampling scheme and modifying model.initial_root_state.)

It’s also worth pointing out that you appear to be describing a receding horizon model, where the time horizon shrinks by 1 each time-step. That’s a better fit for re-training a finite-horizon SDDP.jl model. A rolling horizon model (where we maintain a time horizon of the same length) can be re-trained using a cyclic policy graph.
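
Here’s a rough sketch of the first option (re-building and re-training from scratch each time-step). build_model and observe_real_state are hypothetical, problem-specific functions you would supply; the sketch also assumes the first stage is deterministic, so a single simulation recovers the here-and-now decision.

```julia
using SDDP

# Receding-horizon wrapper: at each step t0, build a fresh (T - t0 + 1)-stage
# model rooted at the latest observed state, train it, and deploy only the
# here-and-now decision. `u` is assumed to be a first-stage control variable.
function receding_horizon(T, x0; iteration_limit = 100)
    x, decisions = x0, Any[]
    for t0 in 1:T
        model = build_model(t0, T, x)  # hypothetical: uses forecasts updated at t0
        SDDP.train(model; iteration_limit = iteration_limit)
        sims = SDDP.simulate(model, 1, [:u])
        u = sims[1][1][:u]             # the here-and-now decision of stage 1
        push!(decisions, u)
        x = observe_real_state(x, u)   # hypothetical: apply u, observe the real system
    end
    return decisions
end
```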

First, the core aim of both 2SSP and MSSP is to derive 1st-stage decisions here-and-now.

I don’t agree with this. See [Research idea] Stopping rules · Issue #178 · odow/SDDP.jl · GitHub

  • is 2SSP or MSSP preferable, when we are tackling multi-period decision problems under gradually disclosed uncertainty?

Assuming you want to wrap these in a rolling horizon framework, and that you are going to take only the first-stage decisions, and that you are then going to simulate the policies out-of-sample… it depends on whether the two-stage approximation leads to a good first-stage primal decision. It’s easy to come up with a model where the two-stage approach does well and another where it does badly.

I suggest you look up some talks by Warren Powell on YouTube. All these ideas are heuristics. The only way to evaluate their effectiveness is to build the policy and then test it in simulation. Rolling-horizon/receding-horizon, two-stage/multi-stage, risk-neutral/risk-averse, etc. They’re all valid approaches. It depends on the model. There are many other ways you can solve stochastic optimal control problems. I’d rather have a simple policy with a high quality simulation of the real problem than a complicated policy (rolling horizon SDDP) with a convexified approximation of the simulation model.

I read that GitHub issue, but I didn’t come away with a conclusion. Can you give a brief reason?

Multistage adaptive decision making is just too hard to solve to exact optimality, particularly with general continuous stochastic processes. So all algorithms involve approximation, and thus they only aim to find suboptimal policies.

A receding horizon SDDP, despite seeming overly complicated at first, is a natural consequence of the primary goal of making the best decision at each time stage, considering the latest updated information. If you just lazily build a single scenario tree (or your policy graph) at t=1 and don’t improve your model of the uncertainty with the information revealed at t=2, you are breaching the inherent spirit of MSSP.

I read that GitHub issue, but I didn’t come away with a conclusion. Can you give a brief reason?

See the first part of the issue:

  1. to obtain the (risk-adjusted) expected cost of the policy, e.g., in order to compare the operational cost of different investment decisions;
  2. to obtain a policy in the primal space, e.g., a decision rule at each node for when to release water through a hydro-electric turbine; and
  3. to obtain a policy in the dual space, e.g., marginal water values in the hydro-thermal scheduling problem.

I don’t always care about the first-stage primal decision.

Without an upper bound, the question of when to terminate SDDP-type algorithms is an open question in the literature.

I would stop the SDDP algorithm either due to a time limit or once I find that no more violated cuts can be added. I believe the classic SDDP algorithm only samples trajectories based on the finite scenario tree throughout the training. (It’s probably not correct to sample non-tree-based trajectories during training.)


I understand what you mean by that comment.

I think my original assertion can be revised to:

  • The core aim of both the 2SSP and MSSP schemes, at t=1 (when \xi_1 is realized), is to derive 1st-stage decisions here-and-now.

But in practice only those 1st-stage decisions need to be deployed immediately. The 2nd-stage decision depends on a future realization of uncertainty that we have to wait and see; it will instead come from the “here-and-now” decision of the next SDDP model (a 23-stage SDDP built at t=2). So in effect we only ever need 1st-stage decisions.

In the receding horizon framework, this means

  • even to simulate a single trajectory and calculate its realized cost, we need to build and solve multiple different SDDP models.

Closing this topic.

ChatGPT’s answer (i.e., “No”) in the #1 post is wrong, while my intuition there was correct.
I read this paper today, which expounds these things well.

Some other important things I learned:

  • Multistage Stochastic Program ≠ Multistage Stochastic Programming. The latter is a scheme (e.g., typically we use a receding-horizon method at the top level, as is also done in MPC), whereas the former is merely a multistage problem formulation that is solved, e.g., by an SDDP algorithm (as in the original 1991 paper).
  • The 1st-stage decision is of paramount importance in stochastic programming, because only the 1st-stage decision has to be made here-and-now. All other decisions (rules) are tentative and essentially serve to help us make the 1st-stage decision.