How does SDDP provide a decision for an out-of-sample state?

Hello!

I’m new to SDDP and Julia, and I’m wondering how policy evaluation works (the math behind it) for states that lie outside the scenario lattice used to train the algorithm.

For context: I’m solving an asset-management problem (the control variable is just the size of the long or short position in the asset, which is bounded), and I’m optimizing a linear combination of the expectation of the position cost at the final stage and its CVaR.
I’m training the algorithm on a scenario lattice I constructed, which represents the cost of the asset on each day with corresponding probabilities.
Then I want to evaluate the policy on historical data, i.e., the real price path of the asset, whose prices can be out of sample (between nodes of the scenario lattice).

The note in the documentation (Theory intro: Implementation: evaluating the policy) states that it can be out-of-sample:
"The random variable can be out-of-sample, i.e., it doesn’t have to be in the vector Ω we created when defining the model! This is a notable difference to other multistage stochastic solution methods like progressive hedging or using the deterministic equivalent."

And it really does work when I implement it, but the documentation doesn’t explain how. Does it simply take the nearest node from the lattice, or can it evaluate a different policy for any continuous state? I’d like to understand the math behind it.

Could you please let me know where I can read more about this?

Hi @petr-a, welcome to the forum :smile:

Are you talking about an example like Example: the milk producer · SDDP.jl?

If so, yes, it takes the nearest node in the lattice. This is one reason why it is limited to univariate random variables for now. In theory we could extend it to more dimensions; it’s just more complicated to implement.
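To make the idea concrete, here is a minimal sketch of the nearest-neighbor lookup in plain Julia. This is illustrative only, not SDDP.jl’s actual internals, and the names (`node_prices`, `observed_price`) are made up for the example:

```julia
# Hypothetical lattice prices at the current stage, and an out-of-sample
# historical price observed when evaluating the policy.
node_prices = [95.0, 100.0, 105.0, 110.0]
observed_price = 103.2

# Pick the node whose price is closest to the observation; the policy is
# then evaluated using that node's value function.
nearest = argmin(abs.(node_prices .- observed_price))
println("Evaluate the policy at node $nearest (price $(node_prices[nearest]))")
```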

More generally:

  • We can trivially use any out-of-sample realization for the random variable within a node
  • Out-of-sample realizations for “off-chain” nodes are much harder
  • The simplest solution is to use the nearest neighbor
  • If your Markov state (e.g., price) is really continuous, then your value function will be a piecewise step function in the price dimension and piecewise linear convex in the state dimensions
  • You could imagine doing cleverer things than nearest neighbor. For example, you might want to linearly interpolate between adjacent nodes. That’s exactly what Objective states · SDDP.jl does (see the sketch after this list), but it’s much more computationally challenging to train.
  • You could potentially imagine training using the nearest neighbor approach, and then post-processing the value functions to the linear interpolation format for simulation and evaluation, but SDDP.jl doesn’t support that yet.
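Here is a hypothetical sketch of what linear interpolation between adjacent lattice nodes would look like, as opposed to snapping to the nearest one. Again, this is not SDDP.jl code; `V` stands in for the cost-to-go evaluated at each node, and all values are made up:

```julia
# Hypothetical lattice prices and cost-to-go values at the current stage.
node_prices = [95.0, 100.0, 105.0, 110.0]
V = [12.0, 10.5, 9.8, 9.1]
observed_price = 103.2

# Find the bracketing nodes and the interpolation weight λ ∈ [0, 1].
hi = searchsortedfirst(node_prices, observed_price)
lo = max(hi - 1, 1)
hi = min(hi, length(node_prices))
λ = hi == lo ? 1.0 : (observed_price - node_prices[lo]) / (node_prices[hi] - node_prices[lo])

# Linearly interpolated approximation of the cost-to-go at the observed price.
V_interp = (1 - λ) * V[lo] + λ * V[hi]
println("Interpolated cost-to-go ≈ $V_interp")
```

With nearest neighbor, the value function is a step function in the price dimension; interpolating like this makes it piecewise linear in price, which is exactly why it is harder to train.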

Hi @odow! Thank you for your response!

Yes, my task is similar to the milk producer example.
Thank you for the detailed explanation!
I’ll look into Objective states, but I think nearest neighbor is fine for me, so I’ll stick with it for now.
