I am working on an infinite-horizon, multi-stage, mixed-integer linear stochastic optimization problem for a global supply chain, and I am using SDDP.jl to solve it.
After a significant amount of training, the lower bound has stopped improving: it stays constant throughout prolonged training sessions, while the simulated objective value keeps oscillating without improving even marginally (as shown in the figures below). I have tried a number of parameter variations (essentially treating the algorithm's dynamics as a black box), but to no avail. Every setting I tried (different values of the lower bound, the maximum depth, whether or not to terminate on cycles, etc.) exhibits the same behaviour.
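To make "the lower bound stayed constant" concrete, here is a minimal Python sketch of one possible stall criterion. It assumes the bound is recorded once per training iteration and flags a stall when the bound moves by no more than a tolerance between consecutive iterations over a window of recent iterations. The idea is similar to SDDP.jl's `BoundStalling` stopping rule, but this is not that rule's implementation. The function name `bound_has_stalled` and the sample log are hypothetical and for illustration only.

```python
def bound_has_stalled(bounds, window, tolerance):
    """Return True if the bound changed by at most `tolerance` between
    consecutive iterations over the last `window` iterations.

    Hypothetical helper: `bounds` is assumed to hold one bound value
    per training iteration, oldest first.
    """
    if len(bounds) < window + 1:
        return False  # not enough history to judge yet
    recent = bounds[-(window + 1):]
    # Compare each iteration's bound with the one before it.
    return all(abs(b - a) <= tolerance for a, b in zip(recent, recent[1:]))


# Hypothetical bound log: improves early, then flattens out.
log = [100.0, 140.0, 160.0, 165.0, 165.0, 165.0, 165.0]
print(bound_has_stalled(log, window=3, tolerance=1e-6))      # True
print(bound_has_stalled(log[:4], window=3, tolerance=1e-6))  # False
```

Checking consecutive differences rather than only the first and last values avoids declaring a stall when the bound oscillates and happens to return to the same value.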
What could be causing this, and can it be fixed? Which parameters would you suggest I vary in the hope of improving the algorithm's behaviour?
[Figure: at the beginning of the training]
[Figure: at the end of the training]