I am dealing with a multi-stage, mixed-integer, stochastic global supply chain optimization problem and want to solve it with the SDDiP algorithm via the SDDP.jl package in Julia.
In this regard, I tried various mixed-integer programming solvers and noticed that the SDDiP algorithm converges slowly with some of them (when solving the stagewise subproblems) while converging faster (in fewer iterations) with others.
Is it the accumulation of numerical errors from the solver over time, or are there other factors behind this observed phenomenon?
A number of factors can explain the difference, including:

- the random seed, and the sequence of random variable realizations sampled on the forward pass
- the tolerance to which you are solving the MIP subproblems
- the presence of primal or dual degeneracy
- the duality handler you are using. If you use `LagrangianDuality`, there are tolerances and checks that depend on the quality of the MIP solution returned by the solver (see the sketch after this list).
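To make these concrete, here is a minimal sketch of where each factor enters in SDDP.jl. The toy three-stage model, the `mip_rel_gap` attribute name (which is HiGHS-specific), and the iteration limit are all placeholders, not a recommendation:

```julia
using SDDP, HiGHS, Random

# Toy three-stage integer problem; the structure is illustrative only.
model = SDDP.LinearPolicyGraph(;
    stages = 3,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x >= 0, Int, SDDP.State, initial_value = 0)
    @variable(sp, u >= 0, Int)
    @constraint(sp, x.out == x.in + u)
    # Factor: the tolerance to which the stage MIPs are solved
    # ("mip_rel_gap" is the HiGHS option name; other solvers differ).
    set_attribute(sp, "mip_rel_gap", 1e-6)
    # Factor: the realizations sampled on the forward pass come from here.
    SDDP.parameterize(sp, [1.0, 2.0, 3.0]) do ω
        @stageobjective(sp, ω * u + x.out)
    end
end

# Factor: the random seed controls which forward-pass scenarios are sampled.
Random.seed!(1234)

# Factor: the duality handler used to compute cuts at integer subproblems.
SDDP.train(
    model;
    duality_handler = SDDP.LagrangianDuality(),
    iteration_limit = 20,
)
```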
I’ll add a note of caution: if you don’t fully understand how the algorithm is implemented in SDDP.jl and how the different solvers handle tolerances, it’s going to be quite difficult to make any sort of fair benchmark between solvers. This is particularly the case if you have only a small number of problem instances to compare.
Perhaps it’s useful to step back a bit: why are you trying to compare different MIP solvers in the context of SDDP.jl? What is the ultimate goal?
I was reading the documentation of the solvers and of the SDDP.jl algorithm. I realized that customizing solver parameters is imperative for a fair comparison, but doing so requires deep knowledge of both the problem at hand and the specific solver, which might not be viable.
In this regard, may I ask: can I do hyperparameter optimization on the tolerance parameters of each solver? If yes, what ranges should I use for the hyperparameter optimization algorithm?
I understand that hyperparameter optimization greatly increases the computational resource requirements, which is why I ask: can I use deterministic problem instances (with no uncertain parameters) for the comparison? That way, not only can the comparison be completed with limited computational resources, but the influence of SDDP.jl's sampling procedure is also eliminated while investigating the performance of solvers in a decomposition setting.
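For concreteness, by a deterministic instance I mean something like the following sketch in SDDP.jl terms, where each stage has a single fixed realization, so there is nothing for the forward pass to sample (the model structure itself is a placeholder, not my formulation):

```julia
using SDDP, HiGHS

# Placeholder structure: a deterministic instance has no SDDP.parameterize
# call, so every stage uses fixed data and the forward pass is deterministic.
model = SDDP.LinearPolicyGraph(;
    stages = 3,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x >= 0, Int, SDDP.State, initial_value = 0)
    @variable(sp, u >= 0, Int)
    @constraint(sp, x.out == x.in + u)
    @stageobjective(sp, 2.0 * u + x.out)
end
```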
I do not have any specific goal. I want to investigate solver performance (with respect to computational time, objective function value, number of iterations, etc.) in achieving overall convergence in a decomposition setting, using the formulation at hand as a test bed.
In this regard, a variety of problem instances are available. Some instances have a larger number of binary integer variables, while others do not have any. Some contain general integer variables while others do not.
You probably just want to measure everything with the default settings and acknowledge the various factors you did not control for.
To overcome the random sampling issue, you could solve each problem multiple times and look at the various sample statistics (minimum/mean/maximum) when comparing.
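A minimal sketch of what that could look like, assuming `build_model(optimizer)` is your own function that constructs the policy graph for one instance; the iteration limit, the number of replications, and the statistics collected are placeholders:

```julia
using SDDP, Random, Statistics

# Hypothetical helper: solve one instance several times with different
# seeds and summarise the results with minimum/mean/maximum statistics.
function benchmark(build_model, optimizer; replications = 10)
    times, bounds = Float64[], Float64[]
    for seed in 1:replications
        Random.seed!(seed)  # vary the forward-pass sample paths
        model = build_model(optimizer)
        t = @elapsed SDDP.train(model; iteration_limit = 100, print_level = 0)
        push!(times, t)
        push!(bounds, SDDP.calculate_bound(model))
    end
    summarise(x) = (minimum(x), mean(x), maximum(x))
    return (time = summarise(times), bound = summarise(bounds))
end
```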
Got it, so now I will solve each instance multiple times, with the uncertain parameter values set to their minimum, mean, and maximum separately.
May I confirm specifically: can I use hyperparameter optimization (instead of the default settings), and if yes, can I apply it to just the optimality gap and the optimality, feasibility, and integrality tolerance parameters, using the ranges provided in the GAMS documentation? Can I narrow those ranges to some extent?
There’s nothing in JuMP or SDDP.jl to help with hyperparameter optimization.
You’ll need to implement that manually. The parameters for the various tolerances and limits are solver dependent, and so too are their default values. You’ll need to consult the documentation of each solver.
I don’t recommend that you try hyper-parameter tuning the tolerances. They are not something that you should tweak to improve performance. They really dictate the quality of the solution that will be returned.
There are other options you could experiment with, like the various duality handlers (see the Integrality tutorial in the SDDP.jl documentation). There are also some solver-specific options it might make sense to experiment with, like choosing the algorithm that the solver uses.
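As a rough sketch of that kind of experiment (assuming a hypothetical `build_model(optimizer)` that constructs one instance; the handler list and the iteration limit are placeholders):

```julia
using SDDP, HiGHS

# Hypothetical: `build_model(optimizer)` constructs one problem instance.
handlers = [
    SDDP.ContinuousConicDuality(),
    SDDP.StrengthenedConicDuality(),
    SDDP.LagrangianDuality(),
]
for handler in handlers
    model = build_model(HiGHS.Optimizer)
    SDDP.train(model; duality_handler = handler, iteration_limit = 100)
    println(typeof(handler), " => bound = ", SDDP.calculate_bound(model))
end
```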
Instead of trying to tweak things to solve a single instance a bit faster, I would personally find it more useful to spend time focusing on the modeling of a problem. For example, what trade-offs did you make when approximating the real problem to fit into a linear program? How did you represent the uncertainty? Are more or fewer samples needed? Is there temporal dependence in the stochastic model? How do you measure out-of-sample performance of the policy? Is your decision maker risk averse? If so, by how much and what metric do they care about?
The scope for experimentation is large. Sure, it includes whether CPLEX is faster with a feasibility tolerance of 1e-8 or 2e-8, but there are bigger questions to ask.
Got it. It will be much less computationally demanding to do hyperparameter optimization on the discrete hyperparameters (the duality handlers and the choice of algorithm that the solvers use), while keeping all the tolerance-related parameters below a certain threshold, say 1e-4. For tuning these parameters, I am inclined to use deterministic instances to reduce the computational burden without substantially decreasing the reliability of the results.
I should also analyze the effect of the linear approximations and of the uncertainty representation in the linear program. A cost-benefit analysis of including the uncertainty can be done.
For the analysis of risk-averse decision-making behavior, I have decided to use distributionally robust optimization.
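If I understand the documentation correctly, this can be expressed through the risk measure passed to SDDP.train; a minimal sketch, assuming the `ModifiedChiSquared` distributionally robust risk measure with a placeholder radius:

```julia
using SDDP

# `model` is the policy graph built elsewhere; 0.25 is a placeholder radius.
SDDP.train(
    model;
    risk_measure = SDDP.ModifiedChiSquared(0.25),
    iteration_limit = 100,
)
```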
Based on the nature of the formulation, I think the temporal dependence is obvious, but can I analyze its effect by simply eliminating all the transition functions and using an initial value for all the incoming state variables at each stage?