Starting to learn about optimization

I’m considering tackling a problem that appears to have a fixed handful of cutoffs and a performance metric, so I started reading a bit about optimization with essentially zero initial knowledge (not sure ordinary least squares counts for much). I’m reading about many tools, but I keep running into what I think to be important practical considerations that aren’t answered in one place. I’m not writing anything yet, but it could be helpful if I don’t have these questions hanging over my head:

  • These if-else cutoffs are not differentiable, to my understanding. They are not obvious step functions either, rather decisions in a loop. Does this mean smoothing approximations are not possible and I need to focus on derivative-free optimization?
  • I’m seeing vague hints of downsides to derivative-free optimization like poor scaling, slow convergence, and more likely non-global optima. What are some practical things I should know?
  • What do I do when out-of-sample testing suggests the model is bad? I’d just get the same result if I try again without changing anything.

If this doesn’t concern Julia packages much, I’ll move it to Off-Topic.

tackling a problem that appears to have a fixed handful of cutoffs and a performance metric,

What is the specific problem? I assume the “fixed handful of cutoffs” are constraints? Performance metric is probably the objective function. If you’re solving optimizations with constraints, then probably you need to use JuMP-like workflow (you need an underlying solver).

These if-else cutoffs are not differentiable

Maybe you can present the specific problem. To me it sounds like integer programming, e.g. in a unit commitment problem if the generator is ON, then it’s output power is e.g. [2MW, 10MW], if it’s OFF, then it’s output power is 0MW.

rather decisions in a loop

I didn’t get this either. What does loop here mean? Do you mean reinforcement learning (aka approximate dynamic programming)?

Does this mean smoothing approximations are not possible and I need to focus on derivative-free optimization?

No. There’re two layers, the outer layer is branch-and-bound, the inner layer is continuous optimization where you apply effective optimization methods. It’s not quite accurate to partition the methods into derivative-free methods or others. For example, the well-studied simplex algorithm for linear programming—it is not a typical derivative-driven algorithm, but it’s very effective.

downsides to derivative-free optimization like poor scaling, slow convergence, and more likely non-global optima.

You need to distinguish optimization problems and algorithms. The primary question is: is my optimization convex? If yes, then it is easy, otherwise pursuing global optimum is hard. If it’s a nonconvex differentiable nonlinear programming, then derivative-based methods are usable, which is based on descent direction and stepsize selection (But they may converge to local optimum, so they essentially don’t have any guarantee about the quality of the solution. Or, say, different local optimization algorithms are only relatively comparable). So the core issue is not about “using derivatives or not”.

What do I do when out-of-sample testing suggests the model is bad?

Are you doing stochastic optimization? In stochastic optimization, there are two models: the model of the uncertainty, and the model of physic system. For example, if you are solving a stochastic unit commitment, then you need a model for wind generation, maybe an ARIMA model for instance. This part is irrelevant to optimization techniques.