Nonlinear optimization using MathOptInterface - examples/learning

(sorry for the long post)
Hi all,

For the past 6 months or so, on and off, I’ve been trying to estimate a finite horizon dynamic discrete choice model. If you are not familiar with it, suffices to say that for every step of the optimization problem, I need to solve a big, complicated and highly nonlinear model which involves Monte Carlo integration, to evaluate my likelihood function.

Now, I’ve been using the Optim package unsuccessfully. I’ve played around with LBGFS, Newton, automatic and finite differentiation, tweaking LineSearches and whatnot, and the estimation never converges. Of course, it could be a bug in my likelihood function, but I am 100% certain that the part that solves the model is correct, and there are 2 codes of the same estimation procedure written in Fortran and in Python that I’ve cross-checked line by line and the likelihood computation seems to be correct.

A colleague of mine said that maybe I should attempt using a non linear solver like Ipopt, and this type of problem seems to be infeasible to be put into JuMP, as it involves thousands of observations and around 30 parameters.

So, I’ve been looking into MathOptInterface, and the documentation is very dry. I am a newbie in programming in general, so the language in the documentation is quite abstract and difficult for me. I can implement the knapsack example, but when looking at the test files for nonlinear optimization example, I have no idea what is going on and can’t even run the code.

Is there any example of a nonlinear optimization example which uses a user-defined objective function and automatic differentiation instead of explicit gradients, and that can be run self-contained?

If not, what would be the best way to really understand what is going on with MathOptInterface? The documentation does not explain how to use user-defined functions, for example. How can I learn about this?

1 Like

Not a direct answer to your question but have you looked at NLOpt?

1 Like

You may want to consider and but you’ll likely have to code your own sparse Jacobians if your problem is very large. There’s a tutorial here: Tutorial · NLPModelsIpopt.jl. Both packages are used the same way.

1 Like

In general, these problems are tricky to make smooth and differentiable, which could pose a problem for the solver. AD usually fails, or even when it works it may not be very useful. Try a more robust, derivative-free optimizer, like Bayesian optimization, or particle swarm methods.

I assume you thought about this to, but ML is usually an ill-posed problem in structural estimation because of multiple sharp local peaks that show up in even small models.

What do you mean by this — the optimizer fails to converge?

We are working on something similar, and will make the repo public once the paper has been submitted (2–3 months). In the meantime, we fixed our estimation problem by giving up on AD, and making the problem smooth by all kinds of noise. Getting a small toy case (2 periods, one sector, etc) to work and eyeballing that helped a lot.

1 Like

I did not understand the structure of your problem, but I don’t think using MathOptInterface (MOI) directly over JuMP will improve the performance significantly. There should not be much of an overhead, and JuMP actually translates your problem to MOI anyway. It will also help with user-defined functions and AD, whereas in MOI you would have to (?) implement an NLPEvaluator that provides gradients (and hessians).

This story might be different if you have a model that needs to be resolved many times, reusing the same structure but changing some variable bounds etc. In that case, you might be able to use ParameterJuMP.jl or Parametron.jl (for quadratic programs?).

1 Like

I will take a look into them, it seems to be very flexible, thanks.

It never converges in the sense that when using Nelder-Mead or (L)BFGS, the estimator seems to be stuck at a local minimum, the gradient norm gets stuck around 1e-01 for as many as 5k steps. When using Newton, it either converges very quickly, but because the steps are not changing and not due to the Hessian being small enough, or it gets stuck like (L)BFGS.

ML has its problems, but this is a ‘simple’ model that has been successfully estimated in Fortran and Python in the past. I don’t know if this is because the nonlinear optimization suites in Julia are still not fully developed as those languages, but for this toy problem I am implementing, there shouldn’t be any major problems I think.

Thank you all for the suggestions. I’ll try my best to look into them, but I feel like going to Fortran is the way to go.

Yeah, it is meant largely as an interface to implement to connect DSLs like JuMP to backend solvers. I don’t think it is intended to be used direclty.

I suspect that AD won’t help in this case. It sounds like the function inside of your likeihood is just too complicated to differentiate with current tools (e.g. if you are running an optimization step at each point backward)

One thing that sometimes works are the stochastic ones, as other people said. Try GitHub - robertfeldt/BlackBoxOptim.jl: Black-box optimization for Julia

But the other thing to consider is using a commercial solver to see if it helps. From my experience, there are all sorts of heuristics and tuned parameters that can make commerical solvers dominate open-source ones. Maybe try GitHub - jump-dev/KNITRO.jl: Julia interface to the Artelys Knitro solver and and take a look at Example - KNITRO.jl to call it directly (i.e. not through JuMP)

I seem to remember that knitro is happy to give you evaluation licenses.

1 Like

MOI deals with a lot of additional complexity that likely isn’t relevant to what @cosmia is trying to accomplish, but it can definitely be used directly. The knapsack example is a good start. To do NLP on top of that, you need to construct and set an NLPBlockData, e.g., here. NLP-over-MOI would benefit from the analogue of a linprog-like interface where you can provide all the problem data and callbacks in one go like in NLPModelsIpopt.jl. I’ll leave that up for grabs.

Note that to get Ipopt to really shine, you’ll want to provide a Hessian matrix in addition to gradients.

You could call the Fortran optimizer from Julia with the function implemented in the latter, and check if that improves things. I don’t know which library you are using in Fortran, but perhaps there is already a Julia wrapper.

It could very well happen that the Julia optimizer libraries you are using are not as mature as in other languages. In this case, the authors usually welcome the problem as a test case, just open an issue.