Which optimization package should I use?

Hello everyone,

I have an optimization (minimization) problem* in Julia to solve and I’m a looking for a package with which, most importantly, I can do it fast.

Specifically:

  • the problem is in two variables (but it might reach even about 10 in the future, as I will develop it),
  • the problem for these variables is on an a priori defined domain,
  • the problem has no constraints,
  • the objective function is (most likely) bounded,
  • the objective function is nonlinear,
  • the objective function has a subdomain (not precisely defined a priori) for which it is (only a posteriori) not defined. I was thinking of evading this problem in the minimization by setting on this subdomain an objective function to be very high compared to the values on the rest of the domain, e.g. =1000. For this reason, the function would not only be not differentiable, but also not continuous. If that helps, a priori I only know that in a rectangle domain of the two variables, the objective function is not defined on a bounded subdomain; if the rectangle was roughly split into two right-triangle-like subdomains, on the one of them the objective function is not defined; for certain in the one corner of the rectangle it is not, but the exact place of the hypotenuse (more generally: the curve rather than a straight line) is not known a priori. Also, the maximum of the objective function is most likely very close, just next to the line (curve) that is splitting the domain into these two subdomains (this “hypotenuse”), i.e. something similar to the tangent function close to \pi/2, but bounded. I’m however not sure if such a structure of the domain would also be the case if I extend the number of variables from two to, say, 5 or 10. Please, forgive my poor mathemetical description, I would do better if I knew how.

Now, I realize there are many packages and solution methods within them to solve the problem, and ideally I should experiment with all of them, but because just one evaluation of the objective function for a candidate pair of two variables takes about a minute, it would take ages to conduct such a meaningful experiment, in a situation in which I’m pretty time constrained. Moreover, I am not sure if the experiments performed with two variables, would be suggestive for the case when I would have to use more than two of them, say 5 or 10. For this reason, I’m looking for recommendations of a package as well as the method within it to be able to solve such a problem as fast as possible. I was thinking of something like, first, evaluating the objective function on (either a regular or a random) grid on the rectangle, and second, using Nelder-Mead with the initial guesses for the couple of such best gridpoints on the rectangle.

*Actually, the problem is the simulated method of moments estimation of a structural economic model. First, I try to estimate the two parameters, but later I will have to extend the set of estimated parameters.

Many thanks in advance! :slight_smile:

Edit 1:
Also,

  • My objective is a function of some statistics of a simulated economy, for which I don’t have any closed-form solutions.
  • I have a 6-physical-core laptop, so I can take advantage of parallelism.

Edit 2:
My question is very general and I don’t think it hinges on a particular code, but below is the general structure of the objective function that I would minimize in some range on θ₁ and θ₂.

function ObjectiveFunction(θ₁,θ₂)
    ...
    return (mₛ₁-mₕ₁)^2 + (mₛ₂-mₕ₂)^2 + (mₛ₃-mₕ₃)^2 # the number of terms here is at least two, also, they may come with some weights, ω
end

where the symbols come from the simulations that I perform for some parameters \Theta=(\theta_1,\theta_2), with the following structure:

  1. solve for the agents’ state variables’ policy functions,
  2. simulate the stochastic shocks for N agents over T periods,
  3. using 1. & 2., simulate the state variables for the N agents over T periods,
  4. compute some moments (statistics) based on these simulated state variables, then I minimize the quadratic form being the weighted combination of the differences of values of these simulated (s) moments with some empirical targets from the historical data (h) that I have, i.e. the objective function is of the form f(\Theta)=(m_{s,1}(\Theta)-m_{h,1})^2+(m_{s,2}(\Theta)-m_{h,2})^2.

When I say that I have no closed-form solution, I mean that I don’t have analytical/algebraic expression for m_{s,1}(\Theta) and m_{s,2}(\Theta), which I just take from the simulated values.

Have you seen the various related discussions here on Discourse. For example:

This discussion also contains a reference to a paper by Guvenen et al. that benchmarks optimizers for a number of problems, including economic models.

Also of interest: the comment here

that argues against Nelder Mead for the local.

3 Likes

If you can write your problem algebraically, you could (should) give IntervalOptimisation.jl a shot. You would get the global optimum robust to roundoff errors.

6 Likes

@hendri54 Before posting I had searched for similar topics and I’ve seen users asking questions on what best-suited package to use for their specific problem at hand. I just mimicked their behavior. Unfortunately, I hadn’t found threads that you posted, so many thanks for that, as I’ve learnt about some packages and methods. Yet, it still leaves me undecided about in which single package & method I should invest my limited time, perhaps as I’ve learnt about new possibilities, I am even less confident about any single solution now.

@cvanaret I don’t think so. My objective is a function of some statistics of a simulated economy, for which I don’t have any closed-form solutions. I edited the post to make it clear.

Got it.
Can you formalize your problem exactly? A “simulation” is vague, maybe we can formulate this as a set of constraints.

A (quite) technical remark regarding the fact that the function is not defined everywhere: classical gradient-based optimization solvers compute a sequence of iterates x^{(0)}, x^{(1)}, \ldots that converges towards a minimum. They usually come with a “globalization strategy” (line search or trust region) that restricts the length \alpha along a direction d^{(k)}. You could iteratively decrease this step length \alpha as long as the trial point x^{(k)} + \alpha d^{(k)} is in the undefined domain. Not sure which package would allow you to do that, though.

2 Likes

Hi @Piotr, I see it’s your first post, so welcome.

Try Optim.jl or NLopt.jl.

People could give more specific advice is you provide a minimal working example of your function. This post has some tips for writing a good post: PSA: make it easier to help you.

3 Likes

The simulation I perform means that for some parameters \Theta=(\theta_1,\theta_2), I:

  1. solve for the agents’ state variables’ policy functions,
  2. simulate the stochastic shocks for N agents over T periods,
  3. using 1. & 2., simulate the state variables for the N agents over T periods,
  4. compute some moments (statistics) based on these simulated state variables, then I minimize the quadratic form being the weighted combination of the differences of values of these simulated (s) moments with some empirical targets from the historical data (h) that I have, i.e. the objective function is of the form f(\Theta)=(m_{s,1}(\Theta)-m_{h,1})^2+(m_{s,2}(\Theta)-m_{h,2})^2.

When I say that I have no closed-form solution, I mean that I don’t have analytical/algebraic expression for m_{s,1}(\Theta) and m_{s,2}(\Theta), which I just take from the simulated values.

Hi @odow, nice to virtually meet you!

Before posting, I learnt about these two packages you suggested (and some others as well, like BlackBoxOptim.jl), so I considered using them. But also, following @hendri54 suggestion, I see that there are other packages Evolutionary.jl or MultistartOptimization.jl generally suited to the similar problems as mine. Hence, I would like to know which exactly I should best use, based on other users’ experience.

I didn’t include any minimal working example in my original post, as my question is very general and I don’t think it hinges on a particular code, but below is the general structure of the objective function that I would minimize in some range on θ₁ and θ₂.

function ObjectiveFunction(θ₁,θ₂)
    ...
    return (mₛ₁-mₕ₁)^2 + (mₛ₂-mₕ₂)^2 + (mₛ₃-mₕ₃)^2 # the number of terms here is at least two, also, they may come with some weights, ω
end

where the symbols come from the simulations that I perform, as I summarized them to few minutes ago to @cvaranet .

The simulation I perform means that for some parameters \Theta=(\theta_1,\theta_2), I:

  1. solve for the agents’ state variables’ policy functions,
  2. simulate the stochastic shocks for N agents over T periods,
  3. using 1. & 2., simulate the state variables for the N agents over T periods,
  4. compute some moments (statistics) based on these simulated state variables, then I minimize the quadratic form being the weighted combination of the differences of values of these simulated (s) moments with some empirical targets from the historical data (h) that I have, i.e. the objective function is of the form f(\Theta)=(m_{s,1}(\Theta)-m_{h,1})^2+(m_{s,2}(\Theta)-m_{h,2})^2.

When I say that I have no closed-form solution, I mean that I don’t have analytical/algebraic expression for m_{s,1}(\Theta) and m_{s,2}(\Theta), which I just take from the simulated values.

If I’m not mistaken your problem is exactly the type of problem that the Guvenen paper addresses (as his own papers are often heterogeneous agent models solved by MSM)?

1 Like

I think it is incorrect to say that you have an unconstrained problem. You basically always need to have constraints in SMM in these contexts, they just aren’t always explicit as they are in other types of problems. Like you say, for some parameter values your problem becomes undefined.

The real answer to your question is that there is no answer to your question. It is trial and error and depends on the shape of your objective function, cost of evaluating the objective function, etc…

The TikTak algo does look promising and Tamas have done us a favor by putting it together in GitHub - tpapp/MultistartOptimization.jl: Multistart optimization methods in Julia. I have used it with some success, although not yet in a full-scale estimation (hopefully soon).

2 Likes

Page 923 of this document discusses method of simulated moments estimation of a model, and gives an example of minimization using simulated annealing. It’s not necessarily the best approach, but it does work: Econometrics/econometrics.pdf at master · mcreel/Econometrics · GitHub

This is the file that does the MSM estimation: Econometrics/EstimateSV.jl at master · mcreel/Econometrics · GitHub

The code in that file that does MSM follows. It illustrates keeping random numbers in simulations fixed, so that the objective doesn’t move with each simulation, and a method for dealing with argument values that cause numeric problems.

# Estimation by MSM
S = 100 # number of simulation reps
shocks_u = randn(n+burnin,S) # used fixed shocks to control chatter
shocks_e = randn(n+burnin,S)
# define the moments
ms = θ -> m0' .- auxstat(θ, shocks_e, shocks_u)  # moment contributions: sample stats minus simulated stats
# do two-step GMM
# first stet
W = eye(size(m0,1))
m = θ -> vec(mean(ms(θ),dims=1)) # use average of moment contributions
obj = θ -> InSupport(θ) ? m(θ)'*W*m(θ) : Inf
θinit = (lb + ub)./2.0 
θhat, objvalue, converged, details = samin(obj, θinit, lb, ub; ns = 5, verbosity = 3, rt = 0.5)
# second step 
W = inv((1.0 + 1.0/S).*cov(ms(θhat))) # get the optimal weight matrix
obj = θ -> InSupport(θ) ? m(θ)'*W*m(θ) : Inf
θhat, objvalue, converged, details = samin(obj, θhat, lb, ub; ns = 5, verbosity = 3, rt = 0.5)
# compute the estimated standard errors and CIs
D = (Calculus.jacobian(m, vec(θhat), :central))
V = inv(D'*W*D)
se = sqrt.(diag(V))
2 Likes

As far as I can see, Guvenen’s et al. paper considers reduced form model of income processes with no optimization problems, which are the basis of my model (I don’t see why they can’t just use analytical formulas for GMM). But what they do later, i.e. simulating the paths of a panel of agents and minimizing the objective function, is what I also have to do with my MSM. So, yeah, the performance of their algorithm suggests that I should seriously consider Tamas’ package :slight_smile:

I haven’t thought of ex-ante undefined region of the parameter space for which the objective function is undefined as a constraint, but I should consider it. However, I don’t see how this relates to the computational problem that I face, especially the choice of the optimization algorithm. Even if we agree that my MSM problem has this implicit constraint, should I then instead of considering unconstrained optimization method, consider constrained optimization method? If so, how should I express this implicit constraint explicitly?

1 Like

One approach that I have used (described a bit more in the thread I linked to earlier) is a heuristic that combines a cheap/ approximate global search with Nelder-Mead for the local. Depending on the details of your problem, this may or may not help you.
The problem that I face is that global optimziation is too expensive. But then, most of the points in the parameter space that the global optimizer tries are “obviously not promising”. This is a subjective judgement on my part.

So the algorithm is:

  1. For 1 million quasi random points, compute a heuristic that decides how promising the points are. This is an approximation of the objective function that is cheap to compute.
  2. For the “best” 10k of those points: compute the objective.
  3. Run the local optimizer on the point with the best objective.

This only works if there is a cheap, approximate objective function that can be computed. In my case, the model has to do with human capital accumulation. I have strong priors that college increases human capital, but not by a big factor. That’s the kind of thing I use for the heuristic step.

In your case, infeasible points could be thrown out during the heuristic global search. With some luck, you don’t encounter too many infeasible points during the local search (Nelder Mead can handle those by returning a big value for the objective).

Drawbacks:

  1. No guarantee of global optimum. But I don’t think that can be guaranteed anyway in the kinds of problems that economists solve these days. I settle for reproducible good fit.
  2. Nelder Mead is (I am told) inefficient as a local solver. It could be replaced, but then it may not help you with those infeasible points.
1 Like