Approaches to find the optimal X for a large simulation model when you have only (X, Y) (no gradient, Hessian, …)

I have a large bio-economic simulation model (roughly 1 hour run time on 1 core) and I want to find which combination of some inputs minimises the sum of those inputs given a fixed output of the model.
In practice, I want to find which economic policy, and in which region, minimises the policy cost for a given model output. More concretely, it is a bio-economic model of the forest sector: a government policy says that at the national level we should increase wood harvesting by X million cubic metres, and I need to find which policy mix is the cheapest.
So, for example, if I introduce a supply subsidy in region r in the model, this will increase harvesting in that region (with other effects in the other regions), and the same happens if I subsidise forest investments…
The model is recursively dynamic and gives me lots of outputs.
I am interested only in (1) total harvesting over the whole timeframe at the national level, and (2) the total cost of the policies.
I have no way of knowing in advance whether a specific combination of policies will give me the desired output, i.e. the desired harvesting levels. I can only set the inputs and observe, after 1 hour, the results.
My parameter space is about 2 policies, ~15 regions, 100 years.

The model is in C++ but the “control” I would put in place would be in Julia.

Of course, if I could write my model analytically, I could use JuMP, pass it to a solver, and get the constrained optimum, but here all I have is a black box (X, Y).
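For concreteness, the kind of black-box wrapper I have in mind looks something like the following sketch; the executable name, the CSV input/output convention and the column positions are made-up placeholders, not the real model's interface:

```julia
using DelimitedFiles

# Black-box wrapper around the C++ model: write the candidate policy vector,
# run the (hour-long) simulation, read back the two quantities of interest.
function run_model(x::Vector{Float64})
    writedlm("policy_input.csv", x, ',')                                 # candidate policy vector
    run(`./forest_model --input policy_input.csv --output results.csv`)  # ~1 hour per call
    out = readdlm("results.csv", ',')
    total_harvest = out[1, 1]   # assumed position of the national harvest total
    total_cost    = out[1, 2]   # assumed position of the total policy cost
    return total_harvest, total_cost
end
```

Once something like `run_model(x)` is callable from Julia, any of the solvers discussed below only needs this function plus bounds on the inputs.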

I don’t think I am the only one in this situation, but I don’t know how to search for approaches to this kind of problem…

You’re probably better off solving the unconstrained problem: minimise |X - X_0| + Y, where X_0 is the target.
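A minimal sketch of that reformulation, reusing the `run_model` wrapper above; the target value and the penalty weight are illustrative, and the weight needs tuning:

```julia
# Penalised objective: minimise policy cost plus a penalty on missing the
# harvest target, instead of imposing the constraint exactly.
const HARVEST_TARGET = 10.0e6   # e.g. +10 million m³ -- made up for the example
const PENALTY = 1.0e3           # too small ignores the target, too large drowns out the cost signal

function penalised_cost(x)
    harvest, cost = run_model(x)
    return cost + PENALTY * abs(harvest - HARVEST_TARGET)
end
```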

With an objective function that slow, you may be best off with a simple optimiser such as a line search along each feature.
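A naive version of that idea, purely as a sketch: with hour-long evaluations this is only realistic for a handful of parameters and a coarse grid per parameter, and as noted further down the thread, proper derivative-free methods usually do better.

```julia
# Coordinate-wise grid line search: sweep over the inputs one at a time,
# trying a few candidate values for each while holding the others fixed.
function coordinate_search(f, x0, lb, ub; npoints = 5, sweeps = 2)
    x = copy(x0)
    for _ in 1:sweeps, i in eachindex(x)
        candidates = range(lb[i], ub[i]; length = npoints)
        values = map(candidates) do c
            xtrial = copy(x)
            xtrial[i] = c
            f(xtrial)               # one expensive model run per candidate
        end
        x[i] = candidates[argmin(values)]
    end
    return x
end
```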

If you have only function evaluations, and you can’t rely on properties like convexity, you’ll likely need some black-box optimizer.

There are also packages like fmfn/BayesianOptimization on GitHub (a Python implementation of global optimization with Gaussian processes) that are pretty easy to use and give reasonable results for low-dimensional problems.


Sounds like the sort of problem that people doing heterogeneous-agent macro models face, for which Fatih Guvenen came up with the TikTak algorithm. Tamas Papp implemented it along with other similar methods here; maybe this is of interest to you:
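Assuming the package referred to is MultistartOptimization.jl, a rough sketch following its README, paired with NLopt for the local stage; the dimension, bounds and number of Sobol points are illustrative, and each starting point costs at least one full model run:

```julia
using MultistartOptimization, NLopt

lb = zeros(30)                 # illustrative bounds for a (reduced) policy vector
ub = fill(5.0, 30)

P = MinimizationProblem(penalised_cost, lb, ub)
multistart_method = TikTak(100)                       # 100 quasirandom starting points
local_method = NLoptLocalMethod(NLopt.LN_BOBYQA)      # derivative-free local solver
result = multistart_minimization(multistart_method, local_method, P)
result.location, result.value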


Thank you all. Indeed this seems the right place to start.
I now need to understand (1) whether I can pass these solvers an arbitrary function (i.e. a wrapper around a C++ model), or whether it needs to be a Julia function; and (2) whether these approaches are feasible with functions that take 1 hour per run over a parameter space of hundreds of dimensions.

Maybe I could first sample the space randomly with ~1–2k simulations, get a table of inputs and outputs, and then fit a machine-learning model to approximate this function and minimise the approximation (which would evaluate in fractions of a second instead of hours)… would that make sense?

I suggest you also take a look at NOMAD, which is designed for expensive, black-box objective functions. The objective function does not have to be written in Julia.


Another suggestion is QuadDIRECT, “an algorithm for global optimization without requiring derivatives”.


These are called surrogates, and there is

https://surrogates.sciml.ai/stable/

which might help. Also, the SPSA optimization method is designed for problems like yours; I think it’s in BlackBoxOptim.jl.
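A hedged sketch of the sample-then-surrogate workflow described a few posts up, loosely following the Surrogates.jl docs (check the link above for the current API; the bounds, sample size and choice of radial basis are illustrative):

```julia
using Surrogates

lb = zeros(30)
ub = fill(5.0, 30)

obj(x) = penalised_cost(collect(x))          # Surrogates.jl hands points around as tuples

xs = sample(200, lb, ub, SobolSample())      # 200 design points = 200 model runs
ys = obj.(xs)                                # ~1 hour each -- run these in parallel

surr = RadialBasis(xs, ys, lb, ub)           # cheap interpolant of the expensive model

# adaptive refinement: alternate between optimising the surrogate and
# re-evaluating the true model at promising points
surrogate_optimize(obj, SRBF(), lb, ub, surr, SobolSample())
```

The initial design runs are independent of each other, so they parallelise trivially across cores or a cluster.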


Thank you everyone, I have learned a lot of things from this post…

How many parameters does this amount to?
For low-dimensional problems (fewer than 10 dimensions) where I have a good sense of the smoothness of the cost function, BayesianOptimization works well for me. For higher-dimensional problems (30+ dimensions) with noise, I had the most success with CMA-ES (I tried many alternatives, also mentioned in this thread…). Note that NLopt also has some great derivative-free (e.g. DIRECT) and global optimization methods (e.g. MLSL). Which method works best depends a lot on your specific problem (no free lunch in optimization…).


Well, without imposing restrictions (e.g. changing policy only every 5 years, or using the same policy for a group of regions), this would amount to 100*15*2 decision variables… for which I don’t know a priori any effect; for example, if I subsidise forest harvesting in region r1, this causes less harvesting in regions r2, r3, etc.


For 100 × 15 × 2 = 3000 decision variables I would definitely give CMA-ES a spin, especially if the evaluation of your cost function is not very expensive. My experience is consistent with the Wikipedia statements about CMA-ES: it works well even in high dimensions, and “on non-separable functions that are ill-conditioned or rugged or can only be solved with more than 100n function evaluations, the CMA-ES shows most often superior performance.”
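In Julia, one option is CMAEvolutionStrategy.jl; a hedged sketch follows, where the starting point, step size, bounds and evaluation budget are illustrative and the keyword names should be checked against the package docs:

```julia
using CMAEvolutionStrategy

n = 3000
x0 = fill(0.5, n)          # initial policy vector (inputs scaled to [0, 1] here)
sigma0 = 0.2               # initial step size, roughly a quarter of the search range

result = minimize(penalised_cost, x0, sigma0;
                  lower = zeros(n), upper = ones(n),
                  maxfevals = 5_000)            # budget in model evaluations
xbest(result), fbest(result)
```

Note that 5,000 evaluations at one hour each is roughly seven months of sequential compute, so in practice the population evaluations would have to run in parallel.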

There has been, however, some nice progress in scaling Bayesian optimization; see e.g. the work of David Eriksson on TuRBO, which looks, at least during the early phases of the search, to be better than CMA-ES and other methods. But I haven’t played with it, and I don’t know how sensitive it is to hyperparameter choices or how well it scales beyond 1000 dimensions.

EDIT: I noticed I mis-interpreted the dimensionality because of markdown formatting.


Many (most?) derivative-free optimization algorithms should be better than such a naive algorithm, as long as you have continuous parameters.

In addition to the packages mentioned above, NLopt.jl also has a number of derivative-free algorithms for both global and local optimization over continuous parameters.
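For example, a local derivative-free run with BOBYQA via NLopt.jl, as a sketch (the bounds, starting point and evaluation cap are illustrative):

```julia
using NLopt

n = 30
opt = Opt(:LN_BOBYQA, n)
opt.lower_bounds = zeros(n)
opt.upper_bounds = fill(5.0, n)
opt.maxeval = 300                               # hard cap on hour-long model runs

# NLopt passes a gradient buffer even to derivative-free algorithms; ignore it
opt.min_objective = (x, grad) -> penalised_cost(x)

minf, minx, ret = optimize(opt, fill(1.0, n))
```

Swapping `:LN_BOBYQA` for a global algorithm such as `:GN_DIRECT_L` keeps the same interface.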

If you have 3000 decision variables, evaluating your cost function takes 1 hour, and you don’t have derivatives, the answer is simple: black-box search is hopeless. You need to re-think your problem or learn how to express it in a differentiable way.

For example, you have some policy X for each of 100 years? Maybe you can assume that X is smoothly varying with time and characterize it by a straight line X(t) = X_0 + X_1 t with time t. That would reduce the number of variables by a factor of 50. (You might be able to similarly reduce the parameters for the different regions.)
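A sketch of what that reparameterisation could look like, using the 2 policies × 15 regions × 100 years layout from the problem description; how the expanded grid is actually fed to the model depends on its real interface:

```julia
const NPOLICIES, NREGIONS, NYEARS = 2, 15, 100

# θ holds an intercept and a slope for every (policy, region) pair:
# 2 * 2 * 15 = 60 parameters instead of 2 * 15 * 100 = 3000.
function expand_policy(θ::Vector{Float64})
    coeffs = reshape(θ, 2, NPOLICIES, NREGIONS)
    X = Array{Float64}(undef, NPOLICIES, NREGIONS, NYEARS)
    for p in 1:NPOLICIES, r in 1:NREGIONS, t in 1:NYEARS
        X[p, r, t] = coeffs[1, p, r] + coeffs[2, p, r] * t    # X(t) = X_0 + X_1 t
    end
    return X
end

# the optimiser then searches over the 60-dimensional θ
reduced_cost(θ) = penalised_cost(vec(expand_policy(θ)))
```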


In my experience, a good optimization algorithm (which zooms directly in on the optimal parameters) is always more efficient than training a surrogate model accurate over the whole parameter space. A surrogate model is only worth it if you are going to re-use it for lots of things (e.g. if you are going to re-do the optimization many times for different objectives.)

Oh, I missed the 1h in the original post. If you wanted 300,000 function evaluations for your black-box search, this would take you more than 34 years. Hopeless, indeed :wink: