Given that your function is very expensive and you have gradient information, I would recommend a “gradient-enhanced” Bayesian method marginalized over hyperparameters, rather than one using a maximum a posteriori (MAP) estimate [1]. Unfortunately, I don’t think any package provides this (in Julia or elsewhere), so you would have to roll your own. I do have an example of marginalization over hyperparameters, which I may get around to posting at some point, but I’ve never seen an example combining marginalization with gradient enhancement.
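To show what I mean by marginalizing over hyperparameters, here is a minimal toy sketch (mine, not from any package): a 1D Gaussian process with an RBF kernel, where instead of keeping the single best length-scale we weight predictions from a grid of length-scales by their marginal likelihoods (a uniform prior is assumed). All names here are my own.

```python
# Toy sketch: average GP predictions over a grid of kernel length-scales,
# weighted by each length-scale's marginal likelihood, instead of keeping
# only the single best (MAP) value. Illustrative only.
import numpy as np

def rbf(a, b, ell):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def log_marginal_likelihood(x, y, ell, noise=1e-6):
    K = rbf(x, x, ell) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

def marginalized_mean(x, y, xstar, ells, noise=1e-6):
    # Posterior weights over the length-scale grid (uniform prior assumed).
    logp = np.array([log_marginal_likelihood(x, y, l, noise) for l in ells])
    w = np.exp(logp - logp.max())
    w /= w.sum()
    # Weighted average of the per-length-scale predictive means.
    means = [rbf(xstar, x, l) @ np.linalg.solve(
                 rbf(x, x, l) + noise * np.eye(len(x)), y) for l in ells]
    return w @ np.array(means)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
print(marginalized_mean(x, y, np.array([1.5]), ells=[0.3, 1.0, 3.0]))
```

A full treatment would marginalize with MCMC or quadrature rather than a hand-picked grid, and would average the whole predictive distribution, not just the mean, but the weighting idea is the same.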
Surrogates.jl has an assortment of surrogates you may like to try first. You can get a feel for their performance by doing a train/test split on pre-generated data. Unfortunately, none of them accept gradient information at this time, though I have a hunch that you could append the gradient to the SVM or random-forest surrogates to help with prediction. You could also get fancy and try writing a neural network to model your F with gradient information, though your mileage may vary.
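The train/test evaluation I have in mind looks like this (a sketch with a stand-in 1-nearest-neighbour “surrogate”, not one of the Surrogates.jl types, and a cheap stub `f` in place of the expensive function):

```python
# Evaluate a surrogate on pre-generated data via a train/test split.
# The surrogate here is a stand-in 1-nearest-neighbour predictor.
import random

def f(x):  # cheap stub standing in for the expensive function
    return x * x

random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(100)]
ys = [f(x) for x in xs]

split = int(0.8 * len(xs))             # 80/20 train/test split
xtr, ytr = xs[:split], ys[:split]
xte, yte = xs[split:], ys[split:]

def predict(x):
    # 1-NN surrogate: return the output of the closest training input.
    i = min(range(len(xtr)), key=lambda j: abs(xtr[j] - x))
    return ytr[i]

mse = sum((predict(x) - y) ** 2 for x, y in zip(xte, yte)) / len(xte)
print(mse)
```

Swapping in each candidate surrogate for `predict` and comparing the held-out error gives you a quick ranking before committing to one.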
Another option is to call the Surrogate Modeling Toolbox (SMT, a Python package) from Julia; it has an implementation of gradient-enhanced Kriging. (Kriging is another name for fitting a Gaussian process, the mathematical object at the center of Bayesian optimization.)
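The idea behind gradient-enhanced Kriging can be shown without SMT: derivative observations enter as extra data whose covariance with the function values comes from differentiating the kernel. A toy 1D numpy sketch of my own (not SMT’s API):

```python
# Gradient-enhanced GP/Kriging mean prediction in 1D: stack function values
# and derivative values into one observation vector, with cross-covariances
# given by derivatives of the RBF kernel. Illustrative only, not SMT's API.
import numpy as np

def gek_mean(x, f, df, xstar, ell=2.0, noise=1e-8):
    def pair(a, b):
        d = a[:, None] - b[None, :]
        return d, np.exp(-0.5 * d**2 / ell**2)

    d, kff = pair(x, x)
    kfd = (d / ell**2) * kff                    # cov(f(x_i), f'(x_j))
    kdd = (1.0 / ell**2 - d**2 / ell**4) * kff  # cov(f'(x_i), f'(x_j))
    n = len(x)
    K = np.block([[kff, kfd], [kfd.T, kdd]]) + noise * np.eye(2 * n)
    y = np.concatenate([f, df])

    ds, ks = pair(xstar, x)
    kstar = np.hstack([ks, (ds / ell**2) * ks])  # cov(f(x*), [f; f'])
    return kstar @ np.linalg.solve(K, y)

x = np.array([0.0, 2.0, 4.0])
print(gek_mean(x, np.sin(x), np.cos(x), np.array([1.0])))
```

With the gradients included, the posterior mean interpolates both the values and the slopes at the training points, which is why gradient enhancement pays off so much when each evaluation is expensive.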
[1] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems, 4, 2951–2959.