I am moving to Julia. I have a function which is implemented in C++, and I currently have no time to migrate it to Julia. It depends on lots of parameters, and it typically takes between 1 hour and 1 week to compute a single value. Based on this function F, I would like to minimize the following function:
Can you use an optimizer from Julia? BlackBoxOptim.jl (https://github.com/robertfeldt/BlackBoxOptim.jl) might be a good option if you can. The benefit is that it can optimize non-Julia functions, since all it does is look at the outcomes for lots of different inputs.
Since the function is so expensive, you probably do not want to use BlackBoxOptim.jl; its methods require a lot of evaluations. Maybe something like BayesianOptimization.jl would be better suited: it explicitly tries to be smart about where to evaluate F, spending some computation on choosing good x values.
I’m not sure how you’d incorporate the gradient information, though I can imagine it could be done by constraining the gradient of the underlying Gaussian process.
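Setting the gradient question aside, a basic run might look roughly like the sketch below. This follows my recollection of the BayesianOptimization.jl README, so the constructor arguments should be checked against the current documentation; `expensive_F` is just a stand-in for your C++-backed objective.

```julia
using BayesianOptimization, GaussianProcesses

# Stand-in for the expensive objective; in your case it would call the C++ code.
expensive_F(x) = (x[1] - 0.3)^2 + (x[2] - 0.7)^2

# Gaussian-process surrogate over the two inputs (x1, x2).
model = ElasticGPE(2, mean = MeanConst(0.), kernel = SEArd([0., 0.], 5.),
                   logNoise = -2., capacity = 1000)

# Re-fit the GP hyperparameters (MAP estimate) every 10 evaluations.
modeloptimizer = MAPGPOptimizer(every = 10, maxeval = 40)

opt = BOpt(expensive_F, model, ExpectedImprovement(), modeloptimizer,
           [0., 0.], [1., 1.];                 # search box for (x1, x2)
           sense = Min, maxiterations = 50)

result = boptimize!(opt)
```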
Since you have gradient information, and since it’s so hard-won, I’d strongly recommend you exploit it. A good choice might be Optim.jl. Since it sounds like you don’t have the Hessian, and automatic differentiation is off the table since it’s not written in Julia, I’d recommend quasi-Newton methods like BFGS. (I’m guessing x1 and x2 are scalars, but if not and if they are very large vectors, then L-BFGS would probably be a better choice.)
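A minimal sketch of how that could look with Optim.jl, assuming hypothetical wrappers `run_F` and `run_gradF` around the C++ code that return the objective value and its gradient:

```julia
using Optim

# `run_F` and `run_gradF` are placeholders for however you obtain the value
# and gradient of F from the C++ code (shared library, external binary, ...).
f(x) = run_F(x)                   # expensive scalar objective of (x1, x2)

function g!(G, x)                 # in-place gradient, as Optim expects
    G .= run_gradF(x)
    return G
end

x0  = [0.5, 0.5]                  # initial guess for (x1, x2)
res = optimize(f, g!, x0, BFGS())
Optim.minimizer(res), Optim.minimum(res)
```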
Sounds good, I will try it out. BTW, x1 and x2 are scalars. Also, is it worth exporting the C++ function as a library? Currently I only have a binary to run, and the results are saved to a log file. I have no idea what the best practice is for running C++ code from Julia.
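A common pattern is to drive the existing binary with Julia's backtick commands and parse the log; if you later export the function from a shared library with a C interface, you can call it directly with `ccall` and skip the process launch and file I/O. A rough sketch, with the executable name, log format, and symbol name all hypothetical:

```julia
# Option 1: run the existing binary and parse its log file.
# `./my_solver`, its arguments, and the "F = <value>" log line are assumptions.
function F_from_binary(x)
    run(`./my_solver $(x[1]) $(x[2])`)              # launch the C++ binary
    line = last(readlines("results.log"))           # grab the latest log line
    return parse(Float64, last(split(line, '=')))   # extract the value of F
end

# Option 2: if the C++ side exports, say,
#   extern "C" double my_func(double x1, double x2)
# from libmyfunc.so, it can be called directly with no subprocess overhead:
F_from_lib(x) = ccall((:my_func, "libmyfunc"), Cdouble, (Cdouble, Cdouble), x[1], x[2])
```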
I have tried BayesianOptimization using the parameters given in the README and example/branin_hartmann.jl. Both performed poorly: the accuracy of the solution for x1 and x2 is worse than 1e-2 after more than 1000 function evaluations (I used the simplest version of my problem, which has an analytical expression). Note that both x1 and x2 should be in the range (0, 1).
On the other hand, Optim using BFGS with box constraints reaches an accuracy of 1e-4 in only 10 function evaluations.
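For reference, box constraints with BFGS in Optim.jl can be expressed via `Fminbox`; a sketch using the same placeholder objective `f` and in-place gradient `g!` as above:

```julia
using Optim

lower, upper = [0.0, 0.0], [1.0, 1.0]     # x1, x2 ∈ (0, 1)
x0 = [0.5, 0.5]

# `f` and `g!` are the placeholder objective and gradient wrappers from above.
res = optimize(f, g!, lower, upper, x0, Fminbox(BFGS()))
Optim.minimizer(res)
```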
I also tried the TNC optimizer provided by scipy.optimize, and it requires even fewer function evaluations than Optim to achieve the same accuracy. Is there any implementation of TNC (the truncated Newton bound-constrained algorithm) in Julia? SciPy's TNC is a wrapper around C code written by J.S. Roy (js@jeannot.org).
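In the meantime, SciPy's TNC can be called from Julia via PyCall; a sketch assuming the same placeholder wrappers `run_F` and `run_gradF` around the C++ code:

```julia
using PyCall

so = pyimport("scipy.optimize")

# Bound-constrained truncated Newton, using the supplied gradient.
res = so.minimize(x -> run_F(x), [0.5, 0.5];
                  method = "TNC",
                  jac = x -> run_gradF(x),
                  bounds = [(0.0, 1.0), (0.0, 1.0)])

res.x, res.fun
```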
If F is a very expensive scalar function, I would maybe try something like the following: build an interpolation of F (e.g. a spline, or possibly even a Lagrange-type interpolation if your function is very smooth; also, if you have derivative information for F, you might be able to use it with Hermite interpolation) at the points where you have evaluated it. Minimizing G under that approximation should then be quick (if f1 and f2 are not too bad), so do that with any method you like. Then evaluate F at this minimum, improve your interpolation, and repeat until convergence.
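A rough sketch of that loop for a one-argument F, using a cubic spline from Dierckx.jl as the surrogate. Since the exact form of G from the first post isn't reproduced here, the sketch just minimizes the surrogate of F itself as a stand-in; with gradient information you would swap in a Hermite-type interpolant.

```julia
using Dierckx, Optim

# Toy stand-in for the expensive F; in practice each call takes hours to days.
expensive(t) = (t - 0.3)^2 + 0.05 * sin(8t)

function refine_surrogate(F; niter = 10)
    ts = collect(0.0:0.25:1.0)           # initial design points in [0, 1]
    vs = F.(ts)                          # expensive evaluations
    for _ in 1:niter
        spl = Spline1D(ts, vs; k = 3)                  # cubic spline surrogate of F
        res = optimize(x -> evaluate(spl, first(x)),   # minimize the cheap surrogate
                       [0.0], [1.0], [0.5], Fminbox(BFGS()))
        tnew = first(Optim.minimizer(res))
        minimum(abs.(ts .- tnew)) < 1e-6 && break      # no new knot: converged
        push!(ts, tnew); push!(vs, F(tnew))            # one more expensive evaluation
        p = sortperm(ts); permute!(ts, p); permute!(vs, p)  # Spline1D needs increasing knots
    end
    return ts[argmin(vs)], minimum(vs)
end

refine_surrogate(expensive)
```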
I am already doing that, using simple spline interpolation. It seems the right way to go. I will try your suggestions on the other interpolation techniques. Thanks!
BTW, what do you call this type of optimization? Two-step? Preconditioning? I am not working in the field of numerical analysis.
I have read the README of this package, and it seems the interpolation works well when there are a lot of data points, which is not the case in my problem since each data point is very expensive to compute.
You might be right about that, but I think that depends a lot on the smoothness of the underlying function. I think a bigger drawback of using RBFs would be that you are not using the gradient information, which you would with Hermite interpolating polynomials, as @antoine-levitt mentioned.
Given that you have a very expensive function and gradient information, I would recommend a “gradient-enhanced” Bayesian method marginalized over hyperparameters, instead of using a maximum a-posteriori estimate [1]. Unfortunately, I don’t think any package provides this (in Julia or elsewhere), so you would have to roll your own. I do have an example of the marginalization over hyperparameters, which I may get around to posting at some point, but I’ve never seen an example of marginalization plus gradient enhancement.
Surrogates.jl has an assortment of different surrogates you may like to try first. You can get a feel for their performance by doing a train/test split on pre-generated data. Unfortunately, none of them accept gradient information at this time, though I have a hunch that you can append the gradient to the SVM or RandomForest surrogates to help with prediction. You can also get fancy and try writing a neural network to model your F with gradient information, though your mileage may vary.
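A quick sketch of the train/test idea with Surrogates.jl, with a toy objective in place of the expensive F (Kriging is used here, but it is just one of the surrogate types the package offers):

```julia
using Surrogates

# Toy objective standing in for the expensive F(x1, x2).
f(x) = (x[1] - 0.3)^2 + (x[2] - 0.7)^2

lb, ub = [0.0, 0.0], [1.0, 1.0]

# Pre-generate data and split it into train/test sets.
xs = sample(40, lb, ub, SobolSample())      # 40 points in the box
ys = f.(xs)
xtrain, ytrain = xs[1:30], ys[1:30]
xtest,  ytest  = xs[31:end], ys[31:end]

krig = Kriging(xtrain, ytrain, lb, ub)      # fit a Kriging surrogate

# Check how well the surrogate predicts held-out points.
rmse = sqrt(sum((krig.(xtest) .- ytest) .^ 2) / length(xtest))
```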
Another option is to call the Surrogate Modeling Toolbox from Julia; it has an implementation of gradient-enhanced Kriging. (Kriging is another name for fitting a Gaussian process, which is the mathematical object at the center of Bayesian optimization.)
[1] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems 25, 2951–2959.
Great information! I am like a rabbit digging into a hole which now seems infinitely deep. I was not aware that this is such a deep and huge topic, covering so many branches of numerical analysis…
After some experiments, I found that cubic Hermite interpolation works great! Since there seems to be no implementation of this algorithm in Julia, I have created a package, CubicHermiteSpline.jl, which is now available in the public registry. A tutorial showing how to use it and the performance of this interpolation algorithm can be found here. Thanks for your hints!
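For anyone curious, here is a minimal hand-rolled cubic Hermite evaluation on a single interval, written independently of CubicHermiteSpline.jl (which handles whole data sets); it uses the standard Hermite basis functions and shows how the endpoint derivatives enter the interpolant:

```julia
# Cubic Hermite interpolation on one interval [x0, x1], given function
# values y0, y1 and derivatives d0, d1 at the endpoints.
function hermite_interp(x, x0, x1, y0, y1, d0, d1)
    h = x1 - x0
    t = (x - x0) / h                   # normalized coordinate in [0, 1]
    h00 =  2t^3 - 3t^2 + 1             # standard Hermite basis functions
    h10 =   t^3 - 2t^2 + t
    h01 = -2t^3 + 3t^2
    h11 =   t^3 -  t^2
    return h00 * y0 + h10 * h * d0 + h01 * y1 + h11 * h * d1
end

# Example: interpolate sin between 0 and π/2 from endpoint values and derivatives.
hermite_interp(0.5, 0.0, π/2, sin(0.0), sin(π/2), cos(0.0), cos(π/2))  # ≈ sin(0.5)
```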