Hobby project: SequentialFit.jl

I don’t know mathematical optimization, and this might be a tremendously dumb idea, but I couldn’t get it out of my head after working on a project where I was fitting a surrogate model to the outputs from a very expensive simulation. So I made this thing for fun:

Not going to register it or anything, but I’m happy for feedback/discussions/links to related papers.

Basically, if fitting, taking the gradient of your model with respect to its parameters and optimizing on that gradient are all orders-of-magnitude cheaper operations than sampling, it might make sense to use your current model to inform the choice of sample points. Big drawback: It requires you to already be fairly confident in the general quality of your model, so it probably doesn’t hold up for science.


And a picture because those drive engagement.



You might already be aware, but this is pretty similar to Bayesian Optimization.


I’ve encountered it, and it’s similar, but the big difference is that I impose a specific model function, and bayesian optimization (from what I understand) uses GPs, which are essentially polynomials.

In my use case, the purpose of the fit is to find specific, physically meaningful parameters, like linewidth, height and position of a gaussian. Extracting them from a GP seems difficult (but I’m not convinced I understand bayesian optimization enough to say for sure that this is not just a poor implementation of it).

Your code seems different to pure BO, but the idea of fitting a surrogate and optimizing a metric of that pops up in different areas.

You could also look up “Bobyqa”.


There is also GitHub - SciML/Surrogates.jl: Surrogate modeling and optimization for scientific machine learning (SciML).


The idea of this approach looks very interesting for fitting analytical physical models to expensive simulations, or to experimental data when one can actively control the next point to measure. It seems a more direct approach than going through some surrogate model and fitting an analytical function after that.
Could you please clarify what do you mean by:

Big drawback: It requires you to already be fairly confident in the general quality of your model, so it probably doesn’t hold up for science.


1 Like

Sure. Like I said, I haven’t investigated this thoroughly, but unlike Bayesian optimization it doesn’t actually act on uncertainty, but rather on “If my model is (close to) correct, where should I sample to get maximum information?”. If your initial guess of parameters is a bit off, there is no proper method to explore other regions (we have exploitation but not exploration if my opti-lingo is up to date).

Maybe there’s some good change I could make to improve the algorithm in this respect – I know for certain that adding the prod(s.X .- x)^2 factor was necessary to stop it from just sampling the same spot on what it thought were the slopes over and over.

In the example above we can see that it does a decent job, but I’m not certain I could trust it without checking against the ground truth. And I have no idea what kind of functions this would work for vs not.

Hm, I see… It relies on the general behaviour of the model function with the current parameter values, and if the reality is vastly different the point selection will be far from optimal.
Are you aware of any papers/studies of such methods? I tried googling some keywords but no success.

Nope, I was kind of hoping some would turn up here :stuck_out_tongue:

1 Like

This paper seems to propose a similar idea. The authors implemented it in the old Boeing Design Explorer software.

author = “{Audet, C.} and {J.E. Dennis} and {D. W. Moore} and
{A. J. Booker} and {P. D. Frank}”,
title = “A Surrogate-Model-Based Method for Constrained Optimization,
year = “2000”,
booktitle= “Eighth AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary
Analysis and


@mohamed82008, on the surrogates topic, the following slides: Introduction to Gaussian Process Surrogate Models, by Nicolas Durrande and Rodolphe Le Riche (2017), provide a really good overview.


This is wrong btw. GPs aren’t essentially polynomials. They are very different beasts. GPs are semi-parametric models in the sense that you need all the data around to do prediction with them, not just the parameters. GPs are naturally probabilistic and Bayesian which makes them an awkward choice to “fit” deterministic functions but it will work anyways if you ignore the variance. But that property of GPs also makes them seamlessly extend to stochastic optimisation where the objective or constraints can be random in nature.


This does not surprise me. I haven’t found a definition that relates them to anything I understand. I think there’s a couple of steps of reading I would have to do in between.