Genetic Programming for predicting equations of motion

Datseris · January 6, 2018, 11:58pm

Hello people!

We want to start a new bachelor project soon, with the goal of using Genetic Programming to somehow predict equations of motion from timeseries. There has been quite some work on this field, notably the publications

Distilling Free-Form Natural Laws from Experimental Data, M. Schmidt and H. Lipson, Science 2009
Automated reverse engineering of nonlinear dynamical systems, J. Bongard, H. Lipson, PNAS 2007
The book by Koza : “Genetic Programming: On the Programming of Computers by Means of Natural Selection”

I have found that this package exists: ExprOptimization.jl which is part of Stanford Intelligent Systems Laboratory and seems to be developed solely by @rcnlee.

I was wondering whether the community has more suggestions, or if the author (Richie Lee) has some comments on whether this package is a good place to start. My only fear is that ExprOptimization.jl is under-documented, which will make things hard for a bachelor student, or anyone who would like to expand the package, really.

In addition, the speed seems to be on the slow side as well, as I have been running the example given in the notebook for more than 10 minutes now and it still hasn’t finished computing. Of course I have to say that I have never used Genetic Programming before, so I do not have any clue on how fast these things should be!

In general we would like to create something that can be incorporated into the DynamicalSystems.jl ecosystem and the two goals we have are:

Predict system conserved quantities using Julia
Predict or at least indicate form that equations of motion might have, again using Julia.

P.S.: If this topic is better suited in another category, please do not hesitate to move it.

ChrisRackauckas · January 7, 2018, 12:05am

In systems biology there’s topological sensitivity analysis which you can look at.

http://www.pnas.org/content/111/52/18507.abstract

The senior author is a big proponent of Julia as well. If you’re interested, we can discuss building something in Julia based on this.

Instead of optimizing over expressions, what you can do instead is put together a more generic equation of method which has all of the terms you think you may need, and then do an optimization with a regularization which is chosen to sparsify the solution.

mohamed82008 · January 7, 2018, 12:53am

You might be interested in BlackBoxOptim.jl which has a nice collection of evolutionary algorithms.

SebastianM-C · January 7, 2018, 1:03pm

This reminded me of an article I saw about deducing the Maxwell Equations from experimental data (with Julia).
I am not sure how relevant is this since from what I understand it is more related to AI.
You can find more info here: [1708.04927] TheoSea: Marching Theory to Light, also

and in the video from JuliaCon 2017: JuliaCon 2017 | TheoSea: Marching Theory to Light | Mark Stalzer - YouTube

Datseris · January 7, 2018, 1:32pm

Thanks for your kind replies! @SebastianM-C I have also seen this talk, but they don’t seem to have any code anywhere.

@mohamed82008 Thank you for your suggestion. Unfortunately I simply could not understand what this package is about. The first paragraph of the readme left me utterly confused:

using BlackBoxOptim

function rosenbrock2d(x)
  return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
We can now call the bboptimize() function, specifying the function to be optimized (here: rosenbrock2d()) and the range of values allowed for each of the dimensions of the input:
res = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2)
We get back an optimization result object that we can query to, for example, get the best fitness and solution candidate:
best_fitness(res) < 0.001        # Fitness is typically very close to zero here...
length(best_candidate(res)) == 2 # We get back a Float64 vector of dimension 2

What are they even optimizing?.. I see no definitinion of loss function or anything. I truly have no idea what the package does Reading their README even further I realized that this is not really related to what I am looking for. But I am really happy you tried to help!

SebastianM-C · January 7, 2018, 1:35pm

There is a Julia code in the Appendix of the PDF in the second link.

jonathanBieler · January 7, 2018, 1:43pm

It’s an optimization package, algorithms to find the minimum of functions, in the case the Rosenbrock function, which is a classical function to test optimization algorithms. In your project you probably want to write down some sort of objective function and optimize it, so that’s where an optimization package comes in.

That said Python’s implementation of CMA-ES is probably better than BlackBoxOptim algorithms for most problems.

mohamed82008 · January 7, 2018, 2:41pm

As @jonathanBieler said, the loss function is the Rosenbrock function. Loss function is just a code name for any function you are trying to minimize, doesn’t have to be some sum of square error, could be any function that you define as long as it is bounded from below (i.e. has some minimum). Think of the input x to the function above as the parameters of a regression model, and the function returns the equivalent of the sum of square error. Most of the unconstrained single objective evolutionary algorithms are actually pretty simple so it should be relatively easy to dive in deeper in the package, if you want, and then maybe make it faster. Or you can use the Python alternatives instead.

Speed depends on the number of variables and the computational cost of evaluating the objective. I think 10s of variables is probably a practical limit. 100s is really stretching it, but if your objective is so cheap to evaluate, then it is possible. Anyway, the algorithm can give you the best solution found after N iterations (or function evaluations) that you can specify, so you can control the time yourself, just not the quality of the solution.

Datseris · January 7, 2018, 3:20pm

Ok thanks. Unfortunately I do not have enough experience on the topic

You both suggested Python but the project has to be done in Julia. If you we cannot proceed with genetic programming we will find another project to implement in Julia.

EDIT:

Well that of course obviously understandable. But the example I run had only a single variable and 2 multiplications and additions. So I would say is pretty small. So do you have any estimate on “time to solve per iteration / per population size” ? In my laptop took 10 minutes, which I though was very slow, given that crazy complicated ANN models can take a couple minutes only.

mohamed82008 · January 7, 2018, 3:22pm

I recommend Julia! I think Julia has ALOT more potential in that area than Python.

Datseris · January 7, 2018, 3:24pm

This is what I run: Jupyter Notebook Viewer

ChrisRackauckas · January 7, 2018, 3:33pm

Note that DiffEqParamEstim.jl is a already setup with testing for NLopt.jl, BlackBoxOptim.jl, and Evolutionary.jl (its master branch… it really needs a tag). DiffEqParamEstim is documented here:

http://docs.juliadiffeq.org/latest/analysis/parameter_estimation.html

and if you want to see how it interacts with any of the packages, just check the tests

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/tree/master/test/tests_on_odes

We did find that NLopt.jl performed the best in a wide class of problems, so far. I have hope for Optim .

While that shows that parameter estimation problem, you can just use the regularization

http://docs.juliadiffeq.org/latest/analysis/parameter_estimation.html#The-Regularization-Function-1

or build your own loss function

http://docs.juliadiffeq.org/latest/analysis/parameter_estimation.html#The-Loss-Function-1

to enforce sparsity constraints, in which case it’s then not just about predicting parameters, but predicting which parameters are zero as well and thus constraining the possible equations of motion from a starting set.
Note that these convenience functions autodiff through the solver and stuff like that to give gradients to the optimization algorithm.

Note we are building very similar Bayesian tools which may be helpful here too.

mohamed82008 · January 7, 2018, 3:38pm

10 minutes for 1 variable and 2 multiplications and additions is pretty darn slow. You can literally try every value the variable could take (with some small step size) if the problem is this simple and then just choose the best solution!

I never used ExprOptimization before but the example seems to be doing symbolic regression with an expression tree of maximum depth 10, that’s not one variable! In the worst case this could be up to 2^{10}-1 function nodes! I never did this before though, so I am not sure of the details of the package or the technique used.

Also, if you can get a cheap estimate of the derivative, then genetic algorithms is not the right choice, you should go for gradient-based methods like the ones in NLopt. I think ForwardDiff should make gradient based methods almost equally “black-box” as genetic algorithms (as long as the loss function is compatible with ForwardDiff), and if the loss function is sufficiently smooth, gradient-based methods could easily beat genetic algorithms, I think.

rcnlee · January 9, 2018, 2:32am

Hi George,

I am the main developer of ExprOptimization.jl. The package provides a number of algorithms to solve GP-style expression optimization problems for Julia expressions, so it is particularly well-suited for symbolic regression problems.

As you can see from the example notebook, it’s pretty quick to get started. You specify a grammar and a loss function and you’re good to go to try different algorithms. I think it’s quite suitable for a student project.

In addition to the example notebook, there is documentation in the source files. I am also still actively maintaining and working on the package, so you can always reach out to me or file an issue on github if there are any issues.

The algorithms actually spend the majority of the time evaluating the fitness function (as it should). You can verify this by timing the fitness function. The symbolic regression example in the notebook had some poorly chosen parameters. I’ve made some adjustments that make the example run a lot faster now, so please take a look again at the example. It should take less than half a minute to run now. In particular, (1) I sped up the fitness function by adjusting the discretization; (2) reduced the max_depth of the derivation trees searched; (3) reduced the number of evaluations of the fitness function by using smaller population sizes and iterations.

Feel free to contact me directly if you would like to discuss further. My email is on the repo’s wiki.

Ritchie

Datseris · January 9, 2018, 12:24pm

Hi Ritchie @rcnlee,

I really appreciate your reply. Reading your answer in detail, I feel much more confident in suggesting the mentioned project as one of the available projects to the student.

We also have a lot of ideas on how one can add more functionality than what already exists, for example the partial derivatives method used in the Science paper I mentioned as well as a completely novel application.

For more info, I will contact you as you suggested. I am also very happy that you are willing to help with issues on using the package!

Topic		Replies	Views
[ANN] SymbolicRegression.jl 1.0.0 - Distributed High-Performance Symbolic Regression in Julia Package Announcements package , symbolic-regression	24	1506	November 29, 2024
Packages for Optimization General Usage question , package , jump , optimization	22	1481	December 1, 2020
[ANN] DynamicExpressions.jl: Fast evaluation of runtime-generated expressions without compilation Package Announcements package , announcement , diffeq , symbolics , symbolic-regression	7	2053	October 1, 2024
[ANN] SymbolicRegression.jl - distributed symbolic regression Package Announcements package , announcement , symbolic	22	3868	March 13, 2025
Genetic algorithms General Usage	10	6009	December 18, 2018

Genetic Programming for predicting equations of motion

Related topics