Newton vs the machine: solving the chaotic three-body problem using deep neural networks

NumPy and TensorFlow. Bet they would have been better off using Flux and Zygote!


Apparently they are using a simulator for training the network. I wonder if this is in line with the current concept of ‘scientific machine learning’, which is often mentioned in connection with the Julia ecosystem. I don’t know if this ‘Brutus’ simulator is differentiable, but I presume that would make a big difference.

Could a Julia, autodiffable version of this conceivably be significantly better?

or Mind vs the muscle.

Color me skeptical of this paper. It’s perfectly sensible that you need arbitrary-precision arithmetic to resolve state space adequately for accurate computations of long trajectories of the three-body problem. But how can a trained approximation (presumably working with standard 64-bit floats) compete with the high-precision numerical integrator on accuracy? The test set must not be covering the super-fine fractal basin boundaries that the integrator is specifically designed to resolve.
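The precision point is easy to see in a toy setting. This is a purely illustrative sketch, not the Brutus integrator: the logistic map stands in for any chaotic system, and the single- and double-precision versions of the same trajectory fully decorrelate after a few dozen steps.

```python
import numpy as np

def iterate(x0, dtype, n=80):
    """Iterate the chaotic logistic map x -> r*x*(1-x) at a given precision."""
    x = dtype(x0)
    r, one = dtype(3.9), dtype(1.0)
    for _ in range(n):
        x = r * x * (one - x)
    return float(x)

lo = iterate(0.2, np.float32)  # single precision
hi = iterate(0.2, np.float64)  # double precision

# Rounding error is amplified exponentially, so the two trajectories
# have fully decorrelated: the difference is O(1), not O(eps).
print(abs(lo - hi))
```

The same effect is why long three-body trajectories need ever more digits: the answer you get is a function of your working precision, and a float64-trained net can at best reproduce the float64 (or here, the training-set) answer.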

I read the paper (quickly) and don’t see the answer to the issue.


Possibly more important is that the class of problems they were actually solving is three-body problems in the plane with equal-mass objects and no initial velocity. That probably removes a lot of the ill-conditioning and allows for simpler math.

I actually mention an example exactly like this in my JuliaCon talk:

However, I guess the issue is that I was honest about what’s going on when you build a surrogate like this. What you’re really doing is shifting compute time so that it is pre-performed, which gets you a real-time simulator of anything by evaluating the surrogate. To do this, you essentially have to put in more time than you would have otherwise, and the resulting surrogate will not be as accurate as your simulator, but the end result is something that gives instant solutions to anything. The example I give where this may be useful is drone flight, where you might have a complex model but want to compute optimal controls from sensor inputs in real time. There’s no magic here; the compute still happens somewhere, but they buried that fact in their paper:

Generating these data required over 10 days of computer time.

So yeah, they gave themselves 10 days to generate the data to train a neural network that gets 4 digits correct on a simplified (2D) version of the problem. That’s shifting computing time, not necessarily accelerating anything. It is an interesting technique and can have a lot of uses, but it’s not new.

Nah, it would’ve been best to use Surrogates.jl:

Radial basis functions will perform as well or better than neural networks on many of these surrogate tasks, and they will be instant to “train” without requiring GPUs or anything!
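To make that concrete, here is a minimal Python sketch (SciPy standing in for Surrogates.jl, and a cheap analytic function standing in for the expensive simulator; both are assumptions for illustration only). The RBF “training” is a single linear solve over the precomputed point cloud:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical "expensive simulator": any smooth map R^2 -> R stands in
# for a per-trajectory integration here.
def simulator(xy):
    return np.sin(3 * xy[:, 0]) * np.cos(2 * xy[:, 1])

rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(400, 2))  # precomputed point cloud
y_train = simulator(X_train)                 # the "10 days of computer time"

# "Training" the RBF surrogate is one linear solve -- no GPUs required.
surrogate = RBFInterpolator(X_train, y_train, kernel="thin_plate_spline")

X_test = rng.uniform(-1, 1, size=(100, 2))
err = np.max(np.abs(surrogate(X_test) - simulator(X_test)))
print(f"max abs error on held-out points: {err:.2e}")
```

Evaluating `surrogate` is then as instant as evaluating the neural net, with all of the expensive compute pushed into generating `y_train` up front.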

If you look at Figure 6, the main reason is that the neural network isn’t as accurate as the numerical integrator. The raw neural net doesn’t get more than 4 digits correct.

Not really. It’s more akin to just generating a point cloud of inputs and then training the neural net on the resulting data. Or just building a radial basis function, which takes no training time and gets to the same place. Surrogates are a tool in scientific machine learning, but this isn’t close to the full story.

In the end, :man_shrugging: cool result, but you’d have to ask whether you really require real-time solutions to the n-body problem before spending a ton of time pre-computing solutions, and no real application that requires real time is given. Otherwise… you just run the simulator once, which apparently takes 10/10000 days, a 10,000x speedup over the “train a neural net” approach.


An area where people might want real time results is financial trading, where volatility, the unobserved variance of the asset returns, is of interest. One can use a net to learn the parameters and instantaneous variance of stochastic volatility models using artificial data (for which the unobserved instantaneous variance is in fact observable), conditioning on observable statistics of the artificial samples. Then the trained net can take real world statistics as an input to estimate parameters and volatility, almost instantly. Not surprisingly, this works pretty well. Not quite as accurate as other methods, but pretty accurate, and much faster. I have a little paper on the topic, which made few ripples :joy: I’m currently preparing an updated example using Flux, the previous examples used Mocha.jl or MXNet.jl, don’t recall now which.
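A toy version of that workflow, with lots of hedging: this is not the paper’s model or its net, just linear least squares standing in for Flux, and i.i.d. Gaussian returns with a known volatility standing in for a real stochastic-volatility simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_returns(sigma, n=500):
    """Toy data-generating process: returns with known volatility sigma."""
    return sigma * rng.standard_normal(n)

def statistics(r):
    """Observable summary statistics the 'net' conditions on."""
    return np.array([np.std(r), np.mean(np.abs(r)), np.sqrt(np.mean(r**2))])

# Artificial training set: the true volatility is observable here.
sigmas = rng.uniform(0.5, 2.0, size=2000)
S = np.array([statistics(simulate_returns(s)) for s in sigmas])

# Linear least squares stands in for the neural-network regression.
A = np.column_stack([S, np.ones(len(S))])
coef, *_ = np.linalg.lstsq(A, sigmas, rcond=None)

# "Real world" series with unknown volatility: estimated almost instantly.
r_obs = simulate_returns(1.3)
est = statistics(r_obs) @ coef[:3] + coef[3]
print(f"estimated volatility: {est:.3f} (truth 1.3)")
```

The mapping from statistics back to parameters is learned entirely on artificial data, then applied to the “observed” series in a single matrix-vector product, which is the real-time part.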

edit: the SV example using Flux is at, in the SV directory


I’m not skeptical of the paper and its claims, but the way it’s being presented in the media really distorts it.

I work with a lot of people who are doing very similar things for very similar problems. In the use cases I’ve seen, you have a computationally expensive full-physics model that’s being run all the time, each time with only very minor perturbations to the inputs. If such a model can’t run in real time for some applications, it makes a lot of sense to just run it a whole lot in advance and use some sort of machine learning/statistical method to get something like an approximate lookup table.

I would say, somewhat unfortunately, that Deep Learning is the method being chosen for this all the time. I say unfortunately because one problem with it is that it is really hard to tell if you are making predictions in an area with proper coverage, compared to something like a radial basis function or, better yet, Gaussian processes. It’s not the end of the world to use Deep Learning in such cases, especially if you are convinced that your training samples have proper coverage for the application, though you are likely to require much more training data. You could make an argument that Deep Learning might be better than distance-based methods if there are discontinuities in the target function. But in general, my experience is that Deep Learning is used without good justification.
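The coverage point is exactly what a GP’s predictive variance gives you. A small scikit-learn sketch (illustrative numbers, nothing to do with the n-body setting): the same fitted GP reports near-zero uncertainty where it has training coverage and near-prior uncertainty where it is extrapolating.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Train on [0, 1] only; then query inside and far outside that coverage.
X_train = np.linspace(0, 1, 20).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
gp.fit(X_train, y_train)

_, std_in = gp.predict(np.array([[0.5]]), return_std=True)   # covered
_, std_out = gp.predict(np.array([[3.0]]), return_std=True)  # extrapolating

# std_out >> std_in: the GP flags its own lack of coverage.
print(std_in[0], std_out[0])
```

A plain neural net gives you a point prediction at `x = 3.0` with exactly the same confidence as at `x = 0.5`, which is the failure mode being described above.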

From reading that paper, it doesn’t seem the authors claimed to be doing anything different. The media might as well have said that “Standard Normal lookup tables have done what hundreds of years of calculus could not”. Maybe sorta kinda true in a weird way, but mostly highly misleading.


Sorry for the dumb question (this is not my area of expertise), but isn’t this just a numerical demonstration of the universal approximation theorem?

Basically, yes. If one wanted to be really pedantic, the universal approximation theorem applies to a single-hidden-layer network, while most of the time Deep Learning, which has lots of layers, is what’s suggested. It’s not too hard to extend the UAT to deep learning, though: you just apply it to the layers of the DL model and you’re done.
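You can even watch the single-hidden-layer version of the theorem at work numerically. In this sketch (a crude stand-in for actual training, with made-up widths and weight scales), the hidden tanh layer has random, untrained weights and only the linear readout is fit by least squares; the approximation error of `sin` drops sharply as the layer widens.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
y = np.sin(x).ravel()

def fit_error(width):
    """One hidden tanh layer with random weights; lstsq trains the readout."""
    W = rng.normal(scale=2.0, size=(1, width))
    b = rng.uniform(-np.pi, np.pi, size=width)
    H = np.tanh(x @ W + b)                     # hidden activations
    c, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit the linear readout only
    return float(np.max(np.abs(H @ c - y)))

e_small = fit_error(5)
e_big = fit_error(200)
print(e_small, e_big)  # the wider layer approximates sin far more closely
```

This is the UAT in miniature: enough sigmoidal units plus a linear readout can drive the uniform error on a compact interval as low as you like.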

Of course, NN/DL are not the only methods that can learn a generic function given enough data/parameters. Radial basis functions and Gaussian processes also fit the bill. So you should start to care about which methods use the data more efficiently; would you rather run your physics model 10^6 times or 10^9 times? As far as I know, there are only heuristics to answer that kind of question at this time (but I could be wrong!). With the lack of any sort of proof, many researchers seem to default to DL methods.


I really don’t have intuition or heuristics for which surrogate models are most applicable to particular problems, but I’d naively expect NN/DL to outperform a distance-based surrogate like an RBF or GP on this problem. The three-body problem is chaotic: within a few Lyapunov times, the final state of a configuration whose input lies midway between two training samples will not be the mean of those samples’ final states.
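That interpolation failure is easy to demonstrate on any chaotic map; here the logistic map is a stand-in for the three-body integrator (purely illustrative, no claim about the paper’s setup). The midpoint of two nearby “training” initial conditions ends up nowhere near the mean of their final states:

```python
import numpy as np

def final_state(x0, n=40, r=3.9):
    """n steps of a chaotic logistic map, standing in for an integrator."""
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)
    return x

xs = np.linspace(0.30, 0.31, 6)  # "training" initial conditions
mids = (xs[:-1] + xs[1:]) / 2    # query points midway between neighbors
errs = [abs(final_state(m) - 0.5 * (final_state(a) + final_state(b)))
        for m, a, b in zip(mids, xs[:-1], xs[1:])]

# After ~40 steps the initial separation has been amplified past
# saturation, so interpolating between neighbors tells you nothing.
print(max(errs))
```

Distance-based surrogates rely on nearby inputs having nearby outputs, which chaos destroys beyond a few Lyapunov times, so whatever structure a net learns here, it isn’t smooth local interpolation.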

My impression of the generic surrogate dream is that one hopes, with appropriate regularization, that we could learn a well-behaved manifold which is sampled by the training set. In the three-body problem, the manifold must be incredibly complicated – exactly what you’d hope a big pile of dense layers should be able to represent.


Game physics simulations are a more practical application of this compute-time-shifting trade-off.
Looks pretty cool: