[ANN] AstroNbodySim.jl: Unitful and differentiable gravitational N-body simulation code in Julia

It is a package for scientific gravitational N-body simulations, with the following features:

  • Compute with units
  • User-friendly
    • Well documented
    • Readable programming
    • Vectorized array operations
    • Dispatch on types for various simulation settings
    • Float16, Float32, Float64, Int128, BigFloat, Measurement, etc.
  • Cross-platform: Linux, Windows, macOS. Easy to deploy
  • Hybrid Parallelism: multi-threading, distributed parallelism, GPU acceleration
  • Modularity and Versatility: 9 packages, designed for general purposes, highly extensible
  • Realtime visualization (interactive)
  • Auto-test workflow

Documentation: Home · AstroNbodySim.jl

Here are some figures and animations to demonstrate what AstroNbodySim.jl can do:

  1. Realtime visualization of simulations on GPU

  2. Galactic collision

  3. Uncertainty propagation

  4. Autodiff of background potential field

  5. User-defined pipeline: Tidal disruption event (TDE)

  6. Lagrange radii and scale radius

  7. Solar System

Package ecosystem:

Issues and PRs are welcome!


Just wondering, is there a reason this isn’t using DifferentialEquations for the actual simulation?

  1. We aim at heavy-load N-body problems (> 10^6 particles) on HPC, so the parallelization algorithms have to be customized
  2. Same as above, we have many specific simulation steps for scientific goals
  3. About force solvers:
    • DifferentialEquations does not have a distributed octree force solver
    • Direct summation method does not need DifferentialEquations
    • DifferentialEquations might be integrated as a sub-solver for particle-mesh solvers in the future
  4. The code architecture evolved from Gadget-2. (We have many functions designed for Gadget-2 users)

That doesn’t make much sense though. There’s nothing specific to time steppers that would be customized for parallelization outside of the array interface. It seems like there’s an abstraction break here: time steppers, parallel/unitful arrays, and fast multipole method summations are put together, when in practice the parallel efficiency of the three does not require an interwoven implementation in order to achieve maximal efficiency. Pulling the three apart would ultimately be more efficient due to how it can interact with other parts of the ecosystem.


This looks great! I love the diverse range of supporting packages, but it seems like your docs are light on the scientific details (presumably you’re putting those in a paper).

  1. Do you have tests of, e.g., cluster evolution against standard codes like Gadget?
  2. It’s not clear to me what your integrator is. What’s the scheme for time stepping?
  3. What’s the tree scheme? BH?

From taking a quick look, it seems like you are using Julia’s built-in distributed computing facilities. I am curious

  1. why you chose it over MPI,
  2. how well it works for you, and
  3. whether you plan to compare to MPI at some point.

While I agree, improving generic array abstractions seems like a very different and potentially much harder development effort to me.

Also, what are you suggesting specifically? Should one use DArrays in combination with DiffEq for distributed ODE solving? If I just read the DiffEq docs, I would think that I shouldn’t go down this route.

GPUArrays.jl (CuArrays.jl), ArrayFire.jl, and DistributedArrays.jl have been tested and work in various forms, though the last one is still not recommended for common use yet.

So, I guess I should develop my own custom array type then? Is this mentioned / explained somewhere in the docs? I think an example showcasing something like this would be great. (Maybe I’m just not aware of it?)


The project started on Julia 0.7, in 2018. At that time, DifferentialEquations was not our preferred choice. However, I would like to add DiffEq as one of the solvers in future development.


Yes, I’m preparing a paper, which will be submitted in one or two weeks. All figures are generated by code in examples/

  1. In test/, we have auto-tests validating the algorithms. For example, the evolution of energy and momentum is compared against Gadget-2. (Gadget-2 needs a Unix-like platform, so the comparison is not run by default.)
  2. We have an explicit Euler integrator and a KDK Leapfrog integrator, both supporting fixed and adaptive time-steps (sufficient for collisionless N-body problems).
  3. Basically the same as Gadget-2: a Peano-Hilbert octree.
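For context, the KDK (kick-drift-kick) leapfrog scheme mentioned in point 2 can be sketched in a few lines. This is a generic illustration on a 1D harmonic oscillator, not AstroNbodySim.jl’s actual implementation:

```python
# Kick-Drift-Kick (KDK) leapfrog, sketched on a 1D harmonic oscillator
# (a(x) = -x) so that energy conservation is easy to check.

def accel(x):
    return -x  # stand-in for a gravitational acceleration

def kdk_step(x, v, dt):
    v += 0.5 * dt * accel(x)   # kick: half-step velocity update
    x += dt * v                # drift: full-step position update
    v += 0.5 * dt * accel(x)   # kick: half-step with the new acceleration
    return x, v

# integrate roughly one oscillation period
x, v = 1.0, 0.0
dt = 0.01
for _ in range(628):
    x, v = kdk_step(x, v, dt)

energy = 0.5 * v**2 + 0.5 * x**2  # exact value is 0.5
print(f"relative energy drift: {abs(energy - 0.5) / 0.5:.1e}")
```

Being symplectic, the KDK scheme keeps the energy error bounded over long integrations, which is why it is the standard choice for collisionless N-body work.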
  1. Easier to understand and debug
  2. ParallelOperations is type-stable and fast. Data communication is realized at minimal cost. We have benchmarks in Benchmark/ (tens of seconds for one step of 10^6 particles). Optimization work on distributed memory is in progress. Multi-threading is still a problem, because LoopVectorization does not support user-defined structs.
  3. Generally speaking, the direct summation method (CPU) and the tree method (CPU only) in AstroNbodySim are slower than Gadget-2. One reason is that we rebuild the tree at each step. There is a lot of optimization work to do. Fortunately, the simple GPU implementation of direct summation is comparably fast (when using Float32).
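To make point 3 concrete, here is what the O(N²) direct-summation baseline (with Plummer softening) looks like. The function name and G = 1 units are illustrative, not the package’s API:

```python
# O(N^2) direct summation with Plummer softening (G = 1).
# Every pair is visited explicitly; tree and GPU solvers accelerate this.

def direct_accels(pos, mass, eps=0.0):
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps  # softened distance^2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc

# two unit masses at unit separation: |a| = 1 on each, pointing inward
acc = direct_accels([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]], [1.0, 1.0])
print(acc[0][0], acc[1][0])  # 1.0 -1.0
```

The inner pairwise loop is exactly what maps well onto a GPU: each particle’s accumulation is independent, which is consistent with the observation that the simple GPU direct-summation version is competitive.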

Our implementation of distributed simulation data is quite similar to DistributedArrays. Particle data are handled by StructArrays for efficient memory access.
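The point about StructArrays can be illustrated with a minimal structure-of-arrays layout; field names here are hypothetical, and the real package stores full 3D particle data:

```python
# Structure-of-arrays (SoA) layout, the idea behind StructArrays.jl:
# each field lives in its own contiguous array, instead of one object
# per particle (array-of-structs).

class ParticleSoA:
    def __init__(self, n):
        self.x = [0.0] * n   # positions: one contiguous array per field
        self.v = [0.0] * n   # velocities
        self.m = [1.0] * n   # masses

    def __len__(self):
        return len(self.x)

    def kick(self, acc, dt):
        # tight loop over a single field: cache-friendly and easy to
        # vectorize, unlike striding over interleaved particle structs
        for i in range(len(self)):
            self.v[i] += dt * acc[i]

p = ParticleSoA(4)
p.kick([1.0, 2.0, 3.0, 4.0], 0.5)
print(p.v)  # [0.5, 1.0, 1.5, 2.0]
```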

I will mention this in docs.

Maybe I should just wait for the paper, but I’m curious if uncertainty propagation actually works? Maybe it’s just for small systems. Pure speculation: at kpc scales and long timescales, I would expect a particle to be destined to enter a particular halo, but once it enters the halo the N-body calculation becomes Monte Carlo. So the uncertainty on position should just be the radius of the halo?


Uncertainty propagation is supported by Measurements.jl, which tracks all uncertainties during computation. I don’t fully understand your question, but if you have all uncertainty measurements in your initial conditions, you can get a reliable uncertainty prediction (assuming the time integration is accurate), and differentiate the result over the initial conditions.
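A minimal sketch of what linear error propagation does under the hood, in the spirit of Measurements.jl (which additionally tracks correlations between quantities; this toy version assumes independent errors):

```python
# Toy linear error propagation: carry (value, sigma) pairs and combine
# sigmas via first derivatives. Independent errors only.

import math

class Measurement:
    def __init__(self, val, err):
        self.val, self.err = val, err

    def __add__(self, other):
        # sigma_f for f = x + y: sqrt(sx^2 + sy^2)
        return Measurement(self.val + other.val,
                           math.hypot(self.err, other.err))

    def __mul__(self, other):
        # sigma_f for f = x * y: sqrt((y*sx)^2 + (x*sy)^2)
        return Measurement(self.val * other.val,
                           math.hypot(other.val * self.err,
                                      self.val * other.err))

    def __repr__(self):
        return f"{self.val} ± {self.err}"

x = Measurement(2.0, 0.1)
y = Measurement(3.0, 0.2)
print(x * y)  # 6.0 ± 0.5, since sqrt((3*0.1)^2 + (2*0.2)^2) = 0.5
```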

See examples/01-binary.jl

It looks amazing. Where can I find information about your plans for the future?

See this comment and some associated discussion: Notebook on particle simulations - #3 by ChrisRackauckas

A curiosity: in a typical simulation with periodic boundary conditions, how big (system size) and how many particles (10^6?) are we talking about?

Can these calculations be performed with a cutoff (large enough that the gravitational potential is small, yet much smaller than the system size)? How many time-steps per second is a good benchmark? (I’m curious how trees would compare with cell lists with a sufficiently large cutoff, if that is possible.)

Note that Measurements.jl follows linear error propagation theory, which rests on some assumptions: errors are normally distributed, the first derivative is enough to locally describe the variation of the function (simple functions like sin and cos already fail this condition locally), etc. If some of these assumptions don’t hold, you get basically garbage numbers out, which I think is what @xzackli was suggesting above.
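To illustrate the failure mode: for f(x) = sin(x) with a large input uncertainty, the first-order rule σ_f = |cos(x)|·σ_x can exceed the largest spread the function can possibly have. A quick Monte Carlo check makes this obvious:

```python
# Linear propagation vs. Monte Carlo for sin(x) with sigma comparable
# to the period: the first-order estimate ignores curvature entirely.

import math, random

random.seed(0)
x0, sigma = 0.0, 2.0

# first-order (linear) propagation: sigma_f = |cos(x0)| * sigma
linear_sigma = abs(math.cos(x0)) * sigma      # = 2.0

# Monte Carlo truth: sin is bounded in [-1, 1], so its std cannot exceed 1
samples = [math.sin(random.gauss(x0, sigma)) for _ in range(100_000)]
mean = sum(samples) / len(samples)
mc_sigma = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5

print(linear_sigma)    # 2.0 -- impossible for a function bounded by 1
print(mc_sigma < 1.0)  # True
```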


Thanks! We will publish a paper describing the code and future works soon. The plans will be also updated on GitHub afterwards.


I wish you good luck. I will try to watch this project regularly, to stay on board with developments.


Where can I find information about your plans for the future?

Thanks! Check them here:


A curiosity: in a typical simulation with periodic boundary conditions, how big (system size) and how many particles (10^6?) are we talking about?

Dynamic simulation under periodic boundary conditions is not supported for now. It will be supported in cosmological simulations. For galactic dynamics, we only support vacuum boundary conditions.

Can these calculations be performed with a cutoff (large enough that the gravitational potential is small, yet much smaller than the system size)?

Yes, the gravitational potential is softened; in other words, we solve the collisionless Boltzmann equation. See Sec. 2.9 of Galactic Dynamics for details.

How many time-steps per second is a good benchmark?

See the sections about time integration in the Gadget-2 paper:

  • In the case of adaptive time-steps, \Delta t is determined by the accelerations in the last time-step.
  • For fixed time-steps, it is reasonable to choose the smallest time-step from an adaptive-time-step run.
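The adaptive case can be sketched as Gadget-2’s published criterion, \Delta t = \sqrt{2\eta\epsilon/|a|}, clamped to a maximum step. Parameter values and the function name below are illustrative, not AstroNbodySim.jl’s exact code:

```python
# Gadget-2-style adaptive time-step: dt = sqrt(2*eta*eps/|a|), where eta
# is an accuracy parameter and eps the softening length, clamped to a
# maximum allowed step.

import math

def adaptive_dt(a_mag, eta=0.025, eps=0.01, dt_max=0.1):
    if a_mag == 0.0:
        return dt_max  # no acceleration: take the largest allowed step
    return min(dt_max, math.sqrt(2.0 * eta * eps / a_mag))

# a larger acceleration yields a smaller step
print(adaptive_dt(1.0) < adaptive_dt(0.1))  # True
```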