That doesn’t make much sense though. There’s nothing specific to time steppers that would be customized for parallelization outside of the array interface. It seems like there’s an abstraction break here where time steppers, parallel/unitful arrays, and fast multipole method summations are put together where in practice the parallel efficiency of the three do not necessarily require interwoven implementation in order to achieve maximal efficiency. Pulling the three apart would ultimately be more efficient due ot how it can interact with other parts of the ecosystem.

This looks great! I love the diverse range of supporting packages, but it seems like your docs are light on the scientific details (presumably you’re putting those in a paper).

Do you have tests of i.e. cluster evolution with respect to standard codes like gadget?

It’s not clear to me what your integrator is. What’s the scheme for time stepping?

While I agree, improving generic array abstractions seems like a very different and potentially much harder development effort to me.

Also, what are suggesting specifically? Should one use DArrays in combination with DiffEq for distributed ODE solving? If I just read the DiffEq docs, I would think that I shouldn’t go down this route.

GPUArrays.jl (CuArrays.jl), ArrayFire.jl, DistributedArrays.jl have been tested and work in various forms, where the last one is still not recommended for common use yet.

So, I guess I should develop my own custom array type then? Is this mentioned / explained somewhere in the docs? I think an example showcasing something like this would be great. (Maybe I’m just not aware of it?)

The code project started from Julia 0.7, in 2018. At that time, DifferentialEquations was not our preferred choice. However, I would add DiffEq as one of the solvers in future development.

Yes, I’m preparing a paper, which will be submitted in one or two weeks. All figures are generated by codes in examples/

In test/, we have auto-tests validating the algorithms. For example, the evolution of energy and momentum are compared with Gadget-2. (Gadget-2 need unix-like platform, thus the comparison is not operated by default)

We have an explicit Euler integrator and an KDK Leapfrog integrator, both supporting fixed time-step and adaptive time-step (enough for collisionless n-body problems).

Basically same with Gadget-2: Hilbert-Peano octree.

ParallelOperations is type-stable and fast. Data communication is realized in minimum cost. We have benchmarks in Benchmark/ (tens of seconds for one step of 10^6 particles). Optimization works on distributed-memories are in progress. The multi-threading method is still a problem, because LoopVecterization do not support user-defined structs.

Generally speaking, direct summation method (CPU) and tree method (CPU only) in AstroNbodySim is slower than Gadget-2. One reason is that we rebuild the tree in each step. There are lots of optimization works to do. Fortunately, the simple GPU implementation of direct summation method is comparably fast (when using Float32).

Our implementation of distributed simulation data is quite similar to DistributedArrays. Particle data are handled by StructArrays for efficient memory accessing.

Maybe I should just wait for the paper, but I’m curious if uncertainty propagation actually works? Maybe it’s just for small systems. Pure speculation: at kpc scales and long timescales, I would expect a particle to be destined to enter a particular halo, but once it enters the halo the N-body calculation becomes Monte Carlo. So the uncertainty on position should just be the radius of the halo?

uncertainty propagation is supported by Measurements.jl, which tracks all uncertainties during computation. I don’t fully understand your question, but if you have all uncertainty measurements in your initial conditions, you can get a reliable uncertainty prrdiction (assuming time integration is accurate), and differentiate the result over initial conditions.

A curiosity: in a typical simulation with periodic boundary conditions, how big (system size) and how many particles (10^6?) are we talking about?

Can these calculations be performed with a cutoff (great enough such that the gravitational potential is small yet much smaller than the system size?). How many time-steps/second is a good benchmark? (I’m curious how would the trees compare with cell lists with a sufficiently large cutoff if that is possible).

Note that the Measurements.jl follows linear error propagation theory, which is based on some assumptions: errors are normally distributed, the first derivative is enough to locally describe the variation of the function (simple functions like sin and cos already fail this condition locally), etc… If some of these assumptions don’t hold, you get basically garbage numbers out, which I think is what @xzackli was suggesting above

A curiosity: in a typical simulation with periodic boundary conditions, how big (system size) and how many particles (10^6?) are we talking about?

Dynamic simulation under periodic boundary conditions is not supported by now. It will be supported in cosmological simulations. In the case of galactic dynamics, we only support vacuum boundary conditions.

Can these calculations be performed with a cutoff (great enough such that the gravitational potential is small yet much smaller than the system size?)

Yes, the gravitational potential is softened, or in other words, we solve collisionless Boltzmann equations. Check sec 2.9 in Galactic Dynamics for details.

How many time-steps/second is a good benchmark?

See the sections about time integration in Gadget-2 paper

In the case of adaptive time-steps, \Delta t is determined by the accelerations in the last time-step

For fixed time-steps, it is reasonable to choose the smallest time-step in adaptive time-steps.