Nbabel nbody integrator speed up

I have made that splitting procedure long time ago, so it took me some time to remember how it works. It is indeed slightly cryptic :slight_smile:

Idea is simple though. We have following nested loops

for i in 1:N
  for j in i+1:N
    # Some operation
  end
end

and we are assuming, that operation inside this loop take the same time, irrelevant of index. Since each task has some time to get scheduled we want to have bare minimum of the tasks spawned. It leads us to the task of splitting external i loop in k chunks (where k is the number of threads), such that amount of operations in each chunk is equal.

Now, it can be solved exactly, but I am making another assumption, that N >> 1 and we can throw away discrete nature of i and change it to continuous variable x. Then number of internal operations for each x is just N - x (one can easily see that, taking into account linearity of the problem and the fact that when x = 0 number of elements in internal loop is N and when x = N number of elements is 0.

In this approach, number of operations is \int_{0}^{N} (N - x) dx = \frac{N^2}{2} and each chunk should contain \frac{N^2}{2k} elements. For chunk number m, number of elements in all chunks from 1 to m is given by

\int_0^{t_m} (N - x) dx = \frac{N^2}{2k}m

where t_m is the point where we should split another chunk.

After integration and substitution we obtain

N t_m - \frac{t_m^2}{2} = \frac{N^2}{2k}m

Solving this equation we get

t_m = N (1 \pm \sqrt{1 - \frac{m}{k}})

and in order to satisfy boundary conditions t_0 = 0, t_k = N we choose minus sign.

As a last and to be honest unnecessary step I change the definition of m to m \rightarrow k - m and the answer is

t_m = N (1 - \sqrt{\frac{m}{k}}), \forall{m = k..0}

Chunks are just intervals (t_m, t_{m - 1}]

Of course it can be done more accurate and this assumptions are more imbalanced for small numbers of N, but multithreading for small N is meaningless anyway and deviations related to scheduling and other things will nullify more exact solutions (or at least that’s what I believe in, could be wrong).

One small note, the acceleration loop is going to N instead of N-1 (line 107):

I do not quite get it, since it is written as a loop from 1 to N, I do not see any N - 1.

7 Likes