Nbabel nbody integrator speed up

Elrod · December 12, 2020, 7:10pm

You could. HybridArrays may give you the best of both worlds if you make the arrays N x 3. A slice A[n,:] should still return an SVector. If everything inlines, the compiler should still be able to SIMD across loop iterations, while you keep the convenience of expressing some operations on the vectors instead of making everything loops.

Another factor to consider is unrolling. In the example with N x 4 vs 4 x N, the N x 4 case will SIMD and be 4x unrolled (one per column). The 4 x N will only do a single operation per loop iteration.
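To make the layout difference concrete, here is a minimal sketch using plain `Matrix` storage and a simple position update. The function names `step_Nx4!` and `step_4xN!` are made up for illustration and are not from the Nbabel code. Whether the compiler actually vectorizes either loop depends on your Julia version and CPU, so check with `@code_llvm` or `@code_native`.

```julia
# N x 4 layout: each column holds one coordinate for all particles, so the
# loop over n walks contiguous memory in every column. The body contains four
# independent updates (one per column), which is the "4x unrolled" part.
function step_Nx4!(pos::Matrix, vel::Matrix, dt)
    @inbounds @simd for n in axes(pos, 1)
        pos[n, 1] += dt * vel[n, 1]
        pos[n, 2] += dt * vel[n, 2]
        pos[n, 3] += dt * vel[n, 3]
        pos[n, 4] += dt * vel[n, 4]
    end
    return pos
end

# 4 x N layout: each column holds one particle. The inner loop over the four
# coordinates can become a single SIMD operation, but then the loop over n
# does only that one operation per iteration.
function step_4xN!(pos::Matrix, vel::Matrix, dt)
    @inbounds for n in axes(pos, 2)
        @simd for k in axes(pos, 1)
            pos[k, n] += dt * vel[k, n]
        end
    end
    return pos
end
```

Both functions compute the same update; only the memory layout and the shape of the loop body differ.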

Theoretically, when you don’t have loop-carried dependencies (such as s += x[i], where each iteration depends on the previous one), your CPU should be able to execute different loop iterations in parallel via out-of-order execution + speculative execution, but in practice, I normally find some unrolling tends to help.
Maybe it’s because the out-of-order execution works better, or maybe it’s because of a better density of relevant instructions vs. things like incrementing and checking loop counters.
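As a rough illustration of what unrolling means here, this sketch writes the same dependency-free loop twice: once plainly and once unrolled 4x by hand. The names `axpy_plain!` and `axpy_unrolled4!` are hypothetical. The compiler will often unroll on its own, so treat this as a way to see the idea rather than something you necessarily need to write, and benchmark before assuming it helps.

```julia
# Plain loop: one update plus one counter increment and check per iteration.
function axpy_plain!(y::Vector, a::Number, x::Vector)
    @inbounds for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

# Hand-unrolled by 4: four independent updates per counter increment and
# check, followed by a cleanup loop for the remaining 0 to 3 elements.
function axpy_unrolled4!(y::Vector, a::Number, x::Vector)
    n = length(x)
    length(y) == n || throw(DimensionMismatch("x and y must have equal length"))
    i = 1
    @inbounds while i + 3 <= n
        y[i]     += a * x[i]
        y[i + 1] += a * x[i + 1]
        y[i + 2] += a * x[i + 2]
        y[i + 3] += a * x[i + 3]
        i += 4
    end
    @inbounds while i <= n   # remainder
        y[i] += a * x[i]
        i += 1
    end
    return y
end
```

No iteration reads a value written by another, so the four updates in the unrolled body are independent of each other.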
