Operating on many arrays of the same size: one loop, many loops, or broadcasting?

Dr.Merkwuedigliebe · March 27, 2020, 5:45pm

I am working on a physical-process simulation, set on a large spatial grid. Each cell of the grid contains a set variables representing the status of the model. Let’s say the variables are a, b, c, d, and e, stored as arrays A, B, C, D, and E (all with the same size). Now, I need to perform a bunch of element-by-element arithmetic with these variables. For example, a = b + d; c = a * d; e = 2 * b + 20 * e / a; etc.

In terms of performance, is it better to:
(i) use a single loop to perform all the arithmetic (involving all the different arrays) at each array index; OR
(ii) use multiple loops to run through the array index, with each loop containing one (or at least a subset) of the arithmetic operations; OR
(iii) use broadcasting (e.g. the dot syntax) all the way; OR
(iv) some other way(s) that I have not considered here?

Let’s assume that all loops are sequenced in the memory access-friendly column-major order. (Actually, this is the essence of my questions: Does going through in column-major order help in any way when the loop needs to access multiple arrays?)

I am new to Julia and not a computer scientist, and have yet to know/understand the details of how the compiler goes about things. Any ideas and suggestions would be very welcome. Thank you!

stevengj · March 27, 2020, 5:58pm

In general, writing a single loop will be the fastest and most flexible way to do things, and is probably the easiest to code as well if you are doing lots of operations like this. (In a function — don’t write nontrivial code in global scope!)

(The problem with trying to do array operations like A .= B .+ D followed by C .= A .* D is that each array operation will be a separate pass over the arrays, even though Julia fuses the loops within a single operation like A .= B .+ D. Doing separate loops, either manually or via broadcasting, is suboptimal both because of poor cache utilization and because you repeat the loop-index computations multiple times.)

Even for multiple arrays, you should try to loop through data sequentially in memory (i.e. column-major for Julia), because sequential memory access is better for your computer’s cache. (This has nothing to do with compilers.) For the same reason may get even better performance if you use an array of data structures that contain (a,b,c,d,e) rather than 5 separate arrays, since in general if you compute on (a,b,c,d,e) together you want to store them nearby in memory.

Dr.Merkwuedigliebe · March 27, 2020, 6:22pm

Thank you for your clear explanation! I have an improved understanding now!

Regarding your suggestion to use an array of data structures, I suppose that means utilising mutable struct?

Elrod · March 27, 2020, 6:22pm

Although five separate arrays (and similarly, Struct of Arrays layouts) are likely to be more SIMD-friendly. If the five separate arrays version can vectorize, it will probably be faster.

I wouldn’t worry about that yet if you’re just starting to learn about optimization.

EDIT:
I’d use immutable structs in the arrays.

Topic		Replies	Views
Should I use broadcasting or write the function directly for higher dimensional input? New to Julia question , broadcasting	2	360	December 19, 2022
Understanding major order performance when broadcasting in column vs row operations Performance question , array , benchmark	9	980	June 21, 2021
Performance of simple broadcasting operations with many arguments Performance performance , broadcast	15	1561	November 29, 2021
Access arrays in memory order, along columns New to Julia array , memory-allocation	13	1413	March 20, 2022
When should I write loops or vectorised calls? General Usage	17	1751	December 1, 2020

Operating on many arrays of the same size: one loop, many loops, or broadcasting?

Related topics