Performance of for loop

Hi,
I want to run a for loop for matrix M with size (200,200,5). I have been told that for the sake of performance, it is better to loop from the outer to the inner (that is, dimension 3 → dim 2 → 1). But here the size of dimension 3 is much smaller than the other two. Is it still better to run from outer to the innermost? Is there any general rule that helps speed up for loops in Julia? Thanks.

1 Like

The lengths of the high level dimensions are not important, the pattern of elements in virtual memory is. That matters because CPUs typically read nearby or patterned data in hopes it will be loaded soon and more quickly than from main memory. From the Avoid Cache Misses section of this blog viralinstruction.com/posts/hardware/:

  • By spacial locality, I mean that you should access data from memory addresses close to each other. Your CPU does not copy just the requested bytes to cache. Instead, your CPU will always copy data in larger chunks called cache lines (usually 512 consecutive bits, depending on the CPU model).

The nonideal order of dimensions will still have a pattern and can be optimized by prefetching (also covered in that blog), but you still suffer from the aforementioned cache misses.

Note that for some algorithms, helper methods like eachindex are designed to iterate an n-dimensional array in the typically best order.

1 Like

Alternatively, you can think about re-ordering your dimensions to optimize for the access pattern you want. e.g. if you often access elements of dimension 3 together, maybe you should re-order it to be dimension 1.

But it’s hard to give specific advice without knowing more about the structure/purpose of your code.

The number one rule is to not use global variables — put your loops into functions, and pass data via parameters. Also, read the Julia performance tips.

Note that Julia arrays are column-major. This means you should iterate opposite of what you wrote, i.e.
I apparently was very confused when I wrote this. Edited for correctness (thanks @stevengj)

arr = rand(200,200,5)
for i in axes(arr, 3) # 1:5 is discouraged
    for j in axes(arr, 2)
        for k in axes(arr, 1)
            arr[k,j,i] += 3 # some computation
        end
    end
end

Alternatively you can also look into eachindex and CartesianIndices

No, you have it backwards. The optimal explicit loop order should be:

for k in axes(arr, 3), j in axes(arr, 2), i in axes(arr, 1)
   # ...
end

for column-major order, as @jgr wrote. (The first index is contiguous in memory.)

(This is the only real downside of column-major, in my mind — the loop ordering is unnatural for humans.)

1 Like

You are right of course and that’s also how I did it in my codes all the time. I don’t know how I was that confused earlier… :open_mouth:
*edits post ashamed and head shaking

1 Like

Perhaps for cultures that use columnar writing it is less unnatural.

1 Like