Performance of for loop

jgr · December 29, 2024, 2:46am

Hi,
I want to run a for loop for matrix M with size (200,200,5). I have been told that for the sake of performance, it is better to loop from the outer to the inner (that is, dimension 3 → dim 2 → 1). But here the size of dimension 3 is much smaller than the other two. Is it still better to run from outer to the innermost? Is there any general rule that helps speed up for loops in Julia? Thanks.

Jeff_Emanuel · December 29, 2024, 2:49am

Benny · December 29, 2024, 6:55am

The lengths of the high level dimensions are not important, the pattern of elements in virtual memory is. That matters because CPUs typically read nearby or patterned data in hopes it will be loaded soon and more quickly than from main memory. From the Avoid Cache Misses section of this blog viralinstruction.com/posts/hardware/:

By spacial locality, I mean that you should access data from memory addresses close to each other. Your CPU does not copy just the requested bytes to cache. Instead, your CPU will always copy data in larger chunks called cache lines (usually 512 consecutive bits, depending on the CPU model).

The nonideal order of dimensions will still have a pattern and can be optimized by prefetching (also covered in that blog), but you still suffer from the aforementioned cache misses.

Note that for some algorithms, helper methods like eachindex are designed to iterate an n-dimensional array in the typically best order.

stevengj · December 29, 2024, 1:43pm

Alternatively, you can think about re-ordering your dimensions to optimize for the access pattern you want. e.g. if you often access elements of dimension 3 together, maybe you should re-order it to be dimension 1.

But it’s hard to give specific advice without knowing more about the structure/purpose of your code.

The number one rule is to not use global variables — put your loops into functions, and pass data via parameters. Also, read the Julia performance tips.

abraemer · December 29, 2024, 3:22pm

~~Note that Julia arrays are column-major. This means you should iterate opposite of what you wrote, i.e.~~
I apparently was very confused when I wrote this. Edited for correctness (thanks @stevengj)

arr = rand(200,200,5)
for i in axes(arr, 3) # 1:5 is discouraged
    for j in axes(arr, 2)
        for k in axes(arr, 1)
            arr[k,j,i] += 3 # some computation
        end
    end
end

Alternatively you can also look into eachindex and CartesianIndices

stevengj · December 29, 2024, 5:34pm

No, you have it backwards. The optimal explicit loop order should be:

for k in axes(arr, 3), j in axes(arr, 2), i in axes(arr, 1)
   # ...
end

for column-major order, as @jgr wrote. (The first index is contiguous in memory.)

(This is the only real downside of column-major, in my mind — the loop ordering is unnatural for humans.)

abraemer · December 29, 2024, 6:09pm

You are right of course and that’s also how I did it in my codes all the time. I don’t know how I was that confused earlier…
*edits post ashamed and head shaking

rocco_sprmnt21 · January 1, 2025, 2:06pm

Perhaps for cultures that use columnar writing it is less unnatural.

Topic		Replies	Views
Access arrays in memory order, along columns New to Julia array , memory-allocation	13	1484	March 20, 2022
The columns and the rows - puzzled about performance Performance question	8	910	October 14, 2021
Why column major? General Usage question , array , linearalgebra , column-major	59	19313	February 17, 2024
Row and column major order for arrays of different shape Performance column-major , row-major	23	2714	April 12, 2022
Something faster than for loops General Usage	27	6360	May 8, 2019

Performance of for loop

Related topics