Hi,
I want to run a for loop for matrix M with size (200,200,5). I have been told that for the sake of performance, it is better to loop from the outer to the inner (that is, dimension 3 → dim 2 → 1). But here the size of dimension 3 is much smaller than the other two. Is it still better to run from outer to the innermost? Is there any general rule that helps speed up for loops in Julia? Thanks.
The lengths of the high level dimensions are not important, the pattern of elements in virtual memory is. That matters because CPUs typically read nearby or patterned data in hopes it will be loaded soon and more quickly than from main memory. From the Avoid Cache Misses section of this blog viralinstruction.com/posts/hardware/:
- By spacial locality, I mean that you should access data from memory addresses close to each other. Your CPU does not copy just the requested bytes to cache. Instead, your CPU will always copy data in larger chunks called cache lines (usually 512 consecutive bits, depending on the CPU model).
The nonideal order of dimensions will still have a pattern and can be optimized by prefetching (also covered in that blog), but you still suffer from the aforementioned cache misses.
Note that for some algorithms, helper methods like eachindex
are designed to iterate an n-dimensional array in the typically best order.
Alternatively, you can think about re-ordering your dimensions to optimize for the access pattern you want. e.g. if you often access elements of dimension 3 together, maybe you should re-order it to be dimension 1.
But it’s hard to give specific advice without knowing more about the structure/purpose of your code.
The number one rule is to not use global variables — put your loops into functions, and pass data via parameters. Also, read the Julia performance tips.
Note that Julia arrays are column-major. This means you should iterate opposite of what you wrote, i.e.
I apparently was very confused when I wrote this. Edited for correctness (thanks @stevengj)
arr = rand(200,200,5)
for i in axes(arr, 3) # 1:5 is discouraged
for j in axes(arr, 2)
for k in axes(arr, 1)
arr[k,j,i] += 3 # some computation
end
end
end
Alternatively you can also look into eachindex
and CartesianIndices
No, you have it backwards. The optimal explicit loop order should be:
for k in axes(arr, 3), j in axes(arr, 2), i in axes(arr, 1)
# ...
end
for column-major order, as @jgr wrote. (The first index is contiguous in memory.)
(This is the only real downside of column-major, in my mind — the loop ordering is unnatural for humans.)
You are right of course and that’s also how I did it in my codes all the time. I don’t know how I was that confused earlier…
*edits post ashamed and head shaking
Perhaps for cultures that use columnar writing it is less unnatural.