I’m trying to wrap my head around how I best should iterate over arrays. I’m coming to Julia after primarily using MATLAB and as both MATLAB and Julia are column major arrays should be accessed in the exact same order. My understanding from MATLAB is that I below have ordered the loops for optimal speed.

a = 1:25*50*75;
a = reshape(a, 25, 50, 75);
for k = 1:size(a, 3)
for j = 1:size(a, 2)
for i = 1:size(a, 1)
a(i, j, k) = i + j + k;
end
end
end

Should the loops be ordered the same way in Julia?

a = collect(1:25*50*75)
a = reshape(a, (25, 50, 75))
for k = 1:size(a, 3)
for j = 1:size(a, 2)
for i = 1:size(a, 1)
a[i, j, k] = i + j + k
end
end
end

ps: you can use for i in axes(a,1) for example, such that the loop is independent on how the array is indexed (for example, not starting at 1 - not common, but possible).

You can also use eachindex or CartesianIndices or pairs for this sort of thing, depending on the calculations you want to perform. They will choose the right order for you. You shouldn’t have to think about this too often.

In most cases, do

for i in eachindex(a)
a[i] = ...
end

If you need the actual indices, like in your example,

for ind in CartesianIndices(a)
a[ind] = sum(Tuple(ind))
end

These work for arrays of arbitrary dimensions.

There are several indexing functions that make working with arrays much nicer than in Matlab.

Besides what others have suggested, in Julia you can write this in a single line without worrying about column or row order and get the best perormance. This is possible using packages like Tullio:

using Tullio
a = collect(1:25*50*75)
a = reshape(a, 25, 50, 75)
@tullio a[i, j, k] = i + j + k

Tullio doesn’t do black magic, it should be equivalent to properly ordered loops, be sure you added -t auto as compiler option. Multihtreading should not do much here because this problem is essentially memory-bound rather than cpu-bound. I observe only 2.5x speedup on a 4-core/8-threads machine.

using Tullio
function do_loops!(a)
Threads.@threads for k in axes(a, 3)
for j in axes(a, 2)
for i in axes(a, 1)
a[i, j, k] = i + j + k
end
end
end
end
a = collect(1:25*50*75)
a = reshape(a, (25, 50, 75))
@btime do_loops!($a)
# 24.900 μs (0 allocations: 0 bytes) - Serial
# 10.600 μs (48 allocations: 5.08 KiB) - Parallel
@btime @tullio $a[i, j, k] = i + j + k
# 25.800 μs (0 allocations: 0 bytes)

@tturbo from LoopVectorization sometimes does magic in situations like these, because it has a allocation-free and fast threading mechanism (which can be found in Polyester).

@inbounds may also help, although the use axes guarantees that already, and thus it may not make any difference here.

I would also suggest that since you aren’t using the original values stored in a, you change the above Matlabish code to a more Julian form:

a = zeros(Int, 25, 50, 75)

or

a = Array{Int, 3}(undef, 25, 50, 75)

The first form is more convenient to write and initializes a to all zeros. The second form may be a bit faster and leaves the contents of a unspecified.

The strange thing is that when I run your code on my computer, I don’t get any improvement from the parallel version with Threads.@threads - in fact, sometimes worse - but I do get a significant improvement using Distributed@distributed. On my 8 core, 16 threads Windows laptop, Tullio is slowest, with time for @distributed < @tturbo < @tullio, each with about a factor 10 difference.

If you need i+1, j-1, etc. on the right-hand side, is this the suggested method?

for ind in CartesianIndices(a)
(i, j, k) = Tuple(ind)
a[ind] = (a[i+1, j, k] + a[i, j-1, k])/2
end

I don’t see pairs discussed in the array indexing documentation or blog post. I interpret the pairs docstring to mean it should be used if you need indices and values at the same time and that pairs is always preferred over enumerate. Is that correct?

This is going to fail, since i-1 and i+1 must necessarily go out of bounds at the edges. You’ll have to figure out some way to peel of the edge values.

pairs will give you the value and the index/indices. Enumerate will just count from one, so it depends on what you need. If you are looking for indices, use pairs.

eachindex and CartesianIndices are great, but often I have some computations that are unique only to one or two dimensions. And, so as not to repeat the computations on each iteration of the remaining dimensions, I need to explicitly loop over the array myself.