Access arrays in memory order, along columns

I’m trying to wrap my head around how I should best iterate over arrays. I’m coming to Julia after primarily using MATLAB, and since both MATLAB and Julia are column major, arrays should be accessed in the same order. My understanding from MATLAB is that the loops below are ordered for optimal speed.

a = 1:25*50*75;
a = reshape(a, 25, 50, 75);
for k = 1:size(a, 3)
    for j = 1:size(a, 2)
        for i = 1:size(a, 1)
            a(i, j, k) = i + j + k;
        end
    end
end

Should the loops be ordered the same way in Julia?

a = collect(1:25*50*75)
a = reshape(a, (25, 50, 75))
for k = 1:size(a, 3)
    for j = 1:size(a, 2)
        for i = 1:size(a, 1)
            a[i, j, k] = i + j + k
        end
    end
end

Yes.

ps: you can use for i in axes(a, 1), for example, so that the loop is independent of how the array is indexed (for example, not starting at 1 - not common, but possible).
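
For example, the loop above could be written like this (same computation, just with axes instead of 1:size):

for k in axes(a, 3), j in axes(a, 2), i in axes(a, 1)
    # axes(a, d) gives the valid index range along dimension d,
    # so this also works for arrays whose indices don't start at 1
    a[i, j, k] = i + j + k
end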

4 Likes

You can also use eachindex or CartesianIndices or pairs for this sort of thing, depending on the calculations you want to perform. They will choose the right order for you. You shouldn’t have to think about this too often.

In most cases, do

for i in eachindex(a)
    a[i] = ...
end

If you need the actual indices, like in your example,

for ind in CartesianIndices(a)
    a[ind] = sum(Tuple(ind))
end

These work for arrays of arbitrary dimensions.
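
Since pairs was mentioned but not shown, here is a minimal sketch (assuming you want the stored value as well as the index):

# pairs iterates index => value; for a multidimensional Array the index
# is a CartesianIndex, so Tuple(ind) recovers (i, j, k)
for (ind, val) in pairs(a)
    a[ind] = val + sum(Tuple(ind))
end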

There are several indexing functions that make working with arrays much nicer than in Matlab.

6 Likes

Besides what others have suggested, in Julia you can write this in a single line, without worrying about column or row order, and get the best performance. This is possible using packages like Tullio:

using Tullio
a = collect(1:25*50*75)
a = reshape(a, 25, 50, 75)
@tullio a[i, j, k] = i + j + k
2 Likes

This Tullio version takes only 26.500 μs on my machine, while the multi-threaded version

Threads.@threads for k in axes(a, 3)
    for j in axes(a, 2)
        for i in axes(a, 1)
            a[i, j, k] = i + j + k
        end
    end
end

takes 2.954 ms. How can Tullio achieve that speed?

Tullio doesn’t do black magic; it should be equivalent to properly ordered loops. Be sure you started Julia with the -t auto option. Multithreading should not do much here because this problem is essentially memory-bound rather than CPU-bound. I observe only a 2.5x speedup on a 4-core/8-thread machine.
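
As a quick check that threads are actually available (not Tullio-specific):

# Start Julia with `julia -t auto` (or set the JULIA_NUM_THREADS
# environment variable), then check how many threads the session has:
Threads.nthreads()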

using Tullio
using BenchmarkTools   # provides @btime

function do_loops!(a) 
    Threads.@threads for k in axes(a, 3)
        for j in axes(a, 2)
            for i in axes(a, 1)
                a[i, j, k] = i + j + k
            end
        end
    end
end

a = collect(1:25*50*75)
a = reshape(a, (25, 50, 75))
@btime do_loops!($a) 
  # 24.900 μs (0 allocations: 0 bytes)   - Serial
  # 10.600 μs (48 allocations: 5.08 KiB) - Parallel
@btime @tullio $a[i, j, k] = i + j + k
  # 25.800 μs (0 allocations: 0 bytes)
2 Likes

I see. I forgot to put the for loops in a function.

1 Like

@tturbo from LoopVectorization sometimes does magic in situations like these, because it has an allocation-free and fast threading mechanism (which can be found in Polyester).
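
A minimal sketch of how @tturbo could be applied to the same loop (assuming LoopVectorization is installed):

using LoopVectorization   # provides @tturbo, the threaded variant of @turbo

function do_loops_turbo!(a)
    @tturbo for k in axes(a, 3), j in axes(a, 2), i in axes(a, 1)
        a[i, j, k] = i + j + k
    end
    return a
end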

@inbounds may also help, although the use of axes guarantees that already, and thus it may not make any difference here.

2 Likes

I would also suggest that since you aren’t using the original values stored in a, you change the above Matlabish code to a more Julian form:

a = zeros(Int, 25, 50, 75)

or

a = Array{Int, 3}(undef, 25, 50, 75)

The first form is more convenient to write and initializes a to all zeros. The second form may be a bit faster and leaves the contents of a unspecified.

1 Like

Multithreading should not do much here

The strange thing is that when I run your code on my computer, I don’t get any improvement from the parallel version with Threads.@threads - in fact, it is sometimes worse - but I do get a significant improvement using @distributed from Distributed. On my 8-core, 16-thread Windows laptop, Tullio is slowest, with times ordered @distributed < @tturbo < @tullio, each step differing by about a factor of 10.
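
The @distributed code isn’t shown above; one way such a loop might look (an assumption on my part, using SharedArrays so that the worker processes can write into the same array) is:

using Distributed
addprocs(4)                        # number of workers is just an assumption
@everywhere using SharedArrays

a = SharedArray{Int}((25, 50, 75)) # backed by shared memory across workers
@sync @distributed for k in 1:size(a, 3)
    for j in axes(a, 2), i in axes(a, 1)
        a[i, j, k] = i + j + k
    end
end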

1 Like

If you need i+1, j-1, etc. on the right-hand side, is this the suggested method?

for ind in CartesianIndices(a)
    (i, j, k) = Tuple(ind)
    a[ind] = (a[i+1, j, k] + a[i, j-1, k])/2
end

I don’t see pairs discussed in the array indexing documentation or blog post. I interpret the pairs docstring to mean that it should be used when you need indices and values at the same time, and that pairs is always preferred over enumerate. Is that correct?

This is going to fail, since indices like i+1 and j-1 will necessarily go out of bounds at the edges. You’ll have to figure out some way to peel off the edge values.
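
One way to keep the stencil in bounds is to iterate only over the interior, a sketch of which is below (it shrinks every dimension by one, which is more than strictly needed for i+1 and j-1 alone):

a = rand(25, 50, 75)                   # Float64 here, so the /2 below stays exact
inds = CartesianIndices(a)
Ifirst, Ilast = first(inds), last(inds)
Δ = oneunit(Ifirst)                    # CartesianIndex(1, 1, 1)
for ind in (Ifirst + Δ):(Ilast - Δ)    # skip the outermost layer of indices
    i, j, k = Tuple(ind)
    a[ind] = (a[i + 1, j, k] + a[i, j - 1, k]) / 2
end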

pairs will give you the value and the index/indices, while enumerate will just count from 1, so it depends on what you need. If you are looking for indices, use pairs.

2 Likes

eachindex and CartesianIndices are great, but often I have some computations that are specific to only one or two dimensions. So as not to repeat those computations on every iteration of the remaining dimensions, I need to write the nested loops explicitly myself.
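
For instance (a sketch, with j + k standing in for a more expensive per-(j, k) computation):

for k in axes(a, 3), j in axes(a, 2)
    t = j + k                 # hoisted: computed once per (j, k) column
    for i in axes(a, 1)
        a[i, j, k] = i + t    # the inner loop only does the per-element work
    end
end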

I am new to Distributed, so could you give some example code for the problem in the OP? Thanks.