How to best iterate over a k-dimensional collection, e.g. an (n,k) matrix?

I have a dataset with n rows and k columns I want to iterate over. What is the recommended way?

Currently I would first create an n × k Matrix and then use two nested for-loops. I heard that iterating column-wise (column in the outer loop, row in the inner loop) is best:

for col in 1:k
   for row in 1:n
      ...
   end
end

I also heard that functions like eachindex() and objects like CartesianIndex exist, and I’d love to understand how to use them better. eachindex() loses the row and column information, and I was not able to use CartesianIndex properly.

(1) Which collection should I choose? Is Matrix okay?
(2) Are there simpler ways to iterate over a collection while keeping (a) the row and column indices, (b) the matrix index, (c) both?
(3) What are Cartesian indices, and how/when should I use them?

These are the most idiomatic ways, I think:

julia> a = [ 1 2 3; 4 5 6 ]
2×3 Matrix{Int64}:
 1  2  3
 4  5  6

julia> for i in axes(a,2), j in axes(a,1)
           @show j, i, a[j,i]
       end
(j, i, a[j, i]) = (1, 1, 1)
(j, i, a[j, i]) = (2, 1, 4)
(j, i, a[j, i]) = (1, 2, 2)
(j, i, a[j, i]) = (2, 2, 5)
(j, i, a[j, i]) = (1, 3, 3)
(j, i, a[j, i]) = (2, 3, 6)

julia> for c in CartesianIndices(a)
           @show c[1], c[2], a[c]
       end
(c[1], c[2], a[c]) = (1, 1, 1)
(c[1], c[2], a[c]) = (2, 1, 4)
(c[1], c[2], a[c]) = (1, 2, 2)
(c[1], c[2], a[c]) = (2, 2, 5)
(c[1], c[2], a[c]) = (1, 3, 3)
(c[1], c[2], a[c]) = (2, 3, 6)
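
To get both the (row, col) pair and the single matrix index at once, which is option (c) in the question, one possibility is to combine CartesianIndices with LinearIndices. The sketch below is an illustration only: the names `lin` and `visited` are made up for this example. A CartesianIndex bundles one integer per dimension, and `Tuple(c)` unpacks it into plain integers.

```julia
a = [1 2 3; 4 5 6]

# LinearIndices(a) maps a Cartesian index to the corresponding linear index.
lin = LinearIndices(a)

# Collect (linear index, row, col, value) for every element, in memory order.
visited = Tuple{Int,Int,Int,Int}[]
for c in CartesianIndices(a)
    row, col = Tuple(c)                      # unpack the CartesianIndex
    push!(visited, (lin[c], row, col, a[c]))
end
```

Going the other way, `CartesianIndices(a)[i]` converts a linear index `i` back into a CartesianIndex, so either kind of index can be recovered from the other.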

There is nothing wrong with your double loop; just be sure that the indices are in bounds (something these alternatives guarantee).
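
As a minimal sketch of the double loop with in-bounds ranges, the function below takes its ranges from axes() rather than from hard-coded 1:n and 1:k, so they always match the array that is passed in. The function name `sum_columnwise` is hypothetical and only serves the example.

```julia
# Sum all elements with an explicit double loop.
# In `for col in ..., row in ...` the last range is the innermost loop,
# so the rows of one column are visited before moving to the next column.
function sum_columnwise(a::AbstractMatrix)
    total = zero(eltype(a))
    for col in axes(a, 2), row in axes(a, 1)
        total += a[row, col]
    end
    return total
end

sum_columnwise([1 2 3; 4 5 6])
```

Because the ranges come from the array itself, the same function should also work for matrices whose indices do not start at 1.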

PS: A Matrix is definitely OK. Alternatively, you may want to look at the DataFrames package and its companions if you want something more sophisticated in terms of data manipulation.
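
If the loop body only needs whole columns of a Matrix rather than individual indices, Base also provides eachcol (and eachrow). This goes slightly beyond what was asked, so take it as an optional sketch; the name `col_sums` is made up for the example.

```julia
a = [1 2 3; 4 5 6]

# eachcol(a) yields one view per column; enumerate adds a counter starting at 1.
col_sums = Int[]
for (j, col) in enumerate(eachcol(a))
    push!(col_sums, sum(col))   # `j` is the position of the column, `col` its contents
end
```

Note that the `j` from enumerate is a counter, not an array index, so it matches the column index only for ordinary 1-based arrays.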

Other possible options can be found here (with bonus benchmarks, although those are probably dated):

See also: OrdinalIndexing.jl