When you have operations with more than one array, you won’t necessarily be able to access elements of all of them in order. As an obvious example:
A = rand(64,64);
A .+ A'
What is the best order here?
The best order is probably something like 4x4 or 8x8 blocks of both arrays, which corresponds to the memory layout of neither array.
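To make the blocked order concrete, here is a minimal sketch in plain Julia. The function name blocked_add_transpose and the default block size of 8 are assumptions made up for illustration; this is not what LoopVectorization.jl generates, just one way to visit both A and A' tile by tile.

```julia
# Hypothetical helper: computes A .+ A' for a square matrix,
# visiting the arrays one bs-by-bs tile at a time.
function blocked_add_transpose(A::AbstractMatrix, bs::Integer = 8)
    n = size(A, 1)
    n == size(A, 2) || throw(DimensionMismatch("A must be square"))
    B = similar(A)
    for jb in 1:bs:n, ib in 1:bs:n
        # Inside one tile, the reads A[i, j] and A[j, i] each stay within a
        # small bs-by-bs region instead of striding across a whole array.
        for j in jb:min(jb + bs - 1, n), i in ib:min(ib + bs - 1, n)
            B[i, j] = A[i, j] + A[j, i]
        end
    end
    return B
end

A = rand(64, 64)
B = blocked_add_transpose(A)
```

The result matches A .+ A' exactly; only the visiting order differs. Whether 4 or 8 is the better tile size depends on the hardware and element type.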
For any given array, there is no one way it ought to be iterated. How we ought to iterate across it depends on what we’re doing with it, and what we’re doing with other arrays at the same time.
This is why ArrayInterface doesn’t currently tell people what to do with the arrays. It just describes their internal memory layout in a way that LoopVectorization.jl can use (and will in the next major release) to pick what it actually ought to do in that particular context.
The behavior of the particular code is also what will determine if the iteration order matters.
But if someone writes eachindex(A), presumably they don’t care much about the order; the docs promise to “[c]reate an iterable object for visiting each index of an AbstractArray A in an efficient manner.”
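As a small illustration using only Base, eachindex already picks a different kind of index depending on the array type. The variable names are arbitrary; the point is only that the iteration object is chosen per array, not fixed.

```julia
A = rand(4, 4)

# A dense Array supports fast linear indexing, so eachindex gives a linear range.
lin = eachindex(A)

# The lazy adjoint wrapper has Cartesian index style, so eachindex gives
# Cartesian indices instead.
cart = eachindex(A')

# With several arrays, eachindex picks indices suitable for all of them.
both = eachindex(A, A')
```

Note that none of these gives the blocked order discussed above; eachindex only chooses between linear and Cartesian indexing.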
As for specifying that an Array’s memory is stored in blocks, perhaps I should add that. Currently, ArrayInterface only has support for saying that it is stored in vectors:
julia> A = rand(4,4);
julia> ArrayInterface.contiguous_axis(A)
ArrayInterface.Contiguous{1}()
If this returns a dimension other than the one for which StrideRank is 1, then that means there are contiguous vectors embedded there, of length ArrayInterface.contiguous_batch_size(A).
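For ordinary strided arrays the two notions coincide, which can be checked with Base alone. The sketch below uses strides rather than the ArrayInterface functions, so it only shows the usual case where the axis with the smallest stride is also the contiguous one; the batched case described here has no Base equivalent that I know of.

```julia
A = rand(4, 4)
sA = strides(A)                    # (1, 4): axis 1 has the smallest stride

# A lazily permuted view of the same memory swaps the roles of the axes.
P = PermutedDimsArray(A, (2, 1))
sP = strides(P)                    # (4, 1): axis 2 of P is now contiguous

# The axis with unit stride is the one whose neighbors are adjacent in memory.
contig_A = findfirst(==(1), sA)
contig_P = findfirst(==(1), sP)
```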
For example, if we have, hypothetically, an array A such that
julia> ArrayInterface.stride_rank(A)
ArrayInterface.StrideRank{(1, 2)}()
julia> ArrayInterface.contiguous_axis(A)
ArrayInterface.Contiguous{2}()
julia> ArrayInterface.contiguous_batch_size(A)
ArrayInterface.ContiguousBatch{8}()
Then if A were a 2 x 16 matrix, the order of the indices in memory would be:
A[1,1]
A[1,2]
A[1,3]
A[1,4]
A[1,5]
A[1,6]
A[1,7]
A[1,8]
A[2,1]
A[2,2]
A[2,3]
A[2,4]
A[2,5]
A[2,6]
A[2,7]
A[2,8]
A[1,9]
A[1,10]
A[1,11]
A[1,12]
A[1,13]
A[1,14]
A[1,15]
A[1,16]
A[2,9]
A[2,10]
A[2,11]
A[2,12]
A[2,13]
A[2,14]
A[2,15]
A[2,16]
That is, we have sets of 8 elements from the second axis mixed in between.
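The listing above can be reproduced from a single offset formula. This is a sketch of the hypothetical layout only: batched_offset and its keyword arguments are names invented here, not part of ArrayInterface.

```julia
# Zero-based memory offset of element (i, j) in the hypothetical layout:
# the second axis is stored in contiguous batches of `batch` elements, and
# the first axis varies between consecutive batches.
function batched_offset(i, j; nrows = 2, batch = 8)
    within = rem(j - 1, batch)     # position inside the contiguous batch
    chunk  = div(j - 1, batch)     # which batch of columns we are in
    return within + batch * (i - 1) + batch * nrows * chunk
end

# Sort every index of a 2 x 16 matrix by its offset to recover memory order.
idx = vec([(i, j) for i in 1:2, j in 1:16])
order = sort(idx; by = t -> batched_offset(t...))
```

Here order starts with (1, 1) through (1, 8), then (2, 1) through (2, 8), then (1, 9), and so on, matching the listing.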
Perhaps this should be generalized to blocks.