A simpler example, using 1-based indices over 2 rows and 3 columns. These three are “consistent” with each other:
julia> for c in 1:3
           for r in 1:2 # iterate rows directly in column-major
               println((r, c))
           end
end
(1, 1)
(2, 1)
(1, 2)
(2, 2)
(1, 3)
(2, 3)
julia> for c in 1:3, r in 1:2
println((r, c))
end
(1, 1)
(2, 1)
(1, 2)
(2, 2)
(1, 3)
(2, 3)
julia> [(r, c) for c in 1:3 for r in 1:2]
6-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 1)
(1, 2)
(2, 2)
(1, 3)
(2, 3)
but this one is not. Read row by row, the printed matrix looks like the same sequence, but c is now the fastest-varying index instead of r:
julia> [(r, c) for c in 1:3, r in 1:2]
3×2 Matrix{Tuple{Int64, Int64}}:
(1, 1) (2, 1)
(1, 2) (2, 2)
(1, 3) (2, 3)
The reason is actually pretty intuitive: the array’s dimensions 3x2 match the order of the iterables’ lengths.
If we want to emulate the first 3 examples, we have to reverse the order so the implied dimensions 2x3 also match:
julia> [(r, c) for r in 1:2, c in 1:3]
2×3 Matrix{Tuple{Int64, Int64}}:
(1, 1) (1, 2) (1, 3)
(2, 1) (2, 2) (2, 3)
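As a quick check of the claims above, here is a minimal sketch comparing the comprehensions in memory order. The variable names (`flat`, `mat`, `swapped`) are mine, purely for illustration; `vec` and `permutedims` are standard Base functions.

```julia
# Flat comprehension: the last `for` is the innermost loop, so r varies fastest.
flat = [(r, c) for c in 1:3 for r in 1:2]

# Multidimensional comprehension: the FIRST iterable varies fastest.
mat     = [(r, c) for r in 1:2, c in 1:3]   # 2×3
swapped = [(r, c) for c in 1:3, r in 1:2]   # 3×2

# vec walks the array in column-major (memory) order.
println(vec(mat) == flat)                    # same order as the for-loops
println(vec(swapped) == flat)                # different order: c varies fastest
println(vec(permutedims(swapped)) == flat)   # swapping the dimensions recovers it
```

`permutedims` is used instead of `transpose` because `transpose` is recursive and is not defined for tuple elements.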
Say the dimensions are ordered left to right as written, with rows to the left of columns for matrices. Column-major array dimensions expand from left to right (rows, then columns, then anything beyond) in the dimensions’ order, so the most efficient nested for-loop has to nest in the reverse order: outermost over the rightmost dimension, innermost over rows. That’s the root cause of the “inconsistency” where for c in 1:3, r in 1:2 behaves differently in for-loops and comprehensions, and any reasonable switch would just move that “inconsistency” somewhere else.
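To make the memory-order argument concrete, here is a small sketch (the names `A` and `visited` are just for illustration) showing that the loop with rows innermost visits a column-major array in exactly its storage order:

```julia
# reshape fills column by column, so A[r, c] == r + 2 * (c - 1).
A = reshape(collect(1:6), 2, 3)

visited = Int[]
for c in axes(A, 2), r in axes(A, 1)   # the last variable listed is the innermost loop
    push!(visited, A[r, c])
end

# Rows innermost means the elements come out in the order they are stored.
println(visited == vec(A))
```

Swapping the two iterables in the `for` header would still visit every element, just with a stride of 2 between consecutive accesses instead of 1.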
On the other hand, row-major array dimensions expand from right to left (columns, then rows, then anything beyond), the reverse of the dimensions’ order, so the efficient for-loop iterates left to right in the dimensions’ own order, with the innermost loop over columns. The reverse order then only shows up in things that aren’t written as often, e.g. aligning dimensions for broadcasting, so this is one of the aesthetic arguments in favor of row-major.
That’s something to consider for a row-major array package or a different base language. I doubt even Julia v2 would make this big a change to Array, as opposed to disambiguating for r in 1:2, c in 1:3 between for-loops and comprehensions. There’s just too much tradition behind column-major matrices, e.g. BLAS buffers, though it’s worth mentioning that transpose flags handle row-major matrices (I believe at zero cost, but I’m not sure).
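For what the transpose-flag remark might look like in practice, here is a rough sketch, assuming the row-major data arrives as a flat buffer. The names `buf`, `At`, `A`, `B` and `C` are hypothetical; `BLAS.gemm` with `'T'`/`'N'` flags is the LinearAlgebra wrapper. It only shows that no copy of the buffer is needed on the Julia side and makes no claim about what BLAS does internally.

```julia
using LinearAlgebra

# The 2×3 matrix [1 2 3; 4 5 6] stored row by row, as a row-major library would.
buf = Float64[1, 2, 3, 4, 5, 6]

# Read as column-major, the same buffer is the 3×2 transpose (reshape shares memory).
At = reshape(buf, 3, 2)

# A lazy transpose wrapper gives the intended 2×3 view without copying.
A = transpose(At)

B = Float64[1 0; 0 1; 1 1]   # an ordinary column-major 3×2 matrix

# 'T' tells BLAS to treat the first operand as transposed, so this computes A * B.
C = BLAS.gemm('T', 'N', At, B)

println(C == A * B)
```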