Why column major?

e3c6 · May 19, 2019, 10:49am

Why Julia arrays are stored in column-major order?

Column-major vs row-major are not entirely symmetric regarding code style, because when we access the i,j’th element of an array, we write A[i,j] with the row index first, which can break the symmetry.

For example, I find it surprising that in:

[(println(i, " ", j); (i,j)) for i=1:5, j=1:5]

i is traversed faster than j, even though it looks like j is an inner loop.
This is completely different from a true nested for loop:

for i=1:5, j=1:5 
  println(i, " ", j)
end

Here j is indeed varying faster. If Arrays were row-major, these two code snippets would vary j faster, which looks more consistent to me given that in both cases the iteration is specified by for i=1:5, j=1:5.

Another example is:

[(println(i, " ", j); (i,j)) for i=1:5 for j=1:5]

Here also j varies faster, which makes [... for i=1:5, j=1:5] different from for i=1:5 for j=1:5. The difference is not just that the later is linearized, but also the order in which elements are traversed. Again, there would be no difference in order if arrays were row major.

I’m just wondering about the reasoning behind the column-major choice, and if there is a more “natural” way to think about this that I am missing.

Probably this has been discussed before on Github, though I can’t find it now.

baggepinnen · May 19, 2019, 11:52am

Row major is a really strange choice if you think about what happens when you add dimensions… How would you iterate A[i,j,k,...]? In row major languages, the fastest way is typically to iterate A (from inner loop to outer) j, i, k,... which is just horrible. In julia, you always iterate the dimensions in order A[i,j,k,...].

The order in the loop comprehension might be a bit confusing though but it still makes sense. The first one produces a vector and has the last loop as the tightest. The second one produces a matrix and has the first dimension as the tightest.

yha · May 19, 2019, 12:10pm

Row-major layout does not require any odd iteration order. In numpy, A[i,j,k,l] is most efficiently iterated in the the order i,j,k,l from outer to inner loop (and each A[i,j], which is the same as A[i,j,:,:], is a row-major matrix). Ditto for the typical way to emulate multi-dimensional arrays in C.
The thing I miss most about numpy after moving to Julia is its indexing behavior where A[i]==A[i,...], A[i,j] == A[i,j,...] etc, and A[i,j]==A[i][j]. Seems much more natural and useful to me than Julia’s punning of linear indexing with multi-dimensional indexing, which I only found to get in the way of finding bugs.

stevengj · May 19, 2019, 1:17pm

As I understand it, it was mainly historical precedent, driven by the fact that most linear-algebra libraries (LAPACK etc.) are column-major (because Fortran is column-major, and hence Matlab is column-major), though it would have been possible to work around this with judicious use of transposition.

It’s been discussed before, e.g. https://groups.google.com/d/msg/julia-users/GkvDunqE0ww/2gIDmA5cAAAJ

mbauman · May 19, 2019, 9:21pm

It also aligns quite nicely with how we (and the typical mathematical notations) treat 1-dimensional vectors: as columns.

Tamas_Papp · May 20, 2019, 6:32am

Row vs column major is one of those topics where good arguments can be made for either choice, but at the end of the day it is best to just pick something and proceed. Given that BLAS/LAPACK mostly use colum-major, I think it is a reasonable choice to follow.

For people who really like row-major, it should be feasible to implement it using a custom type without a performance penalty — with a lot of work, of course.

I did this when implementing a Common Lisp (which is row-major) wrapper for BLAS/LAPACK. It works in about 90% of the cases (not all primitives handle transposes), and was always a reliable source of bugs.

RGon · June 9, 2021, 3:05pm

I think mathematical notation means we’re kind of stuck with column-major if we want internal consistency in higher dimensions. I’ve never been able to think of a coherent way to make things row-major. The real problem is that if a matrix A[i,j] means the ith element of the jth row, then it’s most natural to slice things from front to back (like A[:,1]) or for a 3D matrix (A[:,:,3]). Otherwise, to maintain order consistency like baggepinn said, we’d have to slice from back to front meaning that A[1,:,:] would be more efficient than A[:,:,1] which just seems weird. I think this is why Fortran is and LPACK went column-major.

The only real reason I can see why we would want row major is because most text files are actually row-major. However, we don’t use text files in multiple dimensions. So I guess it’s always a tradeoff between how we do text and how we do math, which is why math-heavy languages are column major and text-heavy languages are row-major.

GunnarFarneback · June 9, 2021, 3:21pm

If you’re working with images you’ll notice that virtually all image formats store the data row-major as well.

DNF · June 9, 2021, 5:04pm

Ideally linear algebra itself should be row major, that is, the first index would move along rows like text in western countries. Then choosing row major in programming languages would be a no-brainer.

As long as the first index moves down along columns in linear algebra, there is no good choice, only tradeoffs.

RGon · June 9, 2021, 7:06pm

Exactly. Linear algebra and text chose different conventions, and we’re paying the price for that. Either we change how linear algebra notation works or we start reading column-major like Traditional Chinese (except with columns left to right). I didn’t know images were row-major, but I guess it makes sense if they needed to be used as input for dot-matrix printers.

neophytedave · June 9, 2021, 7:12pm

This discussion has caused me to reply. I am an ex-programmer from long ago who was an APL enthusiast. APL’s successor is J. The current discussion brings to mind J’s database, Jd. See the information on Jd being column versus row-oriented. Jd/Overview - J Wiki. Fascinating.

Henrique_Becker · June 9, 2021, 7:37pm

This is a good point. Dataframes and Databases will always lean to the side of being column-oriented because of inverted indexes and “same type for the whole column”.

GunnarFarneback · June 9, 2021, 8:34pm

For my part I’m extremely grateful that this is an area of mathematics where everybody agrees about the convention. The programming languages disagreement about column major and row major is a triviality compared to the nightmare linear algebra had been with competing conventions.

jzr · June 9, 2021, 8:39pm

That page says

Most RDBMS systems are row oriented.

Henrique_Becker · June 9, 2021, 8:47pm

Let me rephrase it to “There will always be valid reasons for Dataframes and Databases to be column-oriented.”, I am not sure how old the text is but, “For example in 2014 Citusdata introduced column-oriented tables for PostgreSQL”[1].

DNF · June 9, 2021, 8:54pm

It’s good there’s agreement, but I wonder why it’s column major, which is also incompatible with the xy-convention, which has caused me a lot of grief.

As far as I can tell it was a ~~bad~~ unfortunate choice, without which the programming language disagreement may not have existed.

yha · June 9, 2021, 9:23pm

I don’t think that’s what “row-major” generally means. It refers to the layout in memory: each row is stored continuously and columns are not continuous. This is completely orthogonal to the question of which index in A[i,j] is interpreted as a row and which as a column. For example the adjoint matrix type in Julia (such as rand(m,n)') is row major, but A[i,j] means ith row, jth column as usual.

we’d have to slice from back to front meaning that A[1,:,:] would be more efficient than A[:,:,1] which just seems weird.

I’m curious why you find this weird. I find it more natural that the first index (which by convention is rows in the case of a 2d array) is the slowest varying, since this way the usual lexicographic order of indexes follows their memory order.

DNF · June 9, 2021, 9:57pm

It is really weird to me that the last index is the fastest. It’s as if it should be quicker to skip from page to page in a book than along a line. And even faster to skip from book to book.

yha · June 9, 2021, 10:10pm

I think that when referring to a location in a book, I would specify the page number first, then the line number, then the location inside the line (“page 54, line 3, 5th word”). So the last index is the fastest.
Other arguments for MSB-first order: It’s the same convention as written numbers. It’s consistent with indexing into array of arrays (a[i][j] and a[i][j+1] are adjacent in memory. In numpy this is the same as a[i,j] and a[i,j+1]). I can’t think of any argument in favor of the LSB-first order Julia uses.

RGon · June 9, 2021, 10:15pm

Exactly, so if you’re doing row-major consistently, it also means that the order you’re filling array by changes with respect to the dimension.

A consistent row-major would be
A_1d => every row
A_2d => every column, for every row
A_3d => every depth, for every column, for every row

In a consistent column-major scheme you would fill
A_1d => every row
A_2d => every row, for every column
A_3d => every row, for every column, for every depth

So a consistent column-major indexing scheme will fill the nd array the same way every time, but a consistent row-major indexing scheme fills it differently for every array dimension. I might add that the other problem is that doing it this way means that you wouldn’t be able to do linear algebra very efficiently over slices along depths (which is how we usually see things). For higher dimensions than 2, you would have to permute dimensions every time you slice efficiently if you’re going to use linear algebra.

Topic		Replies	Views
Row and column major order for arrays of different shape Performance column-major , row-major	23	2712	April 12, 2022
The columns and the rows - puzzled about performance Performance question	8	905	October 14, 2021
Performance difference when accessing square matrix rows-first or cols-first Performance	14	1730	April 13, 2021
What are the pros and cons of row/column major ordering? Internals & Design arrays , column-major , row-major	10	3179	February 13, 2024
`for i = 1:M, j = 1:N` discrepancy (loop vs comprehension/generator) Internals & Design syntax , loops	26	4021	December 28, 2022

Why column major?

Related topics