Numpy's list of lists vs. Julia's linear indexing

I think Numpy is truly committed to this, and that is a good thing. x[0][0] actually is a valid index into a matrix, and in general, a[i1][i2]...[ik] == a[i1,i2,...,ik]. Much more reasonable and useful than julia’s a[i]==vec(a)[i] in my opinion. (y[0,0] doesn’t work in the example above, but that constructor is deprecated anyway).

Linear indexing is by far the most efficient way to scan through an array. That may not be of much use when your for loops are slow in the first place, but in Julia the for loops are fast enough that the difference between linear and cartesian indexing becomes a major issue. Of course, we could also introduce some kind of wrapper (or unwrapping) that allows linear indexing, but there’s a long tradition of linear indexing into arrays from Fortran and Matlab.

8 Likes

I am not sure if numpy is a role model for anything (it was not designed by scientists or experts in the field).

I never learned C properly (but man I wish I had better C knowledge) but I remember they always give the following examples:

  1. Implement an n-dimensional array as a flat array consecutive in memory.
  2. An n-dimensional array constructed via arrays of pointers (e.g. the rows are represented by a list of pointers and each pointer points to the start of the values of a row).

I always thought the first method must be faster and safer but I am no expert.

But why would you have a linear index into an n-dimensional array in the first place? I don’t recall ever finding myself in that situation, and I have never done a[i::Number] on an high-dimensional array in Julia except by accident.
The only reason I can think of to do that, is in the low-level implementation of either multi-dimensional indexing, or of iteration through all elements of the array. User code can just iterate using a for loop anyway. Or if iteration should be consistent with one-dimensional indexing (not sure how important this is), do for x in vec(array), which I think is better style anyway.

It is faster of course, but this choice of memory layout is independent of what a[i] does. Numpy also stores arrays consecutively in memory, but still clearly distinguishes array from vec(array), and array[i] returns an n-1-dimensional slice.

Yes, Julia and numpy have different semantics here. And yes, folks may have reasons to prefer one over the other. Whether one is more reasonable or useful than the other is indeed quite a matter of personal opinion.

Personally, I don’t think either semantic (list of lists nor linear indexing) is all that great. But, sure, both are useful in different situations. Heck, linear indexing is what you often end up doing with Julia’s eachindex (not that it necessarily needs to be that way, but it is that way right now, and it’s not going to change for quite some time).

(Moderator’s note: I know this came up earlier in the original thread, but this makes for a decent split point.)

5 Likes

Is that vec?

NumPy does this on a lower level, it just doesn’t expose this to the user like Julia does. Every time you do a[i1,i2,...,ik] where all i are integers, (i1,i2,...,ik) is used to calculate a linear integer index that is then used to get to the element’s address in the linear virtual memory. Julia exposing linear indexing lets users skip this step for a multidimensional array without having to make a view-like Vector like b = vec(a). The less views or view-like objects to work with, the less that can go wrong; for example, if you accidentally push! to b, b unshares its underlying buffer with a.

You’re also leaving out something important about NumPy. a[i1][i2]...[ik] == a[i1,i2,...,ik], but the former is quickly discouraged in tutorials because it makes many temporary arrays (for Julians, it’s like making heap-allocated views). a[i1] is just syntactic sugar for a[i1, :, :, ..., :] in NumPy.

4 Likes

Maybe, but ideally something that doesn’t involved reshape, which can be a bit sketchy.

You could always have a LinearIndex type, analogous with CartesianIndex.