What's the big deal? 0 vs 1 based indexing


#1

As an engineer coming from a non-CS/EE background, I have no strong preference or use for 0-based indexing. In fact, 1-based indexing seems incredibly intuitive to me when I use Julia to work with data and manipulate arrays with loops, etc. For me, this aspect of Julia is a big plus. I find lots of hate on the web for Julia on this point, though. Why? Is it just a question of familiarity / code comparability, or are there underlying reasons I am missing?


#2

Since moving away from 1-based indexing (FORTRAN), I have found that I need to worry about +1 or -1 much less often, so I don’t think the choice is entirely arbitrary. I use Python/NumPy, and it seems almost ideal. The only defect is:
array[start:-end]
There is an anomaly when end==0. [:-1] gives all but the last element, but [:-0] == [:0] is an empty array. This has occasionally bitten me (when end is a computed variable that just happens to come out as 0, so the slice bound becomes -0).
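A minimal sketch of the anomaly, using a plain Python list (NumPy arrays behave the same way for this slice). The helper names `naive_trim` and `trim` are hypothetical; `trim` computes the stop index explicitly so that `end == 0` keeps everything.

```python
data = [10, 20, 30, 40, 50]

def naive_trim(seq, start, end):
    # Drops `end` items from the back, except that -0 == 0,
    # so end == 0 produces seq[start:0], which is empty.
    return seq[start:-end]

def trim(seq, start, end):
    # Compute the stop index explicitly so end == 0 means "drop nothing".
    return seq[start:len(seq) - end]

print(naive_trim(data, 1, 1))  # [20, 30, 40]
print(naive_trim(data, 1, 0))  # []
print(trim(data, 1, 0))        # [20, 30, 40, 50]
```

Another common idiom is `seq[start:-end or None]`, which relies on `0` being falsy to substitute `None` (meaning "to the end") for the stop bound.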


#3

There is no big deal. [Julia has both; supporting 0-based or arbitrary indexing makes things a little more complex if you want to support all ways. If you know you only want 1-based, just use 1:end; otherwise eachindex(my_array) is a little better.]

Programmers make a big deal of this. All CPUs have a linear address space that starts at 0 [in the past, and on microcontrollers, it can be more complex…].

In ordinary code you never want to access location 0… though yes, you can also define the start of arrays as 0.

This was mostly an issue for early (simple) compilers. The extra implied -1 you have to do can and will be optimized away. [Are there any exceptions to that? I think not. At the least it would be moved outside of loops, so it would not slow things down.]
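As a rough illustration of why the implied -1 need not cost anything, here is the address arithmetic sketched in Python. The function names, element size, and base address are all made up for the example; a real compiler performs this kind of folding internally rather than in user code.

```python
ELEM_SIZE = 8  # bytes per element, assumed for illustration

def address_zero_based(base, i):
    # 0-based: the index is directly the offset from the base address.
    return base + i * ELEM_SIZE

def address_one_based(base, i):
    # 1-based, written naively: subtract 1 on every access.
    return base + (i - 1) * ELEM_SIZE

def address_one_based_folded(base, i):
    # Same result with the -1 folded into an adjusted base.
    # The adjusted base depends only on `base`, so it can be
    # computed once, outside any loop over i.
    shifted_base = base - ELEM_SIZE
    return shifted_base + i * ELEM_SIZE

base = 0x1000  # hypothetical base address
for i in range(1, 6):
    assert address_one_based(base, i) == address_one_based_folded(base, i)
    assert address_one_based(base, i) == address_zero_based(base, i - 1)
```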


#4

1-based indexing is a big deal for the same reason most other fad topics are a big deal: it’s such a simple idea that everyone can have an opinion on it, and everyone seems to think they can “help” by telling their personal experience about how this arbitrary choice has affected them at one time in their life.


#5

One interesting kernel, sometimes lost in the flamewars, is that the choice comes down to a preference for counting (1-based) versus offsets (0-based). Mathematically focused languages understandably care most about counting, so the natural choice is 1-based (Fortran, Matlab, etc.). Systems languages dealing directly with memory tend to care most about offsets, so they have usually been 0-based. Obviously there are counter-examples, but that’s the best explanation I’ve heard.
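The counting-versus-offsets distinction can be seen in a few lines of Python, which is 0-based but lets `enumerate` start from any number. This is only an illustrative sketch; the variable names are arbitrary.

```python
letters = ["a", "b", "c", "d"]

# Offset view (0-based): how far is each item from the start?
offsets = list(enumerate(letters))

# Counting view (1-based): is this the first item, the second, ...?
ordinals = list(enumerate(letters, start=1))

# The last item's ordinal equals the length; its offset is one less.
print(offsets[-1])   # (3, 'd')
print(ordinals[-1])  # (4, 'd')
```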

Also worth pointing out that many people have been working hard on abstractions for Julia that allow considerable flexibility in this area, see: http://julialang.org/blog/2016/03/arrays-iteration


#6

That’s exactly it.
1-based indexing is actual indexing like in mathematics, while 0-based “indexing” isn’t indexing at all but pointer arithmetic. This comes from C where an array is just syntactic sugar for a pointer.


#7

Instead of flaming, what would be best IMO is to simply have the flexibility to handle arbitrary bases, and also arbitrary ordering, which would help a lot interfacing to C/C++/Java/etc. that all have 0-based, row-major memory layouts.
Tim Holy has recently done some great stuff to add that flexibility, with OffsetArrays and PermutedDims, but I think it would be much nicer if that flexibility were integrated into the base Array types in Julia (as Fortran 90 and later have for arbitrary offsets).
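To make "arbitrary bases" concrete, here is a small sketch in Python of a container whose first index is configurable. The class `OffsetList` and its methods are hypothetical and written only for this illustration; this is not how OffsetArrays is implemented, and it covers just the one-dimensional case.

```python
class OffsetList:
    """Hypothetical list wrapper whose first valid index is `first`."""

    def __init__(self, items, first=1):
        self._items = list(items)
        self.first = first

    def _to_offset(self, index):
        # Translate a user-facing index into a 0-based storage offset.
        offset = index - self.first
        if not 0 <= offset < len(self._items):
            raise IndexError(f"index {index} out of range")
        return offset

    def __getitem__(self, index):
        return self._items[self._to_offset(index)]

    def __setitem__(self, index, value):
        self._items[self._to_offset(index)] = value

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def indices(self):
        # Valid indices, whatever the base is.
        return range(self.first, self.first + len(self._items))


temps = OffsetList([3.5, 4.0, 4.2], first=-1)
print(temps[-1], temps[1])                  # 3.5 4.2
print([temps[i] for i in temps.indices()])  # [3.5, 4.0, 4.2]
```

Looping over `indices()` rather than hard-coding a start value is the design choice that keeps calling code independent of the base.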


#8

This is a simple distinction but so easily overlooked, perhaps due to the terms used for these notions both in programming and in general language. I recently came across a post that explores this issue from a beginner’s point of view (i.e. someone who hasn’t considered this semantic distinction in mathematical terms, which was certainly my case). I think it explains the issue quite well: https://betterexplained.com/articles/learning-how-to-count-avoiding-the-fencepost-problem/. Wikipedia also describes the “fencepost problem” in https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error.
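The fencepost arithmetic can be written out as a short Python sketch. The function names are made up for illustration.

```python
def posts_needed(sections):
    # A straight fence with n sections needs n + 1 posts.
    return sections + 1

def count_inclusive(lo, hi):
    # Both endpoints counted, e.g. days 3 through 7.
    return hi - lo + 1

def count_half_open(lo, hi):
    # lo counted, hi excluded, as in Python's range(lo, hi).
    return hi - lo

print(posts_needed(10))       # 11
print(count_inclusive(3, 7))  # 5
print(count_half_open(3, 7))  # 4
print(len(range(3, 7)))       # 4
```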