OffsetArrays methods for querying dimensions?

From some light reading, I understand that OffsetArrays ahs been a source of a lot of errors, some stemming from use of conventional Array methods like length and size.

Why is it not possible to write a package that overloads these methods to behave “intuitively” so that you can write code that is agnostic with respect to the array indices? If I understand correctly, there is not a problem when Fortran allows user-defined array indices so it must be possible.

I’ve heard of methods hitting integer overflows with extremely offset indices, but never about errors with length and size. It’s not clear what errors you’re referring to or how you’d overload those methods to fix them.

The correct way to write code for generic arrays is to

  1. forget 1:length(a) and 1:size(a, k)
  2. replace them with eachindex(a) and axes(a, k)

But this list is non exhaustive: there are more things that you need to watch out for (I’m not even sure where to find a list, perhaps someone can help). That’s why many people don’t bother for their own projects. The end result is that there is plenty of code out there which

  • accepts any AbstractArray in theory (for instance because the developer wants it to work on dense and sparse arrays)
  • fails in practice when given an OffsetArray

I think this is a decent price to pay for Julia’s composability, but some people disagree.

3 Likes

I also don’t have first-hand experience but it seems like it is/was a common problem, like said here and in the docs/tutorials.

I guess my question is why cannot length and size be overloaded with eachindex and axes when OffsetArrays are used so they behave with minimal code differences with using standard Arrays?

Because that’s not what length and size are for. If I apply length to a vector with indices 1:5 versus another vector with indices 0:4, I expect the same answer 5. It’s fundamentally infeasible to index both vectors with 1:length(a), I’d be missing index 0 for the latter vector, no matter what length does.

There are algorithms that don’t fundamentally need Base.require_one_based_indexing and can be patched to accommodate offset indices, but that requires work and testing, and for a feature that isn’t as demanded as other things right now.

3 Likes

For OffsetArrays, length and size are correct: they return the total length and the size without offset. The only issue is that you should not use them to iterate

Yeah that makes sense. I guess it is idiomatic MATLAB to iterate over 1:length(a). In R, you would iterate over seq_along(a) and so it makes sense that the indices could be retrieved directly rather than generating them. I suppose that’s what eachindex(a) does.

Nowadays it is idiomatic Julia to iterate with eachindex, and the VSCode extension will even scream at you if you don’t. But that doesn’t mean everyone does it, and there’s tons of older code still running

2 Likes

That is good to hear. On the one hand, idiomatic Julia now makes it easier to use OffsetArrays, but you mention there are more things to watch out for:

So using eachindex and axes doesn’t guarantee that introducing an OffsetArray will necessarily work in a function that you’ve written for a standard Array.

Big exception is unusual indices selections, like “start from 1/3 into the vector, end at 2/3 into the vector, aim for 10 elements”. There isn’t really a guide for using generic index methods for that so I see people stick to 1-based Arrays and compute the indices from 1 and length. I’m sure there is some math for best practices but again, not much in demand.

Simple example is any function that has a matrix multiplication step. It’s feasible to match offset axes in matrices and do the usual multiplication, it’s just not implemented yet.

1 Like

So using eachindex and axes doesn’t guarantee that introducing an OffsetArray will necessarily work in a function that you’ve written for a standard Array.

There is a guide:

https://docs.julialang.org/en/v1/devdocs/offset-arrays/

I think if you follow that advice you’re pretty much set. And indeed using eachindex and axes is all you need the majority of the time.

However, it was written before we could address the question:

Big exception is unusual indices selections, like “start from 1/3 into the vector, end at 2/3 into the vector, aim for 10 elements”

I’m not quite sure what you mean by “aim for 10 elements” or which should “win” if the start and end points are in conflict with having 10 elements, but here’s one guess at what you wanted:

ndiv3 = length(a) ÷ 3
a[begin + ndiv3 : min(end - ndiv3, 10)]

Aside from using begin rather than 1, this is essentially identical to what this might look like if they were plain arrays.

Bottom line, we have all the tools to easily write generic-indexing code. It took a while for the word to get out to packages, but now that we have linting we’re in pretty good shape. Yuri’s post has been widely read but these things do get outdated. It’s now a bit like complaining that a lot of Python packages are still stuck between Python 2 and Python 3; it was a problem once, but has long since been fixed.

2 Likes