OffsetArrays, inbounds, and confusion

There has been quite a bit of activity around the Yuri Vishnevsky article. Thought I would but some of my thoughts here as someone who is fairly new to Julia and heavily uses C++ and python professionally.

As others have noted, many of his complaints revolve around OffsetArrays and bugs they cause. I think the bottom line is that Julia is a 1-indexed language. To be clear, I would much prefer it to have been a 0-indexed language and had I been the one to make the decision I wouldn’t have even considered 1-indexing, but it is what it is and I have no interest in trying to fight an uphill battle to make julia conform to my preference. OffsetArrays are a nice idea, but as we can see they cause headaches for library maintainers. In a professional setting I don’t like them because if i collaborate with someone who uses them all of a sudden it becomes much harder to have compatible code and adds extra overhead just to read their code.

I saw an issue on github about @inbounds, deprecate `@inbounds` and change it to `@unsafe_inbounds` · Issue #45342 · JuliaLang/julia · GitHub, again related to OffsetArrays. As someone coming from C++/python it is perfectly clear the implications of @inbounds. It’s unsafe and you accept the responsibility for the consequences, including using OffsetArrays.

Seems like the easiest path forward would be to deprecate OffsetArrays and discourage their use. If that is upsetting to a majority of people maybe the whole language should switch to 0-indexing as the default.

Anyway, sorry for the rant. Just thought I would write down some of my thoughts. Julia is fantastic. Thank you all for your hard work!

3 Likes

Continuing that discussion in the other thread, I initially thought too that the easiest way forward would be to deprecate OffsetArrays and just stick to a standard (I’m a bit old fashioned probably), but the language seems to be headed in the opposite direction: maximize the use of generic code.

From what I understand, any code that doesn’t require the explicit values of the indices (and I can think of a lot of such code) can go fully generic, and rely solely on eachindex(A), axes(A,1), begin, end, firstindex(A,1), and so forth.

This would require some effort in getting everyone on board, but it is perhaps worth it. In any case, I think it is worth trying to code like this when the values of the indices are irrelevant (which is quite often).

8 Likes

Don’t know if you’re the same person, but I just answered here that coming from C++ does help a lot to understand the complications and problems of invalid access to memory.

That’s not possible (PSA: Julia is not at that stage of development anymore), or desirable, I think, for a lot of people.

It is a package, so it cannot be deprecated (unless you convince the package author to take it down, but even then it would remain available on the Registry).

I don’t think it is needed to discourage the use of it, as it is somewhat niche. But as I stated on the thread discussing the original blog, I need them for the work I do. I get paid to do it that way. :sweat_smile:

8 Likes

This is frankly a quite big overreaction to a blog post that happened to get a lot of traction. Yes, the stats library had some old code that used @inbounds too frivolously which was memory unsafe when used in combination with arrays with custom indexing. But this is just one specific example and focusing on it too much means you won’t see the forest for the tree.

Since removing @inbounds or deprecating OffsetArrays just won’t happen, it is instead better to look at more productive ways where issues like this can be avoided. Some examples are:

  • Linter that detects 1:length(x) pattern.
  • Better docs for @inbounds
  • Better tools for testing packages with custom arrays.
  • Tool to audit the package ecosystem for problematic usage of @inbounds.
  • Improvements to the compiler so that less usage of @inbounds is needed.

In fact, many of the points in the list above have already happened or are in progress. But suggesting things like deprecating very useful packages because someone wrote some (valid at the time) code 8 years ago that didn’t work properly with what Julia arrays evolved to be is throwing the baby out with the bathwater.

57 Likes

The exact opposite point is made in my copy of Numerical Recipes in C, that many algorithms are 1…M rather than 0…M-1

5 Likes

The AbstractArray interface was redesigned years ago to accommodate any indices with a step size 1 seemingly arbitrary indices (OffsetArrays actually seems more strict, effectively requiring indices with steps of 1), and I think it is a good thing to keeping moving toward the generic direction even if the AbstractArray code in Base has yet to be fully updated.

True, the concrete implementation Array is 1-indexed, and it seems general enough for practical purposes because we can make any custom struct its element type. However, it would go against Julia’s generic style to mandate a particular indexing and to encourage people to write 1:size(A,1) instead of axes(A, 1). For example, when we add to a variable in a loop, we start with x = zero(T) to maintain performant type stability. Sure, you can rightfully argue that it adds extra overhead to read the code compared to x = 0, but it’s preferable to the alternatives: 1) accepting type instability and the performance overhead like in pure Python, or 2) manually writing or generating non-generic methods for each supported type x = 0/x = 0.0/x = 0.0+0.0im. If anything, the Julia documentation or tutorials could benefit from hammering in the point of generic code from the get-go; I’m sure I’m not the only one who wrote x = 0 for an embarrassingly long time before running across one of the many threads teaching people to use zero.

Someone correct me if I’m wrong, but my understanding is that code composability/compatibility depends entirely on good interfaces, a “contract” of sorts between “partners.” For example, let’s say you want to use a type from library A in generic algorithms from library B. You need: 1) to implement library B’s documented interface for library A’s type, and 2) library B’s generic code to faithfully match the documented interface’s methods and assumptions. Even without Julia’s interface-based composability, you need people to read and adhere to well-documented ground rules (e.g. everyone must use 1-based indexed arrays) to maintain any kind of compatibility, but it’s easier on library A’s developer if their ground rules were limited to an interface instead of all of library B’s code (that’s library B’s developer’s job).

1 Like

Furthermore, the generic indexing patterns aren’t just good for handling non-standard indexing. They also make it convenient to handle arrays with different dimensions. ( And I hope no one would suggest deprecating multi-dimensional arrays.)

Imo, the bottom line is that ‘deprecating’ or discouraging non-standard indexed arrays would be a horrible move. Since they can be defined in packages by anyone, one can only do it by sabotaging the work of those who created them, and of those who use them, which is just unacceptable.

The problem is with @inbounds, which has become too widespread, and is widely recommended even to newbies. I do wish it was called @unsafe_inbounds, but maybe it’s too much hassle to change.

1 Like

but the language seems to be headed in the opposite direction: maximize the use of generic code.

This is the nature of Julia because of multiple dispatch – if you try to build a monothlic package with all the safe assumptions and limitations, it’s much easier to just use C/C++/Rust or whatever.

OffsetArray isn’t something special in this story – it’s indeed a very well-written packages by who designed this generic custom axes and is a very good test stone to ensure that your code works well for generic array. This is exactly why you see OffsetArrays mentioned in the Julia documentation.

If anything, reverting the entire Julia back to the old 1-based indexing world is not going to happen because 1) people paid so much effort to build it up so nobody will really try to pay another round of efforts to support less, and 2) it is a useful concept. – You don’t use it in your field doesn’t necessarily mean it’s useless.

Speaking of generic coding, doing more eager checks is better than adding more implicit assumptions. Julia’s multiple dispatch allows you to skip and ellinimate the unnecessary checks without overhead, so why don’t try to embrace it? Add an eager check on Base.require_one_based_indexing if you worry about your package users wrongly pass non-standard 1-based arrays. This is much easier than reverting the ecosystem.

9 Likes

The cat is out of the bag, the question is what to do about it now, and I proposed badges (see my new proposal below, which is not mutually exclusive with it):

The problem is you could have been aware of the problem and made legal code with it (for all arrays), but then the assumptions broke with the OffsetArrays package (and Julia 1.4). And now all who use the macro (or even if not, just take in AbstractArray or AbstractVector) will need to think about its use (and test for it) too. I’m not sure people will, that seems to be the fatal flaw for renaming to @unsafe_inbounds, with many people only taking regular 1-based arrays into account.

You really want to support AbstractArray for other reasons, e.g. sparse, Vandemonde matrices or whatever, and most of those alternative array types are still 1-based, but different storage formats.

OffsetArray is a different kind, orthogonal to the other alternative arrays, can e.g. be combined with Vandermonde.

I’m thinking could we solve this problem by introducing a different abstract array type in the hierarchy?

Had we done from the start (not too late?) problem averted:

abstract type AbstractArbitraryArray <: Any end

abstract type AbstractArray <: AbstractArbitraryArray end  # would in this
proposal disallow non-1-based

const AbstractAArray = AbstractArbitraryArray  # maybe good to have, as short indicating OffsetArrays ok
2 Likes

36 Likes

Nice!

A related question - I tend to fall back to for i = 1:... indexing in situations where I want to write to a specific location on fhe LHS into a pre-existing results container, something like:

for i in 1:length(x)
    result[i] .= myfun(x[i])
end

What’s the suggested replacement for that? pairs?

3 Likes
for idx in eachindex(result, x)
     result[idx] .= myfun(x[idx])
end

Should work just fine. Your code already assumes that they have the same size, and eachindex will throw if they aren’t.

EDIT: although, I guess result could be larger than x? In that case, I’d rewrite this to have result be a view into result of the same size as x.

9 Likes

Thanks for all your replies. Things are more nuanced than just disallowing 1-indexing. This is why people who are smarter and better informed than me are making the decisions. Thanks for making Julia great and replying to my silly rant!

4 Likes

Several people here and elsewhere have made the same claim that one-based indexing is somehow at fault for the bad interaction between OffsetArrays and code that applies to all AbstractArrays but assumes they all start at a fixed index. This doesn’t really make sense. Let’s imagine a hypothetical world where Julia had used zero-based indexing from the beginning. The same exact history would have happened. We’d start out with a collection of concrete array types in the standard library that are indexed starting at zero. There would be a bunch of code written based on that assumption, including documentation, which people would copy. Then someone would come along and decide that it’s convenient to allow arrays to have arbitrary ranges of indices in different dimensions, and they implement OffsetArrays. At that point you’d have the exact same situation described above, except with zero in place of one. The number one isn’t in anyway significant to the story. Zero-based indexing doesn’t fix anything and one-based indexing isn’t at fault, whatever you may think of it aesthetically.

The other remedy you suggest is to deprecate OffsetArrays and discourage their use. Unlike changing the base of array indexing, this would actually address the issue. But it seems very much like throwing the baby out with the bathwater. Allowing array indices to start at arbitrary offsets is not a wild or controversial idea, nor is it unprecedented—it has a long and successful history in some of the most scientifically minded languages: Fortran, Chapel and Fortress all support it. It works quite well and is frequently useful. Somewhat ironically, given your username, one of the use cases for offset arrays is when working with Fourier transforms, where it allows indices to be symmetrical around zero. There are many examples where high-level array code gets much clearer if you can choose convenient index ranges and let the array do the offset bookkeeping for you. The Images documentation has a really nice example here.

Perl also allows changing the indexing base for arrays, but strongly discourages anyone from ever doing this. In recent versions it’s even disallowed to assign anything but zero to $[, the very Perlishly named variable that controls the base index. This is often trotted out as an object lesson in how obviously bad an idea it is to allow array indexing to start at different locations. But the problem in Perl is that $[ is a global flag that affects all arrays everywhere. So no code anywhere can assume any particular starting index, even for arrays that were constructed locally. In effect, this feature just changes the correct syntax for indexing into an array from $a[i] to $a[$[+i]. (Yes, that second expression is valid Perl syntax, unbalanced brackets and all. Extremely cursed.) This feature is not only confusing but also completely useless. Looking at this Perl misfeature and concluding that letting array indexing start at different offsets is a terrible idea is like eating some milk that you forgot in your fridge for a few months and concluding that cheese is a terrible idea.

Back to Julia: the right solution is to fix code that accepts AbstractArray but does things like 1:length(a). All you need to do is replace that range with eachindex(a). Then the code works with any starting index, including one or zero. The language already has all the features and APIs needed to write fast, generic that works with all kinds of arrays. We just need to educate and encourage people to uses them. This particular transformation can be done almost automatically. Linter checks suggesting this fix have already been implemented, to help educate people. Many languages have patterns that should be avoided and help people avoid them in this exact way—it’s basically why IDE linter suggestions were invented.

38 Likes

As per the title of this thread I’m still confused about the universal recommendation to use eachindex(a). It may be because I’ve never used OffsetArrays. But a lot of code is of the form

function myfun!(a::AbstractArray)
    for i in 1:length(a)
        a[i] = ... # some expression depending on `i`
    end
    return a
end

which will give the wrong result when using eachindex(a) and when a happens to be an array that does not use 1-based indexing, while it would throw an error in the original form.

What is the right way to write myfun when it is exported in a package? Should the function signature be less general to allow only 1-based indexing? What is the right abstract type for this? Or should the export of such function be discouraged? Should it be left as it is and only mention a big disclaimer in the docstring of myfun?

2 Likes

As a curiosity, in Fortran it is very common to pass the array length to the mutating routines, and then the indexing becomes sort of assumed (actually only the reference to the initial position in memory of the array is passed, so that indexing in the passing of the array actually does not make any difference).

For instance something like this works, and explains why the old-but-gold Fortran routines can interface with generic indexing, that came later:

program index
    integer :: i, n, x(-1:1)
    x = 0
    n = size(x)
    call sum_one(n,x)
    do i = -1, 1 ! <<<<<<<<<<<<<
        print *, i, x(i)
    end do
end program index

subroutine sum_one(n,x)
    integer :: i, n, x(*)
    do i = 1, n ! <<<<<<<<<<<<<<<
        x(i) = x(i) + i
    end do
end subroutine sum_one

will print:

% gfortran -o index index.f90 
% ./index 
          -1           1
           0           2
           1           3

This somewhat related to the question above, since the loop in case could actually depend on the counter, but not on the indexing of the original array.

2 Likes

If there is 5:length(a)-4, it should be changed to firstindex(a)+4:lastindex(a)-4 ?

1 Like

There’s no type for this. Probably inserting something like Base.has_offset_axes(A, B) && error("msg") is the best way, which Base wraps in require_one_based_indexing .

1 Like

On many cases it isn’t that simply, and the amount of training for users who are not great at programming to think with these abstractions is tough. I also think that in addition to education, an easy way to flag packages as “don’t use me with generic indexing!” isn’t the worst idea. StatsBase/etc. certainly should be general, but we should be a little careful about raising the bar too much for package writers.

A lot of code looks really ugly if written to follow completely generic indexing.

3 Likes

This does show why you can’t just completely automatically change code to use eachindex—the best fix somewhat depends on what you meant. In this case, the easiest fix might be this:

function myfun!(a::AbstractArray)
    for i in 1:length(a)
        a[begin-1+i] = # some expression depending on `i`
    end
    return a
end

But this depends on what myfun! is meant to do. Should i in the expression be an offset from the start of the array or are the array indices meaningful?

8 Likes