OffsetArrays, inbounds, and confusion

JohnnyChen94 · May 19, 2022, 3:13pm

It seems to me that a big confusion among the community is that people are worried that every function should support custom axes and it will take a lot of efforts to correct the implementations. No, there are many valid reasons to not support offset arrays.

If the function does not plan to support offset arrays, add Base.require_one_based_indexing as an eager check is sufficient enough to tell your package users they should use “standard” arrays.

You don’t need to add “this-package-support-offset-arrays” badges, you don’t need heavy documentation, just do more checks, and when you do support generic arrays, write more tests. Eventually, it is the test that gaurentees correctness in the long run.

lmiq · May 19, 2022, 3:31pm

I would do

mbauman · May 19, 2022, 3:37pm

The idiom I like here is:

for (count, idx) in enumerate(eachindex(A))
   A[idx] = count^2
end

lmiq · May 19, 2022, 3:43pm

I don’t feel confortable to use this kind of construct, I’m afraid it may have performance implications. Even if it doesn’t, it takes quite a lot of practice and/or understanding of the internals to see that such a thing does not allocate anything, and is lowered to something fast.

(And kudos to Julia for not forcing me to use that, one of the reasons numpy never was pleasant to me)

simsurace · May 19, 2022, 3:51pm

Isn’t it a concern that require_one_based_indexing isn’t exported? My understanding was that unexported functions are not part of the public API.

simsurace · May 19, 2022, 3:53pm

That’s the core issue here. I fear that a lot of people just assume that Julia is 1-based (which is a fairly good approximation) and work in that mindset when writing code.

mcabbott · May 19, 2022, 4:12pm

Don’t know exactly what the policy is. has_offset_axes has a docstring. Both are mentioned here: Arrays with custom indices · The Julia Language

Might be neat to have a macro which could apply this e.g. to a whole module, instead of sprinkling such functions everywhere.

StefanKarpinski · May 19, 2022, 4:14pm

It occurs to me that one way someone might conclude that one-based indexing is at fault here is if they believed that OffsetArrays exists primarily to let people have zero-based arrays and so if Julia had zero-based arrays in the first place, then OffsetArrays wouldn’t need to exist. Of course, just changing the array offset from one to zero is not why OffsetArrays exists; there are much more compelling use cases and the package would exist regardless of what base Julia had chosen for Array to be indexed from.

mbauman · May 19, 2022, 4:24pm

Not too tough:

julia> function f!(A)
           for (count, idx) in enumerate(eachindex(A))
               A[idx] = count*2
           end
       end
f! (generic function with 1 method)

julia> f!(A); @allocated f!(A)
0

Or this works as a slightly less-quick sanity check, too:

julia> using BenchmarkTools

julia> function g!(A)
           i = 0
           for idx in eachindex(A)
               i += 1
               A[idx] = i*2
           end
       end

julia> @btime f!($A)
  11.029 ns (0 allocations: 0 bytes)

julia> @btime g!($A)
  13.349 ns (0 allocations: 0 bytes)

Amusingly, that difference does seem real — your “comfortable” construction is actually slower

jakobnissen · May 19, 2022, 4:37pm

Lots of people read Yuri’s blog post and their takeaway point was seemingly that OffsetArrays are bug prone. I think that is missing the point and over-emphasizing a single example that was supposed to illustrate a broader trend.

Of course addressing the OffsetArrays issue e.g. in ways recommended by Kristoffer Carlsson above would be a good and productive change - preventing individual cases of bugs is always a good thing - but it does not alter the overall point of the blog post, and it does not even address the specific issue of generic array indexing.

For example, CUDA arrays does not support indexing of individual elements at all, and StaticArrays does not support mutation. Hence, even a “correct” pattern like:

@inbounds for i in eachindex(A)
    # assign something to A[i]
end

Will fail for some array types. And then, we will just see a future blog post about how Julia has correctness issues because it suddenly crashed at runtime when a function was unexpectedly passed a StaticArray or whatever. The issue is still the same thing: Julia provides few ways of providing an interface and statically checking it. How to do this effectively for arbitrary assumptions like 1-based indexing or mutability is another, very difficult, issue. It would be nice if some of the collective brainpower the Julia community spends in the current moment was directed that way.

For that reason, the suggestions to remove @inbounds or OffsetArrays is missing the point. It won’t do much overall good, but will just cause headaches for those who rely on those features. Again, I completely agree that improving the specific case of unsafe @inbounds for OffsetArrays is a real, tangible improvement, and having MethodErrors thrown is better than unsafe array access. But I believe the rush to “address” OffsetArrays we’ve seen in the wake of the blog post are misguided.

mohamed82008 · May 19, 2022, 4:48pm

I think there is a bigger problem here of generic programming and the “contract” between the type author, the function author and the user. If I write a function claiming that it targets all AbstractArrays because all the arrays I have a use case for have 1-based indexing, are CPU-based and they live on a single process, am I doing something wrong? If somebody then comes along and implements OffsetArray, CuArray, or DArray and wants to use my function, whose responsibility is it to ensure my function works with these fancy new array types I never thought about when writing my function? Would the type author be breaking the contract terms of the function by calling it on something it was never intended to support? Am I a bad programmer for not considering all sorts of array types out there in the wild? Doesn’t multiple dispatch empower the type author to write their own method for their new type if mine doesn’t pass their tests?

Then what if as the type author, all I care about are packages A, B and C. But then a user comes along wanting to use my fancy type with package D, is it the type author’s responsibility to implement support for package D, or is it package D’s author’s responsibility to support the new type? Or should the user run exhaustive tests first to see what works and what breaks? This is a general problem that goes beyond OffsetArrays and @inbounds imo. Even interfaces are probably not going to fix this issue because it’s hard to encode all the assumptions I made when implementing my function, e.g. CPU-based, on a single process, is not made of repeated sub-arrays that reference the same memory so I can mutate a[1] without changing a[5], etc. Perhaps better documentation and more caution on the user’s side is the best we can do here. Doing stuff generically is just hard, we may need to accept that there are practical and governance constraints on how generic we can be.

mbauman · May 19, 2022, 4:51pm

Both can be true, and our community is large. There are concrete actionable steps that are good to take. There are also tougher problems that will take longer to figure out.

There are other initiatives underway that are already addressing the other things. E.g., the new SciMLStyle explicitly talks about some simple idioms and guidance that can help out in more cases than just OffsetArrays.

AMJ · May 19, 2022, 5:06pm

Is the enumerate() the fastest ?

Thanks to Yuri, I’ve learned an unprecedented amount of stuff in these three days. Now I have to change lots of 1:length() in my code base.

mohamed82008 · May 19, 2022, 5:07pm

I think the problem here might be that we are expecting too much from each other and from Julia as a language. Julia took generic programming to a whole new level of use and popularity and it just works a remarkable number of times but we can’t get too used to that. Because things will just work until a type comes along and breaks an assumption of a function or vice versa and then we will either get exceptions or correctness “bugs”. Perhaps bringing down our expectations a notch or 2 and just encouraging more testing by users might be the solution here. Basically if a function-type pair is not tested anywhere, proceed with caution.

Alec_Loudenback · May 19, 2022, 5:22pm

In addition to the nice examples in the announcement blogpost I wanted to point out a place where I’ve really appreciated offset arrays: In the JuliaActuary package MortalityTables.jl.

There are data tables which are indexed by a person’s age in practice (example) and it doesn’t always start at 0 or 1. Many times you aren’t indexing the array from start to finish either, so I often run don’t want eachindex.

So shout out to Julia to making this more natural/performant than my attempts in other languages!

simsurace · May 19, 2022, 9:23pm

It strikes me that I’ve always thought of type annotations in functions as something preventing headaches for the potential user. That is, if I define a function and say that it takes an AbstractArray, I’m thinking “this input should be an AbstractArray”, not "I’m guaranteeing that it will work for all AbstractArray subtypes that anyone will ever invent. That is, I’m relying on functionality that the type provides, but I’m not taking into account all other functionalities that would distinguish those other subtypes from the ones I’m usually working with, because I can’t.

But maybe that’s the wrong mindset. After all, if I’m annotating a function thinking that this is to prevent users from expecting it to call it on things which are not even AbstractArrays, it is only throwing a different error than it would if I hadn’t used any annotation at all. If I’m indeed relying on things that are only possible for AbstractArrays, it will throw an error somewhere deeper in the function. Conversely, if I had annotated to say that I’m expecting this to work for any AbstractArray subtype, no annotation would mean that I would expect this to work for all types, which almost no function will in practice (except trivial ones like typeof).

So I wonder whether it is really good to expect a type annotation to be a contract that any subtype should be supported, when we have no control over its behavior. For example, someone could define a DontTouchMeArray that sets all its elements to zero as soon as you try to modify any of its elements. It is almost guaranteed to break any function with an !, but I don’t believe it violates any of the things written down in Interfaces · The Julia Language, or at least it’s not obvious to me if it does.

Benny · May 19, 2022, 9:49pm

I think you’re right on the money that worrying about your code working with any other specific library is too much. Ideally, your generic code uses a well-documented interface that other people try to extend, and uncomposability is indicated by MethodErrors from incompatible interface methods e.g. setindex! for immutable arrays. But nothing’s ideal, the generic code may make unintended and undocumented assumptions; types can throw an inscrutable error at some downstream method that’s not even in the package; or anything else that’ll end up on a Github issue or discourse post.

You’re right about better documentation; if anything goes wrong, I want to quickly check if I violated any assumptions and limitations in a readily accessible place, like a docstring or a FAQ. But that may not be the only thing you can do.

This is actually one of the issues cited in the “Why I no longer recommend Julia” blogpost that started this recent discussion of OffsetArrays: some Base methods didn’t check for memory aliasing or document a warning against it. To avoid this, you could make input-validating methods and call them first in your methods, or if the checks have too much runtime cost and don’t need to be ran repeatedly, you can document the usage of such methods in very visible code examples.

In this case I would avoid making generic methods for broad types like AbstractArray. In their stead, I might make an alias to concisely write a Union of all the types I intend to support and test. Sometimes that might be too restrictive, and maybe clear documentation of limitations is the best you can do. A user might gripe that your code doesn’t compose with a type they want, but you don’t owe them that if you have made clear your code’s intentions. They can use multiple dispatch to implement what they need in their own package, as you’ve said, or improve your code with your cooperation. OffsetArrays is a bit of a counterexample because the AbstractArray interface does nominally support non-1-based indexing, and it’s such a thin wrapper over other array types that it’s very plausible to support.

Perrin_Meyer · May 19, 2022, 10:00pm

Im confused. thought the idiom was
for (i,x) in enumerate(X)
versus
for (i,x) in enumerate(eachindex(X))
?

A = rand(5)
julia> for (i,a) in enumerate(A)
       println(i," ",a)
       end
1 0.16336397100368438
2 0.4301384028138592
3 0.491507690411888
4 0.6551251460857042
5 0.36388032124662595

versus

julia> for (i,a) in enumerate(eachindex(A))
       println(i," ",a)
       end
1 1
2 2
3 3
4 4
5 5

simsurace · May 19, 2022, 10:22pm

We were talking about how to reference a counter that starts at one if the indexing base for the array is something else. So the output could be something like

julia> for (counter, index) in enumerate(eachindex(A))
       println(counter, " ", index)
       end
1 0
2 1
3 2
4 3
5 4

blackeneth · May 20, 2022, 1:42am

No one has mentioned the OffsetArrays provides a view without offsets

OffsetArrays.no_offset_view()

Consider the following:

Setup Arrays:

c = ones(5,5).*2

using OffsetArrays
A = OffsetArray(zeros(5, 5), OffsetArrays.Origin(-2, -2))

for (i, j) in zip(axes(A)...)

    println(i,"\t",j)

    A[i, j] = (i-j)^2+abs(i)+abs(j)+1

end

julia> c
5×5 Matrix{Float64}:
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0

julia> A
5×5 OffsetArray(::Matrix{Float64}, -2:2, -2:2) with eltype Float64 with indices -2:2×-2:2:
 5.0  0.0  0.0  0.0  22.0
 0.0  3.0  0.0  0.0   0.0
 0.0  0.0  1.0  0.0   0.0
 0.0  0.0  0.0  3.0   0.0
 0.0  0.0  0.0  0.0   5.0

Now use them:

julia> tr(c)
10.0

julia> tr(A)
ERROR: ArgumentError: offset arrays are not supported but got an array with index other than 1
Stacktrace:
 [1] require_one_based_indexing

julia> tr(OffsetArrays.no_offset_view(A))
17.0