Is OffsetArrays.jl a poison pill?

What sort of problems do you envision?

There’s no AbstractTuple type, so there’s no code that expects 1-indexed tuples, and could be surprised by 0-based tuple input. Only if a function accepts general iterables, could an OffsetTuple be used unexpectedly, and then the code must anyway deal with OffsetArrays, so you would need general indexing. I don’t think anyone would be bothered by someone registering OffsetTuples.jl.

The issue isn’t which packages exist, but what properties can you rely on for abstract types that you want to support. Tuples do, but AbstractArray and iterables in general do not, guarantee 1-based indexing. It should therefore be as easy as possible to work with them, and mostly it is. for a in A is totally general; eachindex(A) is at least as simple as 1:length(A), certainly easier to type; axis(A, 1) is simpler than 1:size(A, 1), etc. It’s when you don’t want to just loop over all elements that it gets a bit harder, then you have begin, end, firstindex, lastindex, first, last, …

But maybe there could be some very simple and universal way to get a one-based view into any AbstractArray? OffsetArray has no_offset_view, but should all AbstractArrays have a method like that? Then anyone who wants less hassle would just write

function foo(A0::AbstractArray)
    A = OneBasedView(A0)
    for i in 4:length(A)-3
        ...
    end
end

Could this be made even simpler? Could there be some way to facilitate automatic promotion to OneBased? A @one_based macro, a new keyword, for1? (just kidding, maybe.)

2 Likes

Actually, there’s a subtlety here, and this may not do what is intended.

julia> x = ones(-10:-1, -10:-1);

julia> for t in axes(x, 1)[4:end], i in axes(x, 2)[2:end]
           x[t, i] = 2x[t-3, i-1]
       end

julia> all(isone, x) # nothing changed
true

julia> axes(x, 1)[4:end] # we looped over empty ranges
4:3

julia> axes(x, 2)[2:end]
2:1

The axes of an OffsetArray are also offset, and this indexing may not work as expected.

2 Likes

Ouch. Yes, I typed this from my phone, so dropped testing it :frowning:

Certainly there is no AbstractTuple at the moment but, given the flexibility of the language, there is no reason why AbstractTuples and OffsetTuples could not exist in the future. Should the use of these become widespread then this could cause disruption to the existing code base. Much of the discussion so far has concerned AbstractArrays but these sorts of problems could arise in the future with any of the abstract types. I guess this is one of the issues that Yuri was concerned about.

1 Like

No, this would require changes in Base itself (or even Core?), the type hierarchy with Any → Tuple is hardcoded, and you cannot wedge AbstractTuple in there from the outside. It is not something that is possible to do from a package, or somehow from ‘the outside’, so it’s not related to the flexibility of the language.

If you allow for changes to the core language, then of course anything could happen, but that’s obvious.

2 Likes

I have missed AbstractTuple as a way to let concrete Tuples, NTuples and NamedTuples more easily play in the same sandbox[s]. One advantage would be an easily seen uniform convention for sequential indexing of the ith element of a Tuple, or the ith value of a NamedTuple. Bringing NamedTuples closer in devthink to Tuples elaborated with position-attached names (symbols) would make working flexibly with them easier for many imo.

julia> NTuple{2, Int} === Tuple{Int, Int}
true

NTuple is just an alias.

1 Like

As someone who was in the “1-based to rule them all” camp for a while during that discussion, I can totally see the appeal of separating OffsetArrays somehow. But I came to realize some things:

  1. There’s no decree to use or support OffsetArrays, and we can totally write 1:n syntax in that case. We even have the require_one_based_indexing function to check inputs. Funnily enough, if there’s a custom array type that isn’t 1-based, OffsetArrays can make it 1-based to be compatible with your 1-based code.

  2. OffsetArrays are actually important. You should see the cool examples on the Julia blog post introducing them, but it’s not hard to run into math where starting at 1 doesn’t make sense. We’d have to do offset calculations to get to the index we need anyway, why not make a wrapper type do it automatically?

  3. Why shouldn’t OffsetArrays work in AbstractArray code? Many functions really could work with any indices, and some functions currently marked with require_one_based_indexing actually could work with offset indices, just require that all inputs share compatible ones. Incompatible indices can just throw the existing DimensionMismatch error.

  4. We definitely should make a docs section or tutorial that teaches people right away how to write “generic” indices; people might not be as good as indexing arrays as they think. (Even the devs! The OffsetArray discussion revealed some overflow bugs when indices or related values get too close to typemax/typemin.) It’s not a big deal when you’re just indexing from begin to end (and in that case, just call eachindex(A) or axes(A, i)), but it gets real important when you start doing fancy stuff like indexing the “middle”, skipping every other index, or going backwards. Obscure errors happen when different key quantities that happen to have the same values are confused with each other, like an n meaning the last index vs. an n meaning length. In 0-based indexing, this is actually more annoying because there are invisible 0+ everywhere that you have to spot to edit the indexing order. begin/end syntax could be improved to make these distinctions easier, and even if you don’t do offset indices you’d reach for the generic syntax just for clarity.

10 Likes

Yes, Certainly. Additionally, I have found it a clearly readable way of saying
“All the elements present are of type T, and there are N of them”, the first part is obvious enough with Tuple{Int, Int, Int, Int, Int, Int, Int, Int}, the second less so.

But then you don’t need AbstracTuple to ‘let them play in the same sandbox’?

1 Like

Rather than doing this in a function in your normal code or in a standalone test in a testrunner I rahter see this check done by a linter/compiler via the type system.

In general that’s the evolution that you see in programming languages over the decades. In the beginning you have languages with pretty weak type systems (e.g. C) so if you wanted to enforce (complex) constraints you had to do it in runtime code. Later on type system got more sofisticated and a lot of checks could be offloaded to that (e.g. ML, Haskell, Scala, F#,…)

There’s a saying (usually for static vs dynamic languages but applies more widely): every type check is a bunch of tests you don’t have to write (or can forget to write).

about 4:
If you’re just going loop over all elements I find “for e in A … end” the better way than eachIndex and then indexing. It’s just nice syntactic sugar that abstracts all that stuff and on the plus side also column based vs row based matrices or even if it’s a vector, matrix or whatever iterable collection.

1 Like

In other words, the iteration interface. I do see people unnecessarily iterating over indices just to getindex one array only once, good to point out. Iteration isn’t good enough when you need setindex!, though.

“With the introduction of OffsetArrays.jl AbstractArrays now confounds abstraction of element data type with the abstraction of indexing”

Actually the abstraction wasn’t about element type (that’s covered by generic type parameters) but was more about underlying storage type like BLAS matrix, distributed array, off-heap (on-disk) array,…

I agree with the confounding now with indexing and trouble of (retro)fitting the type hierarchy. If you wanted to shove storage type & indexing scheme into a type hierarchy you’ll have a multiplier effect and hence a lot of leaf nodes. A better way is to go more general and go with a type graph: i.e. have a type implement multiple traits/interfaces (yes here are traits again!). So you would have a trait hierarchy for storage type and one for indexing and a type would pick one from each.

2 Likes

Yes, and also about structure or properties, like symmetry and diagonal, or normalization, etc.

Yes it can be about many aspects of Arrays! The point is it now it is all thrown together into one hierarchy and a single type has many aspects which are not always obvious from the type name. When all that stuff would be refactored into separate traits / trait hierarchies one could compose each aspect indepently and have the type checker also validate each aspect indepently. That’s really powerful stuff!

@jishnub

so it would be something like this?

for t in first(axes(x, 1))+3:last(axes(x, 1)), i in first(axes(x, 2)+1:last(axes(x, 2))

This could be

firstindex(x, 1)+3:lastindex(x, 1)
1 Like

yes, that’s a lot better. thanks

I think you’re painting things as excessively binary where you choose exactly one of correctness or composability. Having just those options is quite dangerous for Julia’s reputation because it will cause more people to think “Julia doesn’t care about correctness”.

I would focus on several dimensions where there’s more options:

  1. Which code should be composable?
  2. Which kinds of bounds on composition should exist?

On (1), I would encourage the community to not tell every new Julia programmer that composability is the highest virtue. Indeed, I think high composability is not the right default goal for user code and perhaps not the right goal for most packages either.

And (2) is also important: the style guide I quoted can thought of as saying that there should be no bounds on function arguments, but I suspect if pushed it would be refined to “only exclude types known to fail”. But it would be entirely reasonable to instead default to only allowing types known to work.

6 Likes