Does `AbstractArray` itself need to support indexing beyond 1-based?

I was thinking about the thread (partially) about the incompatibility of some existing AbstractArray methods with OffsetArrays, and it got me wondering why AbstractArray was extended beyond 1-based indexing. Sticking to 1-based indexing has worked for most users in practice, and this is reflected in many other languages not supporting arbitrary starting indices. Although Fortran supports arbitrary bounds, even it has a 1-based default and convention.

The argument I’ve heard is that some algorithms are simplified if an input array can have 0-based or rotated indices. However, it seems to me that it’s enough to wrap 1-based arrays in OffsetArray (what happens in practice) to pass into such algorithms, and use no_offset_view to get the 1-based array.

With this, only OffsetArrays would require writing the more generic axes(A, i), eachindex(A), LinearIndices(A), begin, firstindex(A i), end, lastindex(A, i). For the vast 1-based majority of AbstractArrays, OneTo axes and UnitRange/StepRange subsets are easier to write and more intuitive in practice. Even OffsetArray itself needs to be more complicated to wrap non-1-based arrays. Also, instead of having to check if AbstractArray methods require_one_based_indexing and can’t accept an OffsetArray, the separation of okay-to-be-1-based algorithms and sometimes-cannot-be-1-based algorithms will be fully reflected in the type system, AbstractArray{T,N} versus OffsetArray{T,N,AA<:AbstractArray{T,N}} # no supertype.

So am I missing some advantages to the AbstractArray interface’s ideal support for offset indices over a hypothetical separation of OffsetArray from a not-so-hypothetically 1-based AbstractArray?

1 Like

I’m not sure if I understand your proposal so my reply might not address your questions.

OffsetArray is only one of the many array types that might have non-1-based indexing, thus dispatching my_alg(::OffsetArray) is not a good solution.

Maybe letting AbstractArray contain static information in its types is a good idea (AbstractArray{T,2,(1,1)} where (1,1) indicates the offsets), but there will be too difficult to raise this discussion in Julia and get it merged – too many things will be involved to embrace this new interface. Thus I don’t think this will happen.

Even if we’ve managed to work through it, we’ll eventually find that parametrizing the offset values might not be very helpful in practice (speaking of performance):

Oh, and the secret to supporting generic programming without much headache is to orthogonalize your design. The Base.require_one_based_indexing is exactly such a tool to help orthogonalization.

Well the imagined OffsetArray and AbstractArray would be quite a bit more rigid than they are now. AbstractArrays would always have 1-based axes, and OffsetArray is only a parametric wrapper type to tack on offset indices; they would be orthogonal concepts that dispatch to separate methods and require type conversion to work together, much like the linked docs described. It seems preferable to design the orthogonality in different documented types than in errors thrown by require_one_based_indexing in some heavily nested internal callee method.

There wouldn’t really be room for other concrete offset array types. Definitely a cost of making code for AbstractArray simpler and 1-based, but seems okay at this stage considering we still can’t matrix-multiply OffsetArrays.

It could dispatch to methods intended for very specific offsets, but I also doubt that it’s useful. The examples in the blog post I linked tend to treat offsets more like instance information (different rotations and shifts).

I see, this was discussed in Add an abstract type signaling e.g. OffsetArray safe to use (also add badges to packages) · Issue #284 · JuliaArrays/OffsetArrays.jl · GitHub already and I believe it’s not a doable approach.

The question about this approach is what should Diagonal be? Is it struct Diagonal <: AbstractArray, or struct Diagonal <: AbstractOffsetArray? (Anyway, restricting the concept of AbstractArray will be a breaking change and thus won’t happen easily in the foreseeable future)

One could consider proposing a holy trait type for this if this ever be wanted. For instance:

struct GenericIndexing end
struct OneBasedIndexing end

offset_type(::Array) = OneBasedIndexing()
offset_type(::OffsetArray) = GenericIndexing()
offset_type(A::Diagonal) = offset_type(parent(A))

I opened an issue for this in trait type to distinguish generic offset array and one based indexing array · Issue #326 · JuliaArrays/ArrayInterface.jl · GitHub

2 Likes

Hm, that is tricky. If we stick with Diagonal’s design to stand in for a square Matrix, it must subtype AbstractMatrix and, per my scenario, remain 1-based. So, an offset diagonal has to be OffsetArray wrapping a Diagonal, not the other way around. I do see an immediate problem: when I need to work with offset indices instead of the parent 1-based indices, like adding a rotated Diagonal image to another, OffsetArray code would inefficiently access all the 0s in the full matrix, just like when Diagonal falls back to AbstractMatrix methods.

I can’t really imagine how the existing Diagonal methods could be used for the rotated image addition, though. Even if all the require_one_based_indexing calls are removed, the Diagonal methods treat the parent Vectors as having the same indices and do elementwise operations. So right now, adding 2 Diagonals, which must be the same size, returns a Diagonal of the same size. 2 rotated diagonals would not usually be aligned, so an addition method must instead separately iterate each parent Vector and add to their offset indices in an output Matrix.

For a simpler example, we consider elementwise operations on 2 diagonals with different linear offsets e.g. 0-based + 1-based. Actually, this errors even with currently supported OffsetArrays, so Diagonal’s assumption that input matrices share indices is consistent and sensible. Extending Diagonal code to OffsetArrays to only allow inputs with equal offset indices doesn’t really add any feature compared to working on the 1-based parent arrays. It just seems like Diagonal matrix math and OffsetArray methods can’t compose, even if OffsetArray can wrap a Diagonal for more efficient memory usage.

Well, yes that Diagonal might be a case where we can “validly” restrict ourselves to 1-based indexing only. But it indicates a very broad range of array types that faces this issue. I could name a few, MappedArray from MappedArrays.jl, ReinterpretArray from reinterpret.

If I understand correctly. You’re proposing to adjust the type of array hierarchy to something like

But this will be a breaking change because the semantic of AbstractArray get narrowed, thus it won’t happen until Julia 2.0, which will be a long future from now. And you have to convince Julia dev team that this is a good change – this is the most difficult part of the work.

A non-breaking and more-or-less equivalent version is this

But you still have to convince everyone in the entire Julia ecosystem to build the consensus to subtype AbstractOneBasedArray instead of AbstractArray. – This is again, the most difficult part of the work.

That’s not all. Note that in the question mark in the Figures v1 and v2, you can’t find a good place for arrays like ReinterpretArray. For instance:

A = [1, 2, 3, 4
reinterpret(reshape, UInt8, A) # (1:4, 1:4)

Ao = OffsetArray(A, -1)
reinterpret(reshape, UInt8, Ao) # (0:3, 0:3)

You’ll find that neither AbstractOneBasedArray nor AbstractOffsetArray is a good solution because whether ReinterpretArray is a one-based indexing array or generic offset array totally depends on its wrapped data type. For safety we can set it to AbstractOffsetArray but it losses some information.

This is where the holy trait trick offset_type works – it dynamically infers the types. Thus if you have an algorithm, say

function my_alg(A, B, C)
    ...
end

you’ll only need to reform it to:

function my_alg(A, B, C)
    static_require_one_based_indexing(A, B, C)
    ...
end

static_require_one_based_indexing(As...) = foreach(static_require_one_based_indexing, As)
static_require_one_based_indexing(A) = _static_require_one_based_indexing(offset_type(A))

_static_require_one_based_indexing(::GenericIndexing) = error("generic offset array is not supported")
 _static_require_one_based_indexing(::OneBasedIndexing) = nothing

That’s all, and eventually, we find out that this is just an almost equivalent version to the current runtime version: Base.require_one_based_indexing. What’s the benefit? It saves a few runtime computation to check on all(first.(axes(A)) == 1).

Does this worth the change? I’m not sure, at least I haven’t seen any practical scenario where this axes check becomes a performance bottleneck.

1 Like

Agreed, this is just language design speculation, I don’t have the expectation or will to become part of the core development team to design v2 over such a minor aspect of the language when there’s lots of much more important things should develop in v1.

The type hierarchy part is a bit off, but I like the clearer terminology and will use it to clear things up. Almost all arrays will be 1-based and subtype AbstractOneBasedArray, which seems to be how arrays are used in practice and what many methods now assume. AbstractArray and require_one_based_indexing as they are now would no longer exist. OffsetArray only wraps 1-based arrays, and it is not a subtype of anything (except Any). There are AbstractOneBasedArray methods that align axes (which are counted from 1 instead of firstindex), and there are OffsetArray methods that unalign axes with offset indices; the logic is that if the methods aren’t interchangeable, then the types should be in separate branches in the type hierarchy.

In algorithms where offset indices are not used to represent unalignment and are aligned by firstindex to use the index values for computations, the same thing could be accomplished with a separate 1-based CartesianIndices of the values. It’s also more flexible because offset indices are restricted to steps of +1.

In my scenario, ReinterpretArray would be a 1-based AbstractOneBasedArray that only wraps other AbstractOneBasedArrays. If you were to define a reinterpret(::OffsetArray), it would return a new OffsetArray that wraps the reinterpreted parent array of the input.

Your holy traits that guarantee a static check of 1-based indexing is very cool and far less breaking, so I’ll be glad to see it implemented in v1. But the status quo remains that it’s uncertain whether an AbstractArray method would accept offset arrays at all or whether it would accept offset arrays with equal indices. The only way to tell is to throw an OffsetArray into a method, and the error thrown by require_one_based_indexing often happens in a deeply nested internal method call. Using the type system to separate aligned indices algorithms and offset indices algorithms still seems like a better alternative to trial and error, though the point of the post is to check if I’m missing something.

I think such properties, such as staticness, memory-contiguity, or indices, are best represented through traits, as these may be orthogonal and every combination may not necessarily representable in the form of a type-tree without an explosion in the number of types.

1 Like

Maybe? I wish I was better at figuring out if a property is better as a hierarchal type or an auxiliary holy trait for method dispatch. ReinterpretArray, OffsetArray, and Diagonal seem to require types, though; those are not auxiliary properties, and they wrap parent arrays with other data fields.

Can’t escape an explosion in number of methods if you combine many holy traits, e.g. f(::StaticStyle, ::ContiguityStyle, ::IndexingStyle, input), so orthogonal design is still important.

Using holy traits doesn’t necessarily mean you need to dispatch every trait on the same level.

Take my above static_require_one_based_indexing as an example, a normal holy trait as you know is to implement the algorithm as

my_alg(A, B, C) = my_alg(infer_offset(A, B, C), A, B, C)
my_alg(::OneBasedIndexing, A, B, C) = ...
my_alg(::GenericIndexing, A, B, C) = error(...)

but as you already know, this can be reformed to

function my_alg(A, B, C)
    static_require_one_based_indexing(A, B, C)
    ...
end

I believe the Julia compiler will optimize the overhead away so that this will be essentially the same as my_alg(A::AbstractOneBasedArray, B::AbstractOneBasedArray, C::AbstractOneBasedArray).

In practice, the orthogonality often happens to multiple levels, so you’ll implement a lot of small functions like the static_require_one_based_indexing to handle different properties. Eventually, the top-level my_alg can be implemented generically without adding too many type annotations at the top level.

I mean, if one ever tries to dispatch my_alg on offset trait, he should just embrace utils such as broadcasting, for ... in axes(...), and others that work on generic offset arrays. This makes the codes more orthogonal.

2 Likes

This is an interesting idea. If we ever move OffsetArrays to Base, we might declare some first-class support on OffsetArray and I’d be very happy to see more functions support OffsetArray.

Using traits to restrict arguments by multiple properties, and statically, in a chain from the primary types of the inputs A,B,C is great from an implementation standpoint. But I do wish these annotations were closer to surface level, and you actually provided a good example. See, it would actually be a lot better if the static_require_ methods were always visible at the first lines of surface level methods like my_alg(A, B, C). But in practice, these require_ lines are in internal methods a dozen calls deep, like _generic_matmatmul!, to reduce redundancy, I suppose. If I can’t see the restrictions annotating method arguments in the docs, I would at least want to be able to see it first thing when I click the [source] link.

1 Like

This is a good point. I’m not sure how I feel about this or have a better programmable solution. This makes it opaque and users need to try to guess the correct input format.

I myself try to add enough documentation, examples, and tests to guarantee this. At least, require_ methods eagerly throw errors on absolutely unsupported inputs and that’s a good message already.

Generally speaking, one of the biggest sell-point of Julia now is its composability without sacrificing the performance, and if a proposal (e.g., restricting potential input types) makes it less composable, it is less likely to be part of Julia.

1 Like

Currently, OffsetArrays aren’t the only non-1-based arrays: any SubArray, where the indices are themselves offset, produces an offset array. In fact, an OffsetArray type is perhaps redundant for most use cases. Arguably, therefore, we only need an offset AbstractUnitRange type (similar to OffsetArrays.IdOffsetRange).

julia> using OffsetArrays

julia> a = reshape(1:4, 2, 2)
2×2 reshape(::UnitRange{Int64}, 2, 2) with eltype Int64:
 1  3
 2  4

julia> v = @view a[OffsetArrays.IdOffsetRange(values = 1:2, indices = 3:4), :]
2×2 view(reshape(::UnitRange{Int64}, 2, 2), OffsetArrays.IdOffsetRange(values=1:2, indices=3:4), :) with eltype Int64 with indices OffsetArrays.IdOffsetRange(values=3:4, indices=3:4)×Base.OneTo(2):
 1  3
 2  4

julia> axes(v) .|> UnitRange
(3:4, 1:2)

julia> v == OffsetArray(a, 3:4, :)
true

Your suggestion would presumably restrict SubArrays to 1-based indices as well, with OffsetArray(::SubArray) being the type with offset indices?

1 Like

Correct.

This is really neat composition though. You gave a SubArray an IdOffsetRange version of the parent array’s axis, and the SubArray’s own axis is offset accordingly. I’ve never seen this happen before because SubArray axes are 1-based by default EDIT: turns out it’s not a default so much as computing the axes of an axis, and the axes of UnitRanges and StepRanges just happen to be OneTos.

julia> v.indices # subarray uses these indices of the parent array
(OffsetArrays.IdOffsetRange(values=1:2, indices=3:4), Base.Slice(Base.OneTo(2)))

julia> axes(v) # indices of the subarray
(OffsetArrays.IdOffsetRange(values=3:4, indices=3:4), Base.OneTo(2))

julia> axes(OffsetArrays.IdOffsetRange(values=1:2, indices=3:4))
(OffsetArrays.IdOffsetRange(values=3:4, indices=3:4),)

This was a great talk that touched on a lot of details with very cool proposed approaches, but I’m resolving the thread because I’m now more convinced about equal support for offset indices. What really tipped the scale was this comment pointing out that a DimensionMismatch error can be thrown for unmatched axes in general, and I figured it’s not much on top of DimensionMismatch errors being thrown for 1-based axes of different lengths.

I still think it would be easier to write (n÷2+1):n than ((end-begin+1)÷2+begin):end (maybe end-begin+1 could be given its own keyword to correspond to length?), but features have tradeoffs and require_one_based_indexing is always an option.