Stubborn 1-based LinearIndices for the custom type with non-zero based indexing

Greetings and apologies if this is very basic.

I am trying to create a custom type subtyped from AbstractArray to dispatch on it in place of a dense array at times. I’ve been following instructions in the manual for specifying arrays with custom indices. I can make dereferencing and iteration work, but can’t get pairs() to function properly, it stubbornly produces unit-based indices, but unlike enumerate(), also pairs them with the non-existant elements.

Here is my MWE

import Base: length,size,iterate,getindex,axes, has_offset_axes
struct myrange <: AbstractVector{Int}  
    LStart::Int
    LStop::Int
end
IndexStyle(::Type{myrange}) = IndexLinear()
length(r::myrange) = r.LStop-r.LStart+1
size(r::myrange) = (length(r),)
function Base.getindex(r::myrange,i::Integer)
    @boundscheck r.LStart≤ i ≤ r.LStop || throw(BoundsError(r,i))
    i
end
firstindex(r::myrange) = r.LStart
lastindex(r::myrange) = r.LStop

The manual says it’s better to create a custom class for the indices range, but suggests it’s possible to use UnitRange. This is the approach I follow as I don’t need anything fancier

@inline Base.axes(r::myrange) = (UnitRange(r.LStart,r.LStop),)
Base.has_offset_axes(::myrange) = true

I also overloaded similar() as suggested in the manual, but this does not seem to matter.

Now, this seems to work:

julia> axes(z)[1]
4:8

But when attempting to use the class:

julia> LinearIndices(axes(z))
5-element LinearIndices{1,Tuple{UnitRange{Int64}}}:
1
2
3
4
5

and consequently

julia> pairs(IndexLinear(), z)
pairs(::myrange{Int64}) with 5 entriesError showing value of type Base.Iterators.Pairs{Int64,Integer,LinearIndices{1,Tuple{UnitRange{Int64}}},myrange{Int64}}:

ERROR: BoundsError: attempt to access 5-element myrange{Int64} with indices 4:8 at index [1]

The post mortem reveals that pairs() has indeed got the wrong indices from LinearIndices:

julia> properties(pairs(IndexLinear(), z))
2-element Array{Tuple{Symbol,AbstractArray{T,1} where T},1}:
(:data, Integer[#undef, #undef, #undef, 4, 5])
(:itr, [1, 2, 3, 4, 5])

But curiously

julia> LinearIndices(axes(z)).indices
(4:8,)

One might think that overloading LinearIndices for the custom class is required, but do observe this working perfectly:

zz=OffsetArray(4:8,4:8)
LinearIndices(axes(zz))
pairs(IndexLinear(), zz)

@which LinearIndices(axes(zz)) shows that both myrange and OffsetArray get dispatched to the same (default) LinearIndices constructor (in Base at indices.jl:444)

So my questions are:

  • how is it possible that LinearIndices carries wrong indices despite explicitly containing the correct ones in the eponymous field? What is the difference between axes(::myrange) and axes(::OffsetArray) that causes the divergence?

  • what is the minimal way to modify my class to avoid the above behaviour?

I suspect there may be hints in the paragraph below, but I am not getting conversion errors, nor does this explain the mystery of LinearIndices(::myrange)

By this definition, 1-dimensional arrays always use Cartesian indexing with the array’s native indices. To help enforce this, it’s worth noting that the index conversion functions will throw an error if shape indicates a 1-dimensional array with unconventional indexing (i.e., is a Tuple{UnitRange} rather than a tuple of OneTo). For arrays with conventional indexing, these functions continue to work the same as always.

Sorry for not answering your questions directly.
Do you know about

Maybe the work has already been done.

And of course: WELCOME to this fine community!

Thank you Oheil,

I am familiar with OffsetArrays, as you can see I’ve used it as a comparator to diagnose the strange behaviour of LinearIndices. I cannot use the class for my purposes, because I don’t want a dense array, I need a generator (subclassed from AbstractArray) that iterates (this is easy) and produces correctly indexed pairs() (tricky). Internally, IdOffsetRange, the class for the OffsetArraysaxes, is quite complicated (even more so than CustomUnitRanges.URange) to cover lots of eventualities, and I was wondering whether I can use a much simpler UnitRange implementation. The manual suggests that it should be possible, but as you can see above, there are some hidden obstacles. Could it be that there are snags in Julia base (as seems to be suggested by the extended help below)?

Construction/coercion preserves the (shifted) values of the input range, but may modify the indexes if required by the specified types. 

!!! warning

In the future, *conversion* will preserve both the values and the indices, throwing an error when this is not achievable.

I did not want to go down the CustomUnitRanges route, because I didn’t have the time to understand the nature of the associated complications (“The reason is that this package’s range types should be private to the module that needs them; consequently you don’t want to define a module in the global namespace.”)

But it looks like eventually I will have to, if it’s not common knowledge among Julians.

Actually, CustomUnitRanges.URange is not of much help, at least, out of the box. Pairs() don’t error out, but produce wrong indexing:

julia> Iterators.Pairs(4:8, LinearIndices(URange(4,8)))
pairs(::UnitRange{Int64}) with 5 entries:
  1 => 4
  2 => 5
  3 => 6
  4 => 7
  5 => 8

Mystery deepens :confused:

This is precisely where you’re hitting the trouble. We’ve found that one of the core properties of axes is that they themselves must be similarly offset in order for things to function as you want it to!

In other words, a fundamental property of an array is that:

A[axes(A)...] == A

That’s simple enough when everything is 1-based, but another key property we’ve found helpful is that the axes of the result is simply the concatenated axes of the indices. In other words:

axes(A[I, J, K]) = (axes(I)..., axes(J)..., axes(K)...)

The interesting thing is that many other algorithms — even those not based on nonscalar indexing — “just work” once you’ve satisfied this.


Now, I still think you’ll have the most success just using OffsetArrays — it’s the package that represents the accumulated learnings of how to best define offset arrays that are as functional as possible.

An OffsetArray isn’t necessarily dense; they wrap and offset any AbstractArray subtype:

julia> r = 3:7
3:7

julia> ro = OffsetArray(r, -4)
3:7 with indices -3:1

julia> pairs(ro)
pairs(::OffsetArray{Int64,1,UnitRange{Int64}}) with 5 entries:
  -3 => 3
  -2 => 4
  -1 => 5
  0  => 6
  1  => 7

If you want to define custom behaviors, you can do so by specializing f(::OffestArray{T,N,<:AbstractRange}) — this is “piracy” if done outside of the OffsetArrays package, but I’d bet the optimizations could be accepted as PRs there. If you want to do something more robust and less pirate-y, just define your custom array (that’s 1-based) and use OffsetArrays to offset it — then extending methods with f(::OffsetArray{T,N,<:MyOneBasedRange}) will totally be safe.

11 Likes

@mbauman, thank you for your suggestions, your intuition is right:

We’ve found that one of the core properties of axes is that they themselves must be similarly offset in order for things to function as you want it to!

However, I would formalize it differently. I don’t think A[axes(A)...] == A is sufficient. In my experiments, the criterion for the axes() to work was

map(first, map(first∘axes, axes(z)) ) == map(first, axes(z))
map(last, map(last∘axes, axes(z)) ) == map(last, axes(z))

or, to make it slightly more readable in the 1-dimensional case,

first( Base.axes1( Base.axes1(z) ) ) == first( Base.axes1(z) )

(when Base.axes1() is appropriately overloaded; in other words, axes() should be idempotent in the sense defined above)

Please see below the two examples: one of working myrange implementation, in which A[axes(A)...] ≠ A , so it’s not necessary (though getindex() could be easily overloaded to make it hold), and a broken implementation in which A[axes(A)...] == A.

This works:

import Base: length,size,iterate,getindex,axes, has_offset_axes
struct myrange_working <: AbstractUnitRange{Int} 
    LStart::Int
    LStop::Int
end
function Base.getindex(r::myrange_working,i::Int64)
    @boundscheck r.LStart≤ i ≤ r.LStop || throw(BoundsError(r,i))
    i
end

Base.first(r::myrange_working) = r.LStart
Base.last(r::myrange_working) = r.LStop
@inline Base.axes(r::myrange_working) = (r,)
@inline axes1(r::myrange_working) = axes(r)[1]

z = myrange_working(4,8)
z[axes(z)...]  # 7:11
z[axes(z)...] == z  # **false**

map(first,map(first∘axes,axes(z)))== map(first,axes(z)) # **true**
map(last,map(last∘axes,axes(z)))== map(last,axes(z))  # **true**
first(Base.axes1(Base.axes1(z))) == first(Base.axes1(z)) # **true**

julia> println(LinearIndices(axes(z)))
[4, 5, 6, 7, 8]

julia> println(pairs(IndexLinear(), z))
Base.Iterators.Pairs(4 => 4,5 => 5,6 => 6,7 => 7,8 => 8)

julia> collect(z)'
1×5 LinearAlgebra.Adjoint{Int64,Array{Int64,1}}:
 4  5  6  7  8

This does not:

import Base: length,size,iterate,getindex,axes, has_offset_axes
struct myrange_broken <: AbstractUnitRange{Int}  
    LStart::Int
    LStop::Int
end

function Base.getindex(r::myrange_broken,i::Int64)
    @boundscheck r.LStart≤ i ≤ r.LStop || throw(BoundsError(r,i))
    i
end
function Base.getindex(r::myrange_broken, ur::UnitRange{Int64})
    @boundscheck r.LStart≤ ur.start && ur.stop ≤ r.LStop || throw(BoundsError(r,ur))
    ur
end

Base.first(r::myrange_broken) = r.LStart
Base.last(r::myrange_broken) = r.LStop
@inline Base.axes(r::myrange_broken) = (UnitRange(r.LStart,r.LStop),)
@inline axes1(r::myrange_broken) = axes(r)[1]

z = myrange_broken(4,8)
z[axes(z)...] # 4:8
z[axes(z)...] == z # **true**

map(first,map(first∘axes,axes(z)))== map(first,axes(z))  # **false**
map(last,map(last∘axes,axes(z)))== map(last,axes(z))  # **false**
first(Base.axes1(axes1(z))) == first(Base.axes1(z))  # **false**

julia> println(LinearIndices(axes(z)))
[1, 2, 3, 4, 5]

julia> println(pairs(IndexLinear(), z))
ERROR: BoundsError: attempt to access 5-element myrange_broken with indices 4:8 at index [1]

For my purposes myrange_working is sufficient, and I would prefer this minimalistic implementation to the relative complexity of OffsetArrays; avoiding type piracy is a bonus.

By the way, I think the manual section on Arrays with custom indices may need to be updated, as it omits the crucial bits needed for the custom AbstractUnitRange type to work: Base.first() and Base.last(). Is this a known issue?

I mean, sure, you can come up with all sorts of permutations where some things work and others don’t. I was just trying to provide some motivation for the general rule we use to unify arrays both offset and non.

A minimal but fully-functional identity-offset unit range implementation is here:

Note that this really isn’t a #user:first-steps sort of thing — you’re in the deep end of the internals at this point. The #user:first-steps answer is to use OffsetArrays. :slight_smile:

2 Likes