Is OffsetArrays.jl a poison pill?

There have been recent discussions of issues related to OffsetArrays.jl. What I hope to contribute here is the perspective of an end-user (non-CS person), which was not the focus of the earlier threads.

The need to accommodate OffsetArrays has led to a rather general but also complicated syntax. This is fine for Base and packages, but tend to scare off newcomers.

For Julia 2.0 or whatever, I would like to see something like forNEW t = 4:size(x,1); i = 2:size(x,2);x[t,i] = x[t-3,i-1];... work whether x is a traditional matrix or from OffsetArrays.

3 Likes

If you are an end-user, you are free to do 1:length(x) whenever you want.

Being an end user means you in general get to control what packages you use and what inputs you give to functions. You don’t need to worry about OffsetArrays.jl compatibility until you are doing package development.

18 Likes

You can do

for t in axes(x, 1)[4:end], i in axes(x, 2)[2:end]
    x[t, i] = x[t-3, i-1]
end

Maybe someone can dream up a terser syntax?

5 Likes

Eachindex and friends are probably the right way to do this, and they come with nice benefits like not needing inbounds most times and dealing with odd indices

4 Likes

It is fine to use “simpler” syntax if you are using concrete types that you know. The lesson from OffsetArrays is that one needs to be extremely careful with abstract types. If one accepts an AbstractArray you should either test your code’s assumptions or convert it to a concrete type that you do know.

julia> using OffsetArrays

julia> OA = OffsetArray(rand(Int8, 10, 10), -5:4, -5:4)
10Ă—10 OffsetArray(::Matrix{Int8}, -5:4, -5:4) with eltype Int8 with indices -5:4Ă—-5:4:
    4  -110  104   -52    22  117   -28   110  -87   -44
  122   107   32  -127   -65   16  -120   -76   76   -10
 -122    82  113  -120   111   40    -5  -122  -22    71
  -53   -72    2   -19    68  -86   -22    53   65   -54
  -43   -29    6    92   -85   55   126   -49  -30   -18
  121   -87   82   -31  -121  -94   117    88   37   -54
  -70   -27   83   -41   -19  -81   -63   -16   52  -103
  -30   -56   46     6    23   80    97    34    4    41
  -69    26   58    81   119   96    24   117  -85   -18
   66   -21   69    63  -100  100   126  -122  -88     0

julia> A = OffsetArrays.no_offset_view(OA);

julia> typeof(A)
Matrix{Int8} (alias for Array{Int8, 2})

julia> for t = 4:size(A,1), i = 2:size(A,2)
           A[t,i] = A[t-3, i-1]
       end;

julia> OA
10Ă—10 OffsetArray(::Matrix{Int8}, -5:4, -5:4) with eltype Int8 with indices -5:4Ă—-5:4:
    4  -110   104   -52    22   117   -28   110   -87   -44
  122   107    32  -127   -65    16  -120   -76    76   -10
 -122    82   113  -120   111    40    -5  -122   -22    71
  -53     4  -110   104   -52    22   117   -28   110   -87
  -43   122   107    32  -127   -65    16  -120   -76    76
  121  -122    82   113  -120   111    40    -5  -122   -22
  -70   -53     4  -110   104   -52    22   117   -28   110
  -30   -43   122   107    32  -127   -65    16  -120   -76
  -69   121  -122    82   113  -120   111    40    -5  -122
   66   -70   -53     4  -110   104   -52    22   117   -28

What would be the point of OffsetArrays then? I create an OffsetArray specifically because I want to index it at offset indices.

5 Likes

If it was only OffsetArrays, we would probably just not support that package. It’s just the simplest example of custom indexing.

1 Like

100% agree. But my hope would be that it is acceptable that you don’t need to worry about that compatibility even if you are doing package development. OffsetArrays compatibility could be something advertised rather than assumed.

Adding test cases for those is an enormous burden to put on package developers with limited resources who may have algorithms that are easier to just assume has vectors starting 1.

1 Like

What might be useful would be for BaseArrays to be defined as follows.

julia> const CoreArrays{T,N} = Union{subtypes(Core, AbstractArray{T,N})...}
Union{Core.Compiler.AbstractRange{T}, Core.Compiler.BitArray{N}, Core.Compiler.ExceptionStack, Core.Compiler.MethodList, DenseArray{T, N}, Core.Compiler.LinearIndices{N, R} where R<:Tuple{Vararg{Core.Compiler.AbstractUnitRange{Int64}, N}}} where {T, N}

julia> const BaseArrays{T,N} = Union{subtypes(Base, AbstractArray{T,N})...}
Union{AbstractRange{T}, BitArray{N}, Base.ExceptionStack, Base.MethodList, LinearIndices{N, R} where R<:Tuple{Vararg{AbstractUnitRange{Int64}, N}}, Base.LogicalIndex{T}, Base.ReinterpretArray{T, N, S} where S, Base.ReshapedArray{T, N}, Base.SCartesianIndices2, SubArray{T, N}, CartesianIndices{N, R} where R<:Tuple{Vararg{OrdinalRange{Int64, Int64}, N}}, PermutedDimsArray{T, N}} where {T, N}

julia> const BaseAndCoreArrays{T,N} = Union{CoreArrays{T,N}, BaseArrays{T,N}}
Union{AbstractRange{T}, BitArray{N}, Base.ExceptionStack, Base.MethodList, Core.Compiler.AbstractRange{T}, Core.Compiler.BitArray{N}, Core.Compiler.ExceptionStack, Core.Compiler.MethodList, DenseArray{T, N}, LinearIndices{N, R} where R<:Tuple{Vararg{AbstractUnitRange{Int64}, N}}, Base.LogicalIndex{T}, Base.ReinterpretArray{T, N, S} where S, Base.ReshapedArray{T, N}, Base.SCartesianIndices2, SubArray{T, N}, Core.Compiler.LinearIndices{N, R} where R<:Tuple{Vararg{Core.Compiler.AbstractUnitRange{Int64}, N}}, CartesianIndices{N, R} where R<:Tuple{Vararg{OrdinalRange{Int64, Int64}, N}}, PermutedDimsArray{T, N}} where {T, N}

julia> foo(x::BaseAndCoreArrays) = x[1]
foo (generic function with 1 method)

julia> foo([4,5,6])
4

julia> foo(x::BaseAndCoreArrays) = x[1:length(x)]
foo (generic function with 1 method)

julia> foo([4,5,6])
3-element Vector{Int64}:
 4
 5
 6

Then you could happily define foo(A::BaseAndCoreArrays) without worrying about “strange” arrays.

julia> foo(x::BaseAndCoreArrays) = x[1]
foo (generic function with 1 method)

julia> foo([4,5,6])
4

julia> foo(x::BaseAndCoreArrays) = x[1:length(x)]
foo (generic function with 1 method)

julia> foo([4,5,6])
3-element Vector{Int64}:
 4
 5
 6
2 Likes

We also have arithmetic with begin and end. For example,

x[begin+3:end, begin+1:end] .= x[begin:end-3, begin:end-1]

should do the same thing.

24 Likes

If you want to develop a package and rely on 1-based indexing, why not just test that the matrix is 1-based:

onebased(a::AbstractArray) = all(isone, (firstindex(a,d) for d in 1:length(size(a))))

Then in your package’s functions that accept abstract arrays, put something like this:

function f(x, y, z, w::AbstractArray)
    onebased(w) || error("This function doesn't support other than 1-based arrays')
    ....
end

This will prevent the possibility of silent errors until you have the time and energy to upgrade the code for more general array indexing.

Edit: As pointed out by Benny below there already exists a function Base.require_one_based_indexing that will do equivalent checking. It is better than my proposed function in that it can accept any number of abstract matrix or vector arguments and will throw an ArgumentError if any of the arguments does not have one-based indexing. Note that it is not exported from Base and is not documented. Should it be?
It is documented here.

3 Likes

That would require every single function in every single package out there to have that check? And since it can’t be relied on, end users interested in offsetarrays would still need to read the source-code of dependencies…

The alternative approach is just to admit that arbitrary packages don’t support OffsetArrays and not put the pressure on stretched package developers to support a feature that most people will never use - or who can use it with careful package selection when it is appropriate.

Then people who put in the time to support offsetarrays or where it is only a minor inconvenience to do so can just advertise it. At this point, I don’t think anybody should assume that arbitrary packages support offsetarrays without testing it themselves.

7 Likes

The check would only be needed in the outermost, exported functions that are part of the published API of one’s package. I often perform checks for the appropriateness of the inputs provided to my user-facing functions. This is just one more possible check that could be added if the developer doesn’t have time for a full code review but wants to maximize code robustness.

I’m sorry if I seemed to be trying to pressure anyone–that was not my intent. I tried to provide another easy alternative for package developers who are interested in this topic.

3 Likes

While OffsetArrays.jl is a very valuable and useful package I also feel uneasy about its effect on the ecosystem.

How would we feel if an OffsetTuples.jl package was introduced (which, while we are at it, allowed for multidimensional indexing too)?

With the introduction of OffsetArrays.jl AbstractArrays now confounds abstraction of element data type with the abstraction of indexing. These two abstractions are orthogonal and it is this that complicates any attempt to form a hierarchy of array abstractions.

I would really like abstractions of element data type and abstraction of indexing to be separate.

5 Likes

But usually when a newcomer asks about this, the response is “don’t do 1:length(x) or the world will come to an end.” The same holds for other best practices. We could be more nuanced when answering these questions.

8 Likes

The part of Yuri’s criticism I would hope the community most internalizes is the idea that some of the current best practices are somewhat problematic and seem to implicitly deprioritize correctness.

10 Likes

I’m unclear what “current best practices are somewhat problematic and seem to implicitly deprioritize correctness”. I could interpret this statement meaning opposite concepts in practice. Could you help me to internalize what you mean?

What I see as missing is an abstract subtype of AbstractArray where the interface is that of Array, including one-based indexing. We could call it AbstractOneBasedArray. I think many thought this is what AbstractArray was.

6 Likes

I think some of the best practices here are problematic: GitHub - jrevels/YASGuide: Yet Another Style Guide For Julia

For example:

  • Dispatch type constraints should be used chiefly for dispatch, not for artificially restricting method signatures to “known” types, or merely for documentation.

In practice, “known types” is likely equivalent to “tested types”, which suggests the preference is to avoid restricting functions to the tested surface area.

6 Likes

That makes some sense.

You consider introducing an intermediary abstract array type, say, AbstractSimpleArray which always begins with index 1 and indices are separated by one(T) [or oneunit(T) for Ints they are the same].

What you are constraining is less the Array abstraction than its indices convention / constraint / requirements. There is an AbstractUnitRange{T} which is

Supertype for ranges with a step size of oneunit(T) with elements of type T. UnitRange and other types are subtypes of this.

and a concrete type Base.OneTo:

help? > Base.OneTo
Define an AbstractUnitRange that behaves like 1:n, with the added
distinction that the lower limit is guaranteed (by the type system) to
be 1.

The AbstractSimpleArray is an AbstractArray with indices that conform to Base.OneTo. Of course, one may want a 1 based array that takes 2 or n steps from index to index – however that is no longer simple. And there are subtypes of AbstractArray{T,N} that would need to remain as subtypes(AbstractArray) and also to be sub-specialized with unique names as subtypes(AbstractSimpleArray) or Better it seems, with a range categorization constraining parameter
[though this need not be breaking, great care is needed to insure that].

Here is a direction, without much consideration for what-would-happen.
There is a better refactoring than using Range types as params.

abstract type AbstractArray{T,N} end
abstract type AbstractSteppedArray{T,N,R} <: AbstractArray{T,N} end
abstract type AbstractSimpleArray{T,N} <: AbstractSteppedArray{T,N,Base.OneTo} end

 struct SimpleArray{T,N} <: AbstractSimpleArray{T,N}
         values::Array{T,N}
end

Base.size(x::AbstractSimpleArray) =
  isempty(x.values) ? nothing : size(x.values)
Base.get(x::SimpleArray{T,1}, i::Integer, default::T=zero(T)) where {T<:Number} =
  getindex(x.values, i, default)
Base.get(x::SimpleArray{T,1}, i::Integer, default::T) where {T} =
  getindex(x.values, i, default)
Base.getindex(x::SimpleArray{T,1}, i::Integer) where {T<:Number} = 
   1 <= i <= length(x.values) ? getindex(x.values, i) : nothing

# dispatch catches disallowed range[s] if used to index
# ...

additional info from the docs:

If a type is defined as a subtype of AbstractArray , it inherits a very large set of rich behaviors including iteration and multidimensional indexing built on top of single-element access. See the arrays manual page and the Julia Base section for more supported methods.

A key part in defining an AbstractArray subtype is IndexStyle. Since indexing is such an important part of an array and often occurs in hot loops, it’s important to make both indexing and indexed assignment as efficient as possible. Array data structures are typically defined in one of two ways: either it most efficiently accesses its elements using just one index (linear indexing) or it intrinsically accesses the elements with indices specified for every dimension. These two modalities are identified by Julia as IndexLinear() and IndexCartesian(). Converting a linear index to multiple indexing subscripts is typically very expensive, so this provides a traits-based mechanism to enable efficient generic code for all array types.

This distinction determines which scalar indexing methods the type must define. IndexLinear() arrays are simple: just define getindex(A::ArrayType, i::Int).

1 Like

But if functions are restricted to known/tested types, across the ecosystem, what will happen to composability? Is your core argument that it is better for the language and community to sacrifice composability in general? Should generic programming not be considered good practice, or am I reading too much into this comment?

4 Likes

This doesn’t quite work:

1.7.2> foo(1:3)
"Success"

1.7.2> foo([1,2,3])
ERROR: MethodError: no method matching foo(::Vector{Int64})

But even if the list was complete for arrays in Base, this would mean no support for Symmetric or Diagonal matrices, no SparseArrays, no SVector, etc. etc. It would be much more limiting than you probably want.

5 Likes