Is OffsetArrays.jl a poison pill?

I’m unclear what “current best practices are somewhat problematic and seem to implicitly deprioritize correctness”. I could interpret this statement meaning opposite concepts in practice. Could you help me to internalize what you mean?

What I see as missing is an abstract subtype of AbstractArray where the interface is that of Array, including one-based indexing. We could call it AbstractOneBasedArray. I think many thought this is what AbstractArray was.

6 Likes

I think some of the best practices here are problematic: GitHub - jrevels/YASGuide: Yet Another Style Guide For Julia

For example:

  • Dispatch type constraints should be used chiefly for dispatch, not for artificially restricting method signatures to “known” types, or merely for documentation.

In practice, “known types” is likely equivalent to “tested types”, which suggests the preference is to avoid restricting functions to the tested surface area.

6 Likes

That makes some sense.

You consider introducing an intermediary abstract array type, say, AbstractSimpleArray which always begins with index 1 and indices are separated by one(T) [or oneunit(T) for Ints they are the same].

What you are constraining is less the Array abstraction than its indices convention / constraint / requirements. There is an AbstractUnitRange{T} which is

Supertype for ranges with a step size of oneunit(T) with elements of type T. UnitRange and other types are subtypes of this.

and a concrete type Base.OneTo:

help? > Base.OneTo
Define an AbstractUnitRange that behaves like 1:n, with the added
distinction that the lower limit is guaranteed (by the type system) to
be 1.

The AbstractSimpleArray is an AbstractArray with indices that conform to Base.OneTo. Of course, one may want a 1 based array that takes 2 or n steps from index to index – however that is no longer simple. And there are subtypes of AbstractArray{T,N} that would need to remain as subtypes(AbstractArray) and also to be sub-specialized with unique names as subtypes(AbstractSimpleArray) or Better it seems, with a range categorization constraining parameter
[though this need not be breaking, great care is needed to insure that].

Here is a direction, without much consideration for what-would-happen.
There is a better refactoring than using Range types as params.

abstract type AbstractArray{T,N} end
abstract type AbstractSteppedArray{T,N,R} <: AbstractArray{T,N} end
abstract type AbstractSimpleArray{T,N} <: AbstractSteppedArray{T,N,Base.OneTo} end

 struct SimpleArray{T,N} <: AbstractSimpleArray{T,N}
         values::Array{T,N}
end

Base.size(x::AbstractSimpleArray) =
  isempty(x.values) ? nothing : size(x.values)
Base.get(x::SimpleArray{T,1}, i::Integer, default::T=zero(T)) where {T<:Number} =
  getindex(x.values, i, default)
Base.get(x::SimpleArray{T,1}, i::Integer, default::T) where {T} =
  getindex(x.values, i, default)
Base.getindex(x::SimpleArray{T,1}, i::Integer) where {T<:Number} = 
   1 <= i <= length(x.values) ? getindex(x.values, i) : nothing

# dispatch catches disallowed range[s] if used to index
# ...

additional info from the docs:

If a type is defined as a subtype of AbstractArray , it inherits a very large set of rich behaviors including iteration and multidimensional indexing built on top of single-element access. See the arrays manual page and the Julia Base section for more supported methods.

A key part in defining an AbstractArray subtype is IndexStyle. Since indexing is such an important part of an array and often occurs in hot loops, it’s important to make both indexing and indexed assignment as efficient as possible. Array data structures are typically defined in one of two ways: either it most efficiently accesses its elements using just one index (linear indexing) or it intrinsically accesses the elements with indices specified for every dimension. These two modalities are identified by Julia as IndexLinear() and IndexCartesian(). Converting a linear index to multiple indexing subscripts is typically very expensive, so this provides a traits-based mechanism to enable efficient generic code for all array types.

This distinction determines which scalar indexing methods the type must define. IndexLinear() arrays are simple: just define getindex(A::ArrayType, i::Int).

1 Like

But if functions are restricted to known/tested types, across the ecosystem, what will happen to composability? Is your core argument that it is better for the language and community to sacrifice composability in general? Should generic programming not be considered good practice, or am I reading too much into this comment?

4 Likes

This doesn’t quite work:

1.7.2> foo(1:3)
"Success"

1.7.2> foo([1,2,3])
ERROR: MethodError: no method matching foo(::Vector{Int64})

But even if the list was complete for arrays in Base, this would mean no support for Symmetric or Diagonal matrices, no SparseArrays, no SVector, etc. etc. It would be much more limiting than you probably want.

5 Likes

What sort of problems do you envision?

There’s no AbstractTuple type, so there’s no code that expects 1-indexed tuples, and could be surprised by 0-based tuple input. Only if a function accepts general iterables, could an OffsetTuple be used unexpectedly, and then the code must anyway deal with OffsetArrays, so you would need general indexing. I don’t think anyone would be bothered by someone registering OffsetTuples.jl.

The issue isn’t which packages exist, but what properties can you rely on for abstract types that you want to support. Tuples do, but AbstractArray and iterables in general do not, guarantee 1-based indexing. It should therefore be as easy as possible to work with them, and mostly it is. for a in A is totally general; eachindex(A) is at least as simple as 1:length(A), certainly easier to type; axis(A, 1) is simpler than 1:size(A, 1), etc. It’s when you don’t want to just loop over all elements that it gets a bit harder, then you have begin, end, firstindex, lastindex, first, last, …

But maybe there could be some very simple and universal way to get a one-based view into any AbstractArray? OffsetArray has no_offset_view, but should all AbstractArrays have a method like that? Then anyone who wants less hassle would just write

function foo(A0::AbstractArray)
    A = OneBasedView(A0)
    for i in 4:length(A)-3
        ...
    end
end

Could this be made even simpler? Could there be some way to facilitate automatic promotion to OneBased? A @one_based macro, a new keyword, for1? (just kidding, maybe.)

2 Likes

Actually, there’s a subtlety here, and this may not do what is intended.

julia> x = ones(-10:-1, -10:-1);

julia> for t in axes(x, 1)[4:end], i in axes(x, 2)[2:end]
           x[t, i] = 2x[t-3, i-1]
       end

julia> all(isone, x) # nothing changed
true

julia> axes(x, 1)[4:end] # we looped over empty ranges
4:3

julia> axes(x, 2)[2:end]
2:1

The axes of an OffsetArray are also offset, and this indexing may not work as expected.

2 Likes

Ouch. Yes, I typed this from my phone, so dropped testing it :frowning:

Certainly there is no AbstractTuple at the moment but, given the flexibility of the language, there is no reason why AbstractTuples and OffsetTuples could not exist in the future. Should the use of these become widespread then this could cause disruption to the existing code base. Much of the discussion so far has concerned AbstractArrays but these sorts of problems could arise in the future with any of the abstract types. I guess this is one of the issues that Yuri was concerned about.

1 Like

No, this would require changes in Base itself (or even Core?), the type hierarchy with Any → Tuple is hardcoded, and you cannot wedge AbstractTuple in there from the outside. It is not something that is possible to do from a package, or somehow from ‘the outside’, so it’s not related to the flexibility of the language.

If you allow for changes to the core language, then of course anything could happen, but that’s obvious.

2 Likes

I have missed AbstractTuple as a way to let concrete Tuples, NTuples and NamedTuples more easily play in the same sandbox[s]. One advantage would be an easily seen uniform convention for sequential indexing of the ith element of a Tuple, or the ith value of a NamedTuple. Bringing NamedTuples closer in devthink to Tuples elaborated with position-attached names (symbols) would make working flexibly with them easier for many imo.

julia> NTuple{2, Int} === Tuple{Int, Int}
true

NTuple is just an alias.

1 Like

As someone who was in the “1-based to rule them all” camp for a while during that discussion, I can totally see the appeal of separating OffsetArrays somehow. But I came to realize some things:

  1. There’s no decree to use or support OffsetArrays, and we can totally write 1:n syntax in that case. We even have the require_one_based_indexing function to check inputs. Funnily enough, if there’s a custom array type that isn’t 1-based, OffsetArrays can make it 1-based to be compatible with your 1-based code.

  2. OffsetArrays are actually important. You should see the cool examples on the Julia blog post introducing them, but it’s not hard to run into math where starting at 1 doesn’t make sense. We’d have to do offset calculations to get to the index we need anyway, why not make a wrapper type do it automatically?

  3. Why shouldn’t OffsetArrays work in AbstractArray code? Many functions really could work with any indices, and some functions currently marked with require_one_based_indexing actually could work with offset indices, just require that all inputs share compatible ones. Incompatible indices can just throw the existing DimensionMismatch error.

  4. We definitely should make a docs section or tutorial that teaches people right away how to write “generic” indices; people might not be as good as indexing arrays as they think. (Even the devs! The OffsetArray discussion revealed some overflow bugs when indices or related values get too close to typemax/typemin.) It’s not a big deal when you’re just indexing from begin to end (and in that case, just call eachindex(A) or axes(A, i)), but it gets real important when you start doing fancy stuff like indexing the “middle”, skipping every other index, or going backwards. Obscure errors happen when different key quantities that happen to have the same values are confused with each other, like an n meaning the last index vs. an n meaning length. In 0-based indexing, this is actually more annoying because there are invisible 0+ everywhere that you have to spot to edit the indexing order. begin/end syntax could be improved to make these distinctions easier, and even if you don’t do offset indices you’d reach for the generic syntax just for clarity.

9 Likes

Yes, Certainly. Additionally, I have found it a clearly readable way of saying
“All the elements present are of type T, and there are N of them”, the first part is obvious enough with Tuple{Int, Int, Int, Int, Int, Int, Int, Int}, the second less so.

But then you don’t need AbstracTuple to ‘let them play in the same sandbox’?

1 Like

Rather than doing this in a function in your normal code or in a standalone test in a testrunner I rahter see this check done by a linter/compiler via the type system.

In general that’s the evolution that you see in programming languages over the decades. In the beginning you have languages with pretty weak type systems (e.g. C) so if you wanted to enforce (complex) constraints you had to do it in runtime code. Later on type system got more sofisticated and a lot of checks could be offloaded to that (e.g. ML, Haskell, Scala, F#,…)

There’s a saying (usually for static vs dynamic languages but applies more widely): every type check is a bunch of tests you don’t have to write (or can forget to write).

about 4:
If you’re just going loop over all elements I find “for e in A … end” the better way than eachIndex and then indexing. It’s just nice syntactic sugar that abstracts all that stuff and on the plus side also column based vs row based matrices or even if it’s a vector, matrix or whatever iterable collection.

1 Like

In other words, the iteration interface. I do see people unnecessarily iterating over indices just to getindex one array only once, good to point out. Iteration isn’t good enough when you need setindex!, though.

“With the introduction of OffsetArrays.jl AbstractArrays now confounds abstraction of element data type with the abstraction of indexing”

Actually the abstraction wasn’t about element type (that’s covered by generic type parameters) but was more about underlying storage type like BLAS matrix, distributed array, off-heap (on-disk) array,…

I agree with the confounding now with indexing and trouble of (retro)fitting the type hierarchy. If you wanted to shove storage type & indexing scheme into a type hierarchy you’ll have a multiplier effect and hence a lot of leaf nodes. A better way is to go more general and go with a type graph: i.e. have a type implement multiple traits/interfaces (yes here are traits again!). So you would have a trait hierarchy for storage type and one for indexing and a type would pick one from each.

2 Likes

Yes, and also about structure or properties, like symmetry and diagonal, or normalization, etc.