OffsetArrays, inbounds, and confusion

In addition to the nice examples in the announcement blogpost I wanted to point out a place where I’ve really appreciated offset arrays: In the JuliaActuary package MortalityTables.jl.

There are data tables which are indexed by a person’s age in practice (example) and it doesn’t always start at 0 or 1. Many times you aren’t indexing the array from start to finish either, so I often run don’t want eachindex.

So shout out to Julia to making this more natural/performant than my attempts in other languages!

2 Likes

It strikes me that I’ve always thought of type annotations in functions as something preventing headaches for the potential user. That is, if I define a function and say that it takes an AbstractArray, I’m thinking “this input should be an AbstractArray”, not "I’m guaranteeing that it will work for all AbstractArray subtypes that anyone will ever invent. That is, I’m relying on functionality that the type provides, but I’m not taking into account all other functionalities that would distinguish those other subtypes from the ones I’m usually working with, because I can’t.

But maybe that’s the wrong mindset. After all, if I’m annotating a function thinking that this is to prevent users from expecting it to call it on things which are not even AbstractArrays, it is only throwing a different error than it would if I hadn’t used any annotation at all. If I’m indeed relying on things that are only possible for AbstractArrays, it will throw an error somewhere deeper in the function. Conversely, if I had annotated to say that I’m expecting this to work for any AbstractArray subtype, no annotation would mean that I would expect this to work for all types, which almost no function will in practice (except trivial ones like typeof).

So I wonder whether it is really good to expect a type annotation to be a contract that any subtype should be supported, when we have no control over its behavior. For example, someone could define a DontTouchMeArray that sets all its elements to zero as soon as you try to modify any of its elements. It is almost guaranteed to break any function with an !, but I don’t believe it violates any of the things written down in Interfaces · The Julia Language, or at least it’s not obvious to me if it does.

I think you’re right on the money that worrying about your code working with any other specific library is too much. Ideally, your generic code uses a well-documented interface that other people try to extend, and uncomposability is indicated by MethodErrors from incompatible interface methods e.g. setindex! for immutable arrays. But nothing’s ideal, the generic code may make unintended and undocumented assumptions; types can throw an inscrutable error at some downstream method that’s not even in the package; or anything else that’ll end up on a Github issue or discourse post.

You’re right about better documentation; if anything goes wrong, I want to quickly check if I violated any assumptions and limitations in a readily accessible place, like a docstring or a FAQ. But that may not be the only thing you can do.

This is actually one of the issues cited in the “Why I no longer recommend Julia” blogpost that started this recent discussion of OffsetArrays: some Base methods didn’t check for memory aliasing or document a warning against it. To avoid this, you could make input-validating methods and call them first in your methods, or if the checks have too much runtime cost and don’t need to be ran repeatedly, you can document the usage of such methods in very visible code examples.

In this case I would avoid making generic methods for broad types like AbstractArray. In their stead, I might make an alias to concisely write a Union of all the types I intend to support and test. Sometimes that might be too restrictive, and maybe clear documentation of limitations is the best you can do. A user might gripe that your code doesn’t compose with a type they want, but you don’t owe them that if you have made clear your code’s intentions. They can use multiple dispatch to implement what they need in their own package, as you’ve said, or improve your code with your cooperation. OffsetArrays is a bit of a counterexample because the AbstractArray interface does nominally support non-1-based indexing, and it’s such a thin wrapper over other array types that it’s very plausible to support.

3 Likes

Im confused. thought the idiom was
for (i,x) in enumerate(X)
versus
for (i,x) in enumerate(eachindex(X))
?

A = rand(5)
julia> for (i,a) in enumerate(A)
       println(i," ",a)
       end
1 0.16336397100368438
2 0.4301384028138592
3 0.491507690411888
4 0.6551251460857042
5 0.36388032124662595

versus

julia> for (i,a) in enumerate(eachindex(A))
       println(i," ",a)
       end
1 1
2 2
3 3
4 4
5 5

We were talking about how to reference a counter that starts at one if the indexing base for the array is something else. So the output could be something like

julia> for (counter, index) in enumerate(eachindex(A))
       println(counter, " ", index)
       end
1 0
2 1
3 2
4 3
5 4
1 Like

No one has mentioned the OffsetArrays provides a view without offsets

OffsetArrays.no_offset_view()

Consider the following:

Setup Arrays:

c = ones(5,5).*2

using OffsetArrays
A = OffsetArray(zeros(5, 5), OffsetArrays.Origin(-2, -2))

for (i, j) in zip(axes(A)...)

    println(i,"\t",j)

    A[i, j] = (i-j)^2+abs(i)+abs(j)+1

end

julia> c
5×5 Matrix{Float64}:
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0
 2.0  2.0  2.0  2.0  2.0

julia> A
5×5 OffsetArray(::Matrix{Float64}, -2:2, -2:2) with eltype Float64 with indices -2:2×-2:2:
 5.0  0.0  0.0  0.0  22.0
 0.0  3.0  0.0  0.0   0.0
 0.0  0.0  1.0  0.0   0.0
 0.0  0.0  0.0  3.0   0.0
 0.0  0.0  0.0  0.0   5.0

Now use them:

julia> tr(c)
10.0

julia> tr(A)
ERROR: ArgumentError: offset arrays are not supported but got an array with index other than 1
Stacktrace:
 [1] require_one_based_indexing

julia> tr(OffsetArrays.no_offset_view(A))
17.0

2 Likes

oh, so
A[index]
is “guaranteed” to exist?
is there a fancy way to do it just with a function enumerate() or similar (i.e. to get the counter, index, and value of A in one function?)

for (counter, index, a) in Xenumerate(A)

and then if that was the preferred idiom for people to use, offset arrays would “just work” more often then now?

thanks for clarifying

pairs(collection)

Return an iterator over key => value pairs for any collection that maps a set of keys to a set of values. This includes arrays, where the keys are the array indices.

Also see this old answer of mine.

To my knowledge there is no function that combines pairs and enumerate, but you can just pass the return of pairs to enumerate.

julia> using OffsetArrays
[ Info: Precompiling OffsetArrays [6fe1bfb0-de20-5000-8ca7-80f57d26f881]

julia> oa = OffsetArray(["a", "b", "c", "d", "e"], -2:2)
5-element OffsetArray(::Array{String,1}, -2:2) with eltype String with indices -2:2:
 "a"
 "b"
 "c"
 "d"
 "e"

julia> for (count, (index, value)) in enumerate(pairs(oa))
           println("$count $index $value")
       end
1 -2 a
2 -1 b
3 0 c
4 1 d
5 2 e
2 Likes

thanks. continuing my edification, how come this works to replace traditional for i=1:length(A) loop
for a in A
but not
for (i,a) in A
or even
for (count,i,a) in A

It seams to me that Julia should have enough info to make this work?

(please be nice, I come from a numerical/Matlab/Fortran background, so all these post 1970s data structures are still slightly opaque).

1 Like

You are correct it could act the way you propose, it is a conscious design decision to not go that way. for a in A will always iterates just the values of A. You could design/change the language to do the equivalent to for (i, a) in pairs(A) when you write for (i, a) in A, and do the equivalent of for (count, (i, a)) when you write for (count, i, a) in enumerate(pairs(A)). However, this would disallow you to de-structure tuples in the general case, because the syntax would be ambiguous. For example, the following use case would be disallowed (or have its meaning changed):

julia> A = [('a', 10), ('b', 20), ('c', 30)]
3-element Array{Tuple{Char,Int64},1}:
 ('a', 10)
 ('b', 20)
 ('c', 30)

julia> for (a, b) in A; println("$a $b"); end
a 10
b 20
c 30

I do agree with the language design decision here, as I find it to be the more elegant one. Now, any of these iterators that you may want (including the count, i, a) can be implemented by enumerators instead.

julia> enumpairs(xs) = zip(1:length(xs), eachindex(xs), values(xs))
enumpairs (generic function with 1 method)

julia> using OffsetArrays

julia> oa = OffsetArray(["a", "b", "c", "d", "e"], -2:2)
5-element OffsetArray(::Array{String,1}, -2:2) with eltype String with indices -2:2:
 "a"
 "b"
 "c"
 "d"
 "e"

julia> for (count, i, a) in enumpairs(oa)
           println("$count $i $a")
       end
1 -2 a
2 -1 b
3 0 c
4 1 d
5 2 e
6 Likes

UPDATED. ORIGINAL SOLUTION WAS WRONG (won’t delete it for context in the discussion that follows), but I updated this post so that the correct answer is here also (solution proposed by @mbauman).

This was the first thing I thought about when reading this thread. I think I found a decent solution:

for i in 5:length(a)-4
  #...
end

becomes (wrong answer)

for i in eachindex(a)[5:end-4]
    # ....
end

Correct solution

for i in eachindex(a)[begin+4:end-4]
    # ....
end

Paulo

That doesn’t work because the indices of an offset array are themselves offset. So indexing eachindex(A)[5:end-4] doesn’t necessarily start from 1 like you’d hope. Instead, you could use eachindex(A)[begin+4:end-4], which has the additional virtue of being symmetrical.

2 Likes

For generic code I’d refrain from using [begin+4:end-4],
as it would fail on “strided” arrays (e.g. arrays stored in compact form, but with only even indices).

Edit: leaving the part below, which shows how indexing on eachindex surprised me.

I read Paulo’s post as “iterate from the 5th element to the 4th before last”,
which seems a valid use case.

To do that, one can’t index on the eachindex(oa) iterator itself

oa = OffsetArray(["a", "b", "c", "d", "e"], -2:2)
eachindex(oa)[3:5]
BoundsError: attempt to access 5-element OffsetArrays.IdOffsetRange{Int64, Base.OneTo{Int64}} with indices -2:2 at index [3:5]

I currently do not understand what it is trying to achieve.

But on the collected indexes, it works nicely:

oa = OffsetArray(["a", "b", "c", "d", "e"], -2:2)

# loop on the 3rd to 5th element
for idx in collect(eachindex(oa))[3:5]
	println("$idx: $(oa[idx])")
end

0: c
1: d
2: e

I’d expect Base.iterators.take(iterator, 3:5) to return a nice iterator for this purpose,
but it’s not the case yet (as of julia-1.6.5).

Julia is already awesome, and progressing quickly in the right direction. Keep on the good work !

This is false. All AbstractArray subtypes must have AbstractUnitRange axes — even those with nontraditional indexing. Interfaces · The Julia Language

Your memory can be strided, but the indices in Julia must be contiguous.

8 Likes

All AbstractArray subtypes must have AbstractUnitRange axes

Thanks. I see now that my hypothetic “strided” array could not be an AbstractArray.
This simplifies things.

1 Like

Other than OffsetArrays, there are many containers that support only a part of array interface, like Queue, Stacks and so on.

So, If I have a function that only pushes an element to its argument, I just wat to be sure to dispatch any argument type that is push-able. But there is no such trait-based dispatch.

On the other side, looks like AbstractArray is a poor abstraction in terms of its “compactness” / “surface area” (brings too many assumptions and methods)? In other words, a single abstract type bounds too many traits together.

So, maybe an ideal solution (for Julia 2.0, who knows?) would be to have any abstract type declared as a set of small traits? Then, type dispatch over types can be expanded to dispatch over custom sets of traits, so something like this would be a valid code for any subtype, that supports all declared traits:

# trait set for A is a subset of AbstractArray, but not OffsetArray:
function my_sum(A::{Indexable, IndexFromOne, HasLength, FiniteLength, Eltype{T}, etc.}) where T
    r = zero(eltype(A))
    for i in 1:length(A)
        @inbounds r +=A[i]
    end
    return r
end
1 Like

Thanks! I didn’t know about zip().

Putting my numerics hat on and to clarify… If i was to use enumpairs() in “matlab” style Julia (i.e vectors and matrix of Float64’), Julia is smart enough to make this as fast as if I wrote it in the “Fortran” style

count  = 1
for i=1:N
  a = A[i]
  ...
  count = count + 1
end

i.e. there is no numerical performance hit by using “fancy” functions like zip() and enumerate() rather than “raw” for loops.

If the version with enumerate/pairs are slower, it’s a bug.

3 Likes

I tried it out quickly and it appeared to work… Must have typed something wrong. Sorry

Whether a function is exported or not does not by itself indicate whether it is public API or not. The main indicator if it is public API is whether it had a docstring and is included in the manual. If it is documented and the documentation does not say otherwise, it is public API.

There have been some efforts to explicitly declare components to be public API. One example is this issue:
https://github.com/JuliaLang/julia/issues/42117

And this package:
https://docs.juliahub.com/PublicAPI/Jfs06/0.1.0/

1 Like