Custom indexing for a named array


#1

I am playing around with a custom named array and trying to figure out how to make it play nicely with custom indices, such as those in @mbauman’s fantastic InvertedIndices package.

Here’s a toy example, simple as I could make it:

# A struct representing a named index
struct Name{T}
    name::T
end

# A toy "array with names" wrapping a data array
struct Arr{T, N} <: AbstractArray{T, N}
    data::Array{T, N}
end

# Standard array functions and indexing
Base.size(A::Arr) = size(A.data)
Base.axes(A::Arr) = axes(A.data)
Base.getindex(A::Arr, i::Int) = A.data[i]
Base.getindex(A::Arr{T, N}, I::Vararg{Int, N}) where {T, N} = A.data[I...]

# Toy named index lookup; resolve names to the name they wrap
resolve_index(A, i) = i
resolve_index(A, i::Name) = i.name

That defines the basic array. The core problem comes up here:

function Base.to_indices(A::Arr, inds, I::Tuple{Any, Vararg{Any}}) 
    I′ = Tuple(resolve_index(A, i) for i in I)
    to_indices(A.data, inds, I′)
end

# Indexing with names and more complicated indices, e.g. :
function Base.getindex(A::Arr{T, N}, I::Vararg{Any}) where {T, N}
    # Resolve any names to indices into the underlying array
    I′ = to_indices(A, I)
    A.data[I′...]
end

Note: The {Any, Vararg{Any}} part of the Tuple type is needed to disambiguate from this to_indices method in Base.

Here’s where the trouble comes in:

# You may need to `]add https://github.com/mbauman/InvertedIndices.jl`
using InvertedIndices

a[Not(2)] # => error

That fails with a method ambiguity error:

MethodError: to_indices(::Arr{Int64,2}, ::Tuple{Base.OneTo{Int64}}, ::Tuple{InvertedIndex{Int64}}) is ambiguous. Candidates:

  • to_indices(A, inds, I::Tuple{InvertedIndex,Vararg{Any,N} where N}) in InvertedIndices at …/packages/InvertedIndices/Ulzlz/src/InvertedIndices.jl:68
  • to_indices(A::Arr, inds, I::Tuple{Any,Vararg{Any,N} where N}) in Main at In[7]:25 (note: this is my version above)
    Possible fix, define
    to_indices(::Arr, ::Any, ::Tuple{InvertedIndex,Vararg{Any,N} where N})

The big goal is to be able to use both names and Nots together without my package needing to be aware of InvertedIndices:

a[Not(Name(2)), :]

I asked Matt B. about this earlier today and this iteration is based on his feedback. My understanding is that the dispatch chain has to go from my getindex method → to_indices in InvertedIndices → to_indices implemented above for Arrto_indices from Base on the underlying data array, but I haven’t quite been able to set it up.

What can one do to have custom arrays and custom indices happily coexist?

P. S. Here’s a gist with all of the code in one piece.


#2

Yeah, it’s hard to design interfaces that simultaneously allow for dispatch on either array or index without ambiguity. The to_indices method table is geared towards dispatch on the leading index type. As such, I’d define:

Base.to_indices(A, inds, I::Tuple{Name, Vararg{Any}}) =
    to_indices(A, inds, (name_to_index(A, I[1]), Base.tail(I)...))

name_to_index(A, i::Name) = error("named indices are not supported for $(typeof(A))")
name_to_index(A::Arr, i::Name) = resolve_index(...)

#3

From an earlier discussion, I was under the impression that only integer custom indexes are supported by the interface, did that change?


#4

That comment was about what an array returns from axes — which is still true. But you can use any custom type as an index so long as your getindex method supports it. You must also support integer indices, though, so typically this means that you can lean on to_indices to convert your esoteric non-integer to its corresponding integer location.

You can also use the simpler Base.to_index if you don’t need a reference to which dimension you’re indexing or the possibility to return more than one dimensions’ worth of indices. E.g.,

struct NamedVector{T,A,B} <: AbstractArray{T,1}
    data::A
    names::B
end
function NamedVector(data, names)
    @assert size(data) == size(names)
    NamedVector{eltype(data), typeof(data), typeof(names)}(data, names)
end
Base.size(n::NamedVector) = size(n.data)
Base.getindex(n::NamedVector, i::Int) = n.data[i]
Base.to_index(n::NamedVector, name::Symbol) = findfirst(==(name), n.names)

julia> n = NamedVector(1:4, [:a, :b, :c, :d])
4-element NamedVector{Int64,UnitRange{Int64},Array{Symbol,1}}:
 1
 2
 3
 4

julia> n[:a]
1

julia> using InvertedIndices

julia> n[Not(:a)]
3-element Array{Int64,1}:
 2
 3
 4

If you try to index with a vector of names, you’ll see that we checkbounds before doing the conversion to integer — so we just need to add the appropriate checkbounds method and then this all works:

julia> Base.checkbounds(::Type{Bool}, n::NamedVector, names::AbstractArray{Symbol}) = all(name in n.names for name in names)

julia> n[[:b,:c]]
2-element Array{Int64,1}:
 2
 3

julia> @view n[[:b,:c]]
2-element view(::NamedVector{Int64,UnitRange{Int64},Array{Symbol,1}}, Symbol[:b, :c]) with eltype Int64:
 2
 3

#5

Thanks, this helped a lot. Is this documented? I could not find it in the custom indexes devdocs.

Also, suppose (for a toy problem) that I have a 2-dimensional matrix, and I want to allow indexing with names in the second axis. How would I write to_indices so that it resolves n[1:2, [:a, :b]] and similar?


#6

Very enlightening, thanks!

My use case is also more complicated — like AxisArrays, I want to allow different names for different dimensions. And I’d like to allow pretty much anything (other than possibly Colon and CartesianIndex) as an index — is there a way to write the specialized non-ambiguous signature for an arbitrary name type?

My thinking was that I’d use an explicit Name wrapper to allow plain integer indices as names, but index with the values (e.g. Symbols or Strings) directly in other cases, though the wrapper would also still work if used.


#7

See the to_indices documentation for some of this. That custom indexing devdocs isn’t really about this case — it’s about offset arrays.

Yes, I intentionally punted on the harder task of different names in different dimensions. You can figure out which dimension you’re on based upon the number of axes left in the second argument of the three-argument to_indices method.


#8

Ah, clever! I think that’s the missing piece.


#9

The key thing to remember about extending most any method (particularly complicated ones) is that you want to look at what already exists to figure out how you’ll plug in. Here are the three-argument methods that exist in Base for to_indices:

# 7 methods for generic function "to_indices":
[1] to_indices(A, inds, ::Tuple{}) in Base at indices.jl:296
[2] to_indices(A, inds, I::Tuple{CartesianIndex,Vararg{Any,N} where N}) in Base at multidimensional.jl:607
[3] to_indices(A, inds, I::Tuple{Union{BitArray{N}, Array{Bool,N}}}) where N in Base at multidimensional.jl:620
[4] to_indices(A, inds, I::Tuple{AbstractArray{CartesianIndex{N},N1} where N1,Vararg{Any,N} where N}) where N in Base at multidimensional.jl:611
[5] to_indices(A, inds, I::Tuple{AbstractArray{Bool,N},Vararg{Any,N} where N}) where N in Base at multidimensional.jl:616
[6] to_indices(A, inds, I::Tuple{Colon,Vararg{Any,N} where N}) in Base at multidimensional.jl:626
[7] to_indices(A, inds, I::Tuple{Any,Vararg{Any,N} where N}) in Base at indices.jl:297

So you can see that the only argument that’s getting specialized is the third one — and then it’s really just the first element of that tuple. So if you want to specialize a different argument (like the first argument A), then you need to make sure that the third argument is also more specific. Defining:

Base.to_indices(A::Arr, inds, I::Tuple{Any, Vararg{Any}})

will be ambiguous — A is more specific than any of the existing methods, but I is less specific. If, instead, you defined:

Base.to_indices(A::Arr, inds, I::Tuple{Name, Vararg{Any}})

then I is now also more specific than any of the existing methods. This won’t incur ambiguities, and the other index types that Base defines (and others add) will gracefully live alongside your method.


#11

It works!

using InvertedIndices

struct Name{T}
    name::T
end

struct Arr{T, N} <: AbstractArray{T, N}
    data::Array{T, N}
    name_to_index::Tuple
    names::Tuple
    function Arr(data::AbstractArray{T, N}, names) where {T, N}
        name_to_index = Tuple(Dict(ks .=> vs) for (ks, vs) in (zip(names, axes(data))))
        new{T, N}(data, name_to_index, names)
    end
end

Base.size(A::Arr) = size(A.data)
Base.axes(A::Arr) = axes(A.data)
Base.getindex(A::Arr, i::Int) = A.data[i]
Base.getindex(A::Arr{T, N}, I::Vararg{Int, N}) where {T, N} = A.data[I...]

function named_to_indices(A::Arr{T, N}, ax, I) where {T, N}
    dim = N - length(ax) + 1
    if isempty(ax) || size(A, dim) != length(ax[1])
        # Disallow linear indexing with a name
        throw(DimensionMismatch("Named indexing expects one name per dimension."))
    end
    to_indices(A, ax, (name_to_index(A, dim, I[1]), Base.tail(I)...))
end

default_name_types = Union{Name, Symbol, String}

Base.to_indices(A::Arr{T, N}, ax, I::Tuple{default_name_types, Vararg{Any}}) where {T, N} =
    named_to_indices(A, ax, I)

name_to_index(A, dim, i::Name) = A.name_to_index[dim][i.name]
name_to_index(A, dim, i) = A.name_to_index[dim][i]

Base.getindex(A::Arr, I...) = A.data[to_indices(A, I)...]

a = Arr([1 2 3; 4 5 6], ([:a, :b], [:c, :d, :e]))
a[:a, Not(:d)]

This implements named indexing with a finite but user-extensible set of axis types. To add a new type, e.g. allow indexing by Float64, one adds a method to Base.to_index:

Base.to_indices(A::Arr{T, N}, ax, I::Tuple{Float64, Vararg{Any}}) where {T, N} =
    named_to_indices(A, ax, I)

so that specificity issues are handled on a case-by-case basis.

Beautiful stuff. :slight_smile: I think this is now fully general enough for my purposes, and rather simpler than my previous implementation.

There are still a few subtleties left to work out regarding preserving axis names and how this interacts with trailing singleton dimensions (I want to forbid the introduction of new dimensions that way for my array type), and a few other things, but this is excellent progress.

Thanks for all of the help @mbauman!