[ANN] AxisKeys.jl

AxisKeys.jl is another take on roughly the problem addressed by AxisArrays.jl (and xarray), of storing a list of “keys” alongside each axis of a multi-dimensional array. This one tries to be simpler in that:

  • There are no special types to know about: it just stores any AbstractVector of keys, one per dimension.

  • Handling of dimension names is outsourced to NamedDims.jl; like the names of a NamedTuple, these are just Symbols. They allow keyword indexing A[row=3] etc. The two wrappers commute (with tests run on both orders), and each can be used alone.

  • Looking up elements based on these keys is kept clearly distinct from indexing, partly to ensure that keys which happen to be integers (such as iteration counts) aren’t second-class citizens. It can be written with round brackets, A("three") etc, or by using certain “selectors” (described in the readme).

Lookup happens by i = findfirst(isequal(key), vec) and friends. If it needs to be fast, then you can use for instance UniqueVectors.jl or AcceleratedArrays.jl for the keys, which already overload exactly this function. But lookup is secondary to indexing, this is still an array. If you want this to be the primary way in which you address individual elements, then you probably want something like Dictionaries.jl instead.

Other features:

  • Many operations propagate keys (and names), including broadcasting (for which AxisArrays.jl never got updated for Julia 1.0), map, most comprehensions, concatenation, sorting, mapslices, and some linear algebra.

  • One-dimensional KeyedVectors allow push!, which is smart enough to extend a StepRange of keys, e.g. push!(KeyedArray(rand(3), alpha='a':'c'), 0.444), or to push into both data & keys.

  • It’s not as close to zero-cost as NamedDims.jl alone, as there are more pieces to move around, but it tries! Almost everything is type-stable.

  • Plays well with OffsetArrays.jl, which really does change indexing (but only allowing consecutive integers).

  • Has colourful pretty-printing showing keys next to data. And a detailed readme file.

This was renamed from AxisRanges.jl, BTW, in case old threads are confusing.

7 Likes

Do I understand correctly: Lookup with get index doesn’t have additional cost in you approach, only if you say have broadcasted operations and code runs to figure out the new keys of the resulting object?

Yes A[1, 2] simply calls getindex(parent(A),1,2), and should have no extra cost. Something like A[1, 3:end] has some overhead in that it also does axiskeys(A,2)[3:end]. Broadcasting exp.(A) will make a copy of the keys, and A .+ B will need to compare them to check that they match.

What I’m calling lookup is things like A(: 'c') == @view A[:, Key('c')], and this has extra overhead in that it must first call i = findfirst(isequal('c'), kvec) to find the index. How much will depend on the type and length of the vector. There is a shortcut for kvec::StepRange which just directly calculates i, but still costs a few ns. For kvec::UniqueVector this is instead done using a Dict's hashing. But this is always a two-step process, the key is used to find i which is then used to index into parent(A), unlike Dictionaies.jl which (if I understand right) directly stores the contents in a hash-accessible (but index-inaccessible) way.

1 Like