Ordinal Indexing as a Language Feature

I don’t think I’ll join the debate about whether dictionaries should be unordered and have ordered counterparts, as in Base, or whether dictionaries should be ordered and have unordered counterparts, as in Dictionaries.jl.

FWIW, Python used to do the former, but has been moving to the latter. I guess it’s the hot new thing.

It’s worth noting though, that even if you adopt this model it doesn’t automatically make vector math performant. Dictionaries are meant for random insertion and deletion, while vectors only allow pushing and popping from the ends. (and multidimensional arrays’ dimensions are immutable.)

Unset values are not necessarily removed from the Dictionary's internal vector, so data are not guaranteed to have contiguous internal representation.
julia> x = Dictionary([1:5...], [1:5...])
5-element Dictionary{Int64, Int64}
 1 │ 1
 2 │ 2
 3 │ 3
 4 │ 4
 5 │ 5

julia> unset!(x, 3)
4-element Dictionary{Int64, Int64}
 1 │ 1
 2 │ 2
 4 │ 4
 5 │ 5

julia> dump(x)
Dictionary{Int64, Int64}
  indices: Indices{Int64}
    slots: Array{Int64}((16,)) [5, 4, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, -3, 1, 0]
    hashes: Array{UInt64}((5,)) UInt64[0x5bca7c69b794f8ce, 0x3795033f6f2a0674, 0x8000000000000000, 0x6f2a25235e544a31, 0x4af5148d95ea2900]
    values: Array{Int64}((5,)) [1, 2, 3, 4, 5]
    holes: Int64 1
  values: Array{Int64}((5,)) [1, 2, 3, 4, 5]

The differences between vectors and ordered dictionaries are still strong enough to justify separate types imo.


I think the real point of interest in this discussion, though, is in thinking about ways to communicate ordinal indexing for arrays (and ordered dictionaries). We currently have begin and end for this purpose, which work inside square brackets e.g. myvec[begin+1:end-1], and we can use eachindex.

We start jumping through hoops though, because begin and end only have this meaning when contained inside the square brackets, and eachindex forces us to think about iterator methods like Iterators.drop instead of just letting us calculate index offsets like we want to. We can claw back some of what we want with firstindex(myvec)+1:lastindex(myvec)-1, but now things really start getting unwieldy and the wretched 0-indexers take their victory laps.

In some specific use-cases, we use macros: for example, we have the @view macro specifically so that we can still use the begin and end keywords, e.g. @view A[:, begin+1:end-1]. Macros don’t compose well though. (I’ve previously pondered using an indexable type to achieve this goal without having to use a macro, e.g. View(A)[:, begin+1:end-1].)

But as @jishnub explains in the readme for OrdinalIndexing.jl, even this isn’t perfectly satisfactory; we still frequently wish to specify and manipulate ordinal indices outside the brackets, for example when looping over the indices with a generator, e.g.:

g(a, n) = sum(a[i, j] for i in (1:n)th, j in (1:n)th)

This approach is really clever, because it’s easy to do arithmetic with ordinal indexes again.

Edit: Crossed out stupidity. (See next post.)

As we’ve already found during this discussion though, while this is a nice replacement for begin, it isn’t so good for end (and the English language doesn’t offer such nice suffixes to specify index offsets from the end as from the start).

Here’s a short experiment, to use arithmetic with first and last to specify ordinal offsets:

more stupidity
julia> struct OrdinalFirstOffset i::Int end

julia> struct OrdinalLastOffset  i::Int end

julia> Base.:+(::typeof(first), i::Int...) = OrdinalFirstOffset(sum(i))

julia> Base.:-(::typeof(last),  i::Int)    = OrdinalLastOffset(-i)

julia> struct OrdinalUnitRange{I1,I2} i1::I1; i2::I2 end

julia> (::Base.Colon)(i1::I1,i2::I2) where {I1<:OrdinalFirstOffset, I2<:OrdinalLastOffset} = OrdinalUnitRange(i1, i2)

julia> first+1:last-1
OrdinalUnitRange{OrdinalFirstOffset, OrdinalLastOffset}(OrdinalFirstOffset(1), OrdinalLastOffset(-1))

There would have to be a fair bit of arithmetic operator overloading to get these to work (namely, any operator you expect to work with integers would have to be overloaded for OrdinalFirstOffset and OrdinalLastOffset types), and you’d want a few more methods of (::Base.Colon) overloaded, but that’s the gist.

If the specification for Base.getindex of AbstractArrays (and abstract ordered dictionaries) required accepting objects like these, then we could efficiently calculate ordinal indices outside of [] brackets from either start or end and get fresh ammunition to defeat the angry hoards of 0-indexers.

1 Like