Using end variable in ranges passed to functions other than getindex?

I expected the following to work:

julia> a = [1,2,3,4,5]
5-element Array{Int64,1}:
 1
 2
 3
 4
 5

julia> deleteat!(a, end-1:end)
ERROR: syntax: unexpected "end"

julia> a[end-1:end]
2-element Array{Int64,1}:
 4
 5

I expected it to delete the last two variables in the array, instead it errors out. Any thoughts on why this behavior happens? Is this a bug?

The parser treats end inside square brackets different from end outside square brackets (where it means “end of block” as in for i = 1:10 ... end).

2 Likes

Not quite as succinct, but you could use endof:

deleteat!(a, endof(a)-1:endof(a))
1 Like

In particular the parser (or rather the “lowering” phase that happens right after parsing) converts any end inside a[...] into a call to endof(a).

You can see this explicitly by calling @code_lowered:

julia> f(a) = a[end-1:end]

f (generic function with 1 method)

julia> @code_lowered f([1,2,3,4])

CodeInfo(:(begin 

        nothing

        return (Main.getindex)(a, (Main.colon)((Base.endof)(a) - 1, (Base.endof)(a)))

    end))
1 Like

Thanks everyone for the explanations. That makes sense. But is this the best behavior? Could the parser not just recognize ends that are parts of ranges and convert those to Base.endof?

I can’t really answer that question, but I think it’s more complicated.

To expand on Steven’s reply, the lowering of end is context-sensitive. For example, within a multi-dimensional array index, end is lowered to size(a, dim) where dim is determined by the position of end within the index:

Julia-0.6.0> f(a) = a[1, end-1:end, 3]
f (generic function with 1 method)

Julia-0.6.0> @code_lowered f(Array{Int}(3,4,5))
CodeInfo(:(begin
        nothing
        return (Main.getindex)(a, 1, (Main.colon)((Base.size)(a, 2) - 1, (Base.size)(a, 2)), 3)
    end))
1 Like

I took this too far :smile:: https://github.com/JobJob/EndyIndexes.jl

pkg> add https://github.com/JobJob/EndyIndexes.jl.git
using EndyIndexes

const start_ = StartBasedIdx()
const end_ = EndBasedIdx()

arr = [1:10;]
for idx in (end_-2:end_, start_+1:end_-2)
    @show arr[idx]
end
a = [1,2,3,4,5]
deleteat!(a, end_-1:end_)

Output

arr[idx] = [8, 9, 10]
arr[idx] = [2, 3, 4, 5, 6, 7, 8]
3-element Array{Int64,1}:
 1
 2
 3

Thanks to the fantastic work that went into custom AbstractArray indexing this was surprisingly easy to implement.


Also just re the comment above, in Julia >=0.7, in multi-dimensional index expressions end now lowers to lastindex(a, dim), in single index expressions just to lastindex(a)

jjulia> f1(a) = a[2, end-1:3, 1]
f1 (generic function with 1 method)

julia> f2(a) = a[end]
f2 (generic function with 1 method)

julia> @code_lowered f1(rand(3,3,3))
CodeInfo(
1 1 ─ %1 = (Base.lastindex)(a, 2)                                                                                                                                │
  │   %2 = %1 - 1                                                                                                                                                │
  │   %3 = %2:3                                                                                                                                                  │
  │   %4 = (Base.getindex)(a, 2, %3, 1)                                                                                                                          │
  └──      return %4                                                                                                                                             │
)

julia> @code_lowered f2(rand(3,3,3))
CodeInfo(
1 1 ─ %1 = (Base.lastindex)(a)                                                                                                                                   │
  │   %2 = (Base.getindex)(a, %1)                                                                                                                                │
  └──      return %2                                                                                                                                             │
)

Maybe this is a good point to make advertisement for the idea for 2.0 to use $begin and $end as they are nice and free, unsurprising, and the dollar works as metaphor (the parser is supposed to substitute them by something like endof(a) and $ indicates talking to the parser. )

3 Likes

Seems like EndpointRanges.jl also does this?

1 Like

As this thread has been resurrected, it seems that endof() was deprecated. We can use lastindex() instead.

I don’t know why we don’t use first and last for this…

Could also be useful for Unicode string from-end indexing. Example: my_string[first+2:last-2] could efficiently form a substring omitting the first two and last two characters of a Unicode string.

You do know about begin and end?

julia> str = "αβγ"
"αβγ"

julia> str[begin+2:end-2]
"β"

The usual caveats about indexing unicode apply.

Yes, but these only work for calculating indices within []; many times you might wish to calculate indices outside, such as in the OP. Hence, the motivation for EndyIndexes.jl, this suggestion for $begin and $end, and EndpointRanges.jl. (Not to mention, the motivation for the @view macro, @jishnub’s OrdinalIndexing.jl, and this thread.) The suggestion to do arithmetic with first and last is just to use existing singleton objects, instead of introducing new start_ and end_ or ibegin and iend objects.

Additionally, because first+x and last-y could return special from-start and from-end indexing objects, they could be used for efficiently indexing strings by character (instead of by byte). For example, I’d imagine "αβγ"[first+1:last-1] would return "β" (with getindex making appropriate calls to nextind and prevind under the hood).

2 Likes

But there’s little advantage to re-using first and last this way, as I think that just about the only thing it saves is

struct IBegin end
const ibegin = IBegin() end

and everything else is just writing methods that dispatch on whatever object you’ve decided to use.

Note that outside of [], coverage is likely to be hit & miss; we’re very unlikely to support tan(ibegin+2) anytime soon. The good news is that relatively few operations seem to go a long way.

1 Like

It’s subjective, but I’m not a huge fan of the ergonomics of ibegin and iend. The words “first” and “last” also already are used several times for this sort of meaning (e.g., first, firstindex, findfirst, etc), so using them here isn’t especially surprising.

That said, indexing from first is effectively zero-based indexing, so even if we had such a feature I’d still want either a) a convenient way to construct 1-indexed views e.g. nth(A)[1:n], or b) 1-based indices e.g. A[(1:n)nth].

We considered using first and last that way (I’m sure there’s a PR or issue discussion one could find if one wanted), but it was decided to be too weird.

That’s too bad; special parsing rules to pun begin and end for indexing seems equally weird, but is less expressive :sweat_smile: I suppose that ship has sailed though.

1 Like

I’m not expecting tan(begin+2), but there’s nothing wrong with the original request to support deleteat!(a, end). I think ordinal indices are the way to go on that, for most cases (e.g. deleteat!(1st)). But we might need something for indexing from the end.

(We could try using Python’s -1st for ordinal indices? I know that conflicts a bit with R, but I think !1st == Not(1st) is the better syntax for that. Alternatively, I do think last is a reasonable choice.)

1 Like

And just today, another thread asking about this. It’s funny how interest in topics tends to bunch temporally :sweat_smile:

As you saw last time, I had talked myself out of this when I realized it doesn’t save any boundschecks. But then I saw this thread:

Unicode makes the string API rather awkward, necessitating strange functions like chop; if we could access relative to first and last ordinal indices, methods like this wouldn’t be needed. Dictionaries and I imagine some other data structures could also make use of such indexing; I begin to wonder about sparse arrays.

Maybe it even makes sense to have both xs[begin:end] indexing and xs[first:last] indexing, where the former is understood to range over positional indices eachindex(xs), whereas the latter ranges over ordinal indices 1:length(xs)? (similar to the contrast between xs[i] vs xs[n*th])

I entertained that thought momentarily, but I was turned off when I realized the inconsistency in having 1st:-1st work but (1:-1)st not work. I’m not sure why unitrange a:b clamps to a:a-1 for b much less than a; maybe that can be changed?

1 Like

On second thought, I don’t like using first and last for this: it’d be strange for "αβγ"[last-2] to return 'α', but s="αβγ"; s[lastindex(s)-2] to return 'β'. Likewise, findfirst returns a positional index, not an ordinal index. The way the words get overloaded here is too messy.

I’m warming up to -1nth to represent the ordinal index of the last element just to have less stuff to memorize. Even if you have to write a*nth:-b*nth instead of (a:-b)nth.

Since we don’t need to claim any O(1) access semantics here, getindex for ordinal indices could be defined to fall back to leaning on the iteration interface. The intuitive understanding of ordinal indices could be: where c is an arbitrary iterable collection, the ordinal index operates such that c[n*nth] === collect(c)[n] and c[-n*nth] === collect(c)[end+1-n] for positive n.