# Getting the last element of an iterator

Question 1: Is there a nice one-liner to get the last element of an iterator, without collecting in a temporary array? The obvious doesn’t work:

``````julia> eachline(`dir`) |> last
ERROR: MethodError: no method matching lastindex(::Base.EachLine{Base.PipeEndpoint})
``````

Question 2: Why is `last` not defined on iterators? The `last` documentation says:

Get the last element of an ordered collection, if it can be computed in O(1) time.

Why this limitation?

BTW I suspect the person who wrote the documentation for `Base.Iterators.only` also expected this to work because it refers to `last`

1 Like

The last element of an iterator is clearly not `O(1)`. also note an iterator may not be finite (also probably because that iterator interface is not defined by some type, unlike `Range` which is also lazily represented but can be `last`ed easily).

5 Likes

But why implement `last` only when it’s O(1)?

As for finite/infinite… by this logic `collect` should not be supported either?

I think it is more that collect is clearly `O(n)`, there is no surprise there, as there is no other way it could be. Someone using last may unconsciously expect it to be `O(1)`. More than that, if `last` has this complexity guarantee, then anyone may write a generic algorithm using it and give the expected complexity for their algorithm.

4 Likes

Presumably though you could define an `Iterators.last` that does something trivial like:

``````function Iterators.last(itr)
# should throw in a check for Base.IsInfinite()
# before doing this. Also potentially defer to
# `Base.last` when a method of `lastindex` exists
x = nothing # or first(itr), if that makes more sense
for y in itr
x = y
end
return x
end``````
3 Likes

@Henrique_Becker this doesn’t seem like a very good deal…

We could have:

• An algorithm that works both with arrays and iterators, O(1) in the first case, O(n) in the other case, which can be documented.
• A generally useful `last` (including situations like mine).
• People that really want this selective behavior can use `lastindex` with minimal differences in the code.

• An algorithm that doesn’t work at all in the second case.
• Many situations where `last` would be natural but doesn’t work.
• People that want `last` for iterators need to replace a function call with a for loop and temporary variable, a much more intrusive change.

(Note that in many cases the algorithm would not work with iterators anyway, since `last` would consume the elements.)

I disagree. I prefer code clearly broken than surprises. I have nothing against a "generic `last`", but I would prefer it to be separated. You can wrap your solution in a differently-named function, or even extend `Base.last` though it is not recommended.

2 Likes

Part of it is that the minimal interface only needs `iterate`, and doesn’t necessarily know anything about size. It’s optional to implement `IteratorSize(IterType)` and `length()`, let alone `last`. You could easily implement your own fallback `last` that works when an iterator `HasLength()` and errors otherwise. But right now you already get the error anyway.

It seems reasonable to leave it up to the implementer what optional methods to include with their iterator. Even if their iterator has an end, it could cause problems when a user expects `last(iter)` or `iter[end]` to give an answer quickly.

`collect` isn’t ‘supported’ for infinite iterators. If you work according to the interface, you should be calling `Iterators.IteratorSize()` before `collect`, an infinite iterator will return `IsInfinite()` and you should abort. Or instead collect the composition of your iterator with `take`, to guarantee termination.

For `SizeUnknown()` iterators, the generation phase is no less efficient than in the known size case, except the output array cannot be preallocated, and must be grown by pushing.

Checking for `HasLength()` is too weak, sinch `HasShape()` also guarantees finite size, and `SizeUnknown()` actually suggests finite size by virtue of not being `IsInfinite()`. It seems like the interface is lacking a way to distinguish between `SizeUnknown()` but definitely finite, when `collect` is safe, and `SizeUnknown()` but possibly infinite.

should just use `Base.iterate` and it will be fine for arrays.

Except if it needs the last value without performing an iteration, then it shouldn’t because `Base.iterate` is not a random access interface, while `last` implicitly requires that.

3 Likes

Thanks for all the answers! I’ve thought about it for a while and I have to say it still looks quite wrong to me. Please consider the following points:

Generic function names such as `last` should be generally useful. It’s weird and inconsistent to reserve a name for some data structures because of performance characteristics. If we want a constraint like this, it should be a different function with a hint in the name.

Do algorithm writers really want to prevent uses that don’t meet some performance guarantee? I doubt it. Much better to document the assumptions for the complexity guarantee!

Now on the specifics. The main idea seems to be:

No generic fallback for `last`. This way people can use `last` with a guarantee of O(1).

But that’s misleading: there is no such guarantee. For example `SLinkedList` (from LinkedLists.jl) has an O(n) implementation of `last`. Do we really want to say package authors should not include such methods, even when it’s very useful? Conclusion: it makes things more surprising, not less.

And there is the added surprise that reasonable code such as `eachline(cmd) |> last` will fail. It gives the API a bad feeling, like there are random holes that can catch you when writing generic code.

Regarding `Iterators`: Algorithmic complexity seems like a bad reason to put a method there. For example we need `Iterators.filter` and `Iterators.map` (Julia 1.6) because they have different semantics than the `Base` version (returning iterators rather than arrays).

I think a real effect of the current behavior is that many people will do the equivalent of `collect |> last`, which has terrible performance. It’s unfortunate that the API nudges people to do that.

1 Like

I’m arguing that this should be changed (for the reasons given in my previous comment).

``````lastiterate(itr) = foldl((_, y) -> y, itr)
``````

or

``````lastiterate(itr) = applicable(last, itr) ? last(itr) :  foldl((_, y) -> y, itr)
``````
2 Likes

The most general method for iterators that should be O(1) is

``````first(Iterators.reverse(iterator))
``````

which works for any iterator type that has implemented reverse iteration.

However, currently `eachline` has no implementation of reverse-order iteration, so this throws a `MethodError`. In principle, it would be straightforward to implement reverse-order `eachline` for files, but it doesn’t seem possible to do efficiently for pipes.

(There is an `Iterators.last(iterator, n)` iterator that uses this `reverse` approach.)

9 Likes

For future reference, yes, this is what I want, specifically. The author should not be extending a `Base` method if the base documentation is generic and goes against the behavior they want to implement. At very least, if they do this then they should create a documentation for the specific method (over their data structure) with a warning.

I want package authors to not ignore performance guarantees set by the `Base`, so I can get a datastructure of some package and pass it to some algorithm that I know the complexity without needing to check if the author has violated any of the performances guaranteed. If it works then it is the right complexity. Simple. If I really do not care about the performance degradation I can carefully extend `Base.last` myself for the third-party data structure.

You are basically saying it is misleading the fact that package authors are not perfect programmers and may make mistakes that compromise API guarantees… there is no way to guarantee many things except by believing package authors will try to be faithful to the API, and disregarding packages that you discover to fail to do so.

I opened a PR in LinkedLists.jl to get this corrected, if possible.

And when the programmer hits that point, it will be clear to them what is the expected overhead. They may decide to not support iterators, or spend an extra minute to do a loop that avoids the extra allocation, or save that value when it was readily available before, or even rethink their datastructure. It seems to me a problem of bad programmers, not bad API.

2 Likes

The algorithmic complexity is more a reason why `last` hasn’t been implemented generically thus far, rather than a justification why it should never be implemented. As others have shown, there are some simple ways to do `last` when you know the iterator is finite. Maybe it’s time to incorporate a solution.

One simple approach would be to implement something for I/O cases alone. Relatively conservative would be `lastline(io::IO...)` perhaps borrowing from `countlines`, which already exists. And actually, would be nice to have something like Unix, e.g. `tail(io::IO, n)`.

Another approach is a generic fallback `last` for finite iterators (more general than I/O). To sort out all the cases would take some thought, as @malacroi points out. But a conservative fallback could be implemented (e.g., only for `HasLength()`) with appropriate warnings and documentation. It may be fine if it’s O(>1) as long as not infinite.

A pull request might be entertained for either approach.

2 Likes

Are you really recommending type piracy as a general solution for this?

Yes and no. If you do not really care about breaking the API contract, then I do not see why you would care about a little type piracy (which, I say, is not a greater evil if you know what you are doing). However, as most problems solved by type piracy, you can just take the extra step and wrap the third-party type inside your own type, delegate all methods to the wrapped object, and then you do not have type piracy anymore.

I wish this were a bit easier than it actually is.