The last element of an iterator is clearly not O(1) to reach. Also note that an iterator may not even be finite (probably also because the iterator interface is not defined by some concrete type, unlike Range, which is also lazily represented but supports last trivially).
I think it is more that collect is clearly O(n); there is no surprise there, as there is no other way it could be. Someone using last may unconsciously expect it to be O(1). More than that, if last has this complexity guarantee, then anyone can write a generic algorithm using it and state the expected complexity of their algorithm.
Presumably though you could define an Iterators.last that does something trivial like:
```julia
function Iterators.last(itr)
    # Should add a check for Base.IsInfinite() before doing this.
    # Also potentially defer to `Base.last` when a method of
    # `lastindex` exists.
    x = nothing # or first(itr), if that makes more sense
    for y in itr
        x = y
    end
    return x
end
```
I disagree. I prefer code that is clearly broken to surprises. I have nothing against a “generic last”, but I would prefer it to be separate. You can wrap your solution in a differently-named function, or even extend Base.last, though that is not recommended.
Part of it is that the minimal interface only needs iterate, and doesn’t necessarily know anything about size. It’s optional to implement IteratorSize(IterType) and length(), let alone last. You could easily implement your own fallback last that works when an iterator HasLength() and errors otherwise, as sketched below. But right now you already get the error anyway.
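For instance, a minimal sketch of such a user-side fallback (the name generic_last is made up here, not part of Base):

```julia
# Hypothetical user-side fallback: only accept iterators that advertise a length.
function generic_last(itr)
    Base.IteratorSize(itr) isa Base.HasLength ||
        throw(ArgumentError("generic_last requires an iterator with HasLength()"))
    local x
    found = false
    for y in itr    # single pass; keep only the most recent element
        x = y
        found = true
    end
    found || throw(ArgumentError("iterator is empty"))
    return x
end
```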
It seems reasonable to leave it up to the implementer what optional methods to include with their iterator. Even if their iterator has an end, it could cause problems when a user expects last(iter) or iter[end] to give an answer quickly.
collect isn’t ‘supported’ for infinite iterators. If you work according to the interface, you should be calling Base.IteratorSize() before collect; an infinite iterator will return IsInfinite() and you should abort. Or instead collect the composition of your iterator with Iterators.take, to guarantee termination.
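For example, a quick sketch using Iterators.countfrom as the infinite iterator:

```julia
itr = Iterators.countfrom(1)        # infinite iterator: 1, 2, 3, ...
Base.IteratorSize(itr)              # Base.IsInfinite(), so do not collect directly
collect(Iterators.take(itr, 5))     # [1, 2, 3, 4, 5]; take guarantees termination
```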
For SizeUnknown() iterators, the generation phase is no less efficient than in the known size case, except the output array cannot be preallocated, and must be grown by pushing.
Checking for HasLength() is too weak, since HasShape() also guarantees a finite size, and SizeUnknown() actually suggests a finite size by virtue of not being IsInfinite(). It seems like the interface lacks a way to distinguish between SizeUnknown() but definitely finite, where collect is safe, and SizeUnknown() but possibly infinite.
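For illustration, a check covering both finite traits might look like this (the helper name isknownfinite is hypothetical):

```julia
# Hypothetical helper: true only when the size trait guarantees a finite iterator.
# SizeUnknown() is deliberately excluded, even though it is often finite in practice.
isknownfinite(itr) = Base.IteratorSize(itr) isa Union{Base.HasLength, Base.HasShape}

isknownfinite(1:10)                    # true  (HasShape)
isknownfinite(Iterators.countfrom(1))  # false (IsInfinite)
```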
should just use Base.iterate and it will be fine for arrays.
Except that if it needs the last value without performing a full iteration, it shouldn’t, because Base.iterate is not a random-access interface, while last implicitly requires one.
Thanks for all the answers! I’ve thought about it for a while and I have to say it still looks quite wrong to me. Please consider the following points:
Generic function names such as last should be generally useful. It’s weird and inconsistent to reserve a name for some data structures because of performance characteristics. If we want a constraint like this, it should be a different function with a hint in the name.
Do algorithm writers really want to prevent uses that don’t meet some performance guarantee? I doubt it. Much better to document the assumptions for the complexity guarantee!
Now on the specifics. The main idea seems to be:
No generic fallback for last. This way people can use last with a guarantee of O(1).
But that’s misleading: there is no such guarantee. For example SLinkedList (from LinkedLists.jl) has an O(n) implementation of last. Do we really want to say package authors should not include such methods, even when it’s very useful? Conclusion: it makes things more surprising, not less.
And there is the added surprise that reasonable code such as eachline(cmd) |> last will fail. It gives the API a bad feeling, like there are random holes that can catch you when writing generic code.
Regarding Iterators: algorithmic complexity seems like a bad reason to put a method there. For example, we need Iterators.filter and Iterators.map (Julia 1.6) because they have different semantics from the Base versions (returning iterators rather than arrays).
I think a real effect of the current behavior is that many people will do the equivalent of collect |> last, which has terrible performance. It’s unfortunate that the API nudges people to do that.
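For what it’s worth, a single pass that keeps only the most recent element already avoids that allocation today; a sketch (with "log.txt" as a placeholder file name):

```julia
# A reduction that keeps only the latest element; nothing is materialized along the way.
foldl((_, y) -> y, 1:10)                 # 10
foldl((_, y) -> y, eachline("log.txt"))  # last line of the (placeholder) file, single pass
```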
However, currently eachline has no implementation of reverse-order iteration, so this throws a MethodError. In principle, it would be straightforward to implement reverse-order eachline for files, but it doesn’t seem possible to do efficiently for pipes.
For future reference: yes, this is what I want, specifically. An author should not extend a Base method if the Base documentation is generic and goes against the behavior they want to implement. At the very least, if they do so, they should add documentation for the specific method (on their data structure) with a warning.
I want package authors not to ignore performance guarantees set by Base, so that I can take a data structure from some package and pass it to an algorithm whose complexity I know, without needing to check whether the author has violated any of the guaranteed performance characteristics. If it works, then it has the right complexity. Simple. If I really do not care about the performance degradation, I can carefully extend Base.last myself for the third-party data structure.
You are basically saying that it is misleading that package authors are not perfect programmers and may make mistakes that compromise API guarantees… There is no way to guarantee many things except by trusting package authors to be faithful to the API, and disregarding packages that you discover fail to do so.
And when the programmer hits that point, it will be clear to them what the expected overhead is. They may decide not to support iterators, or spend an extra minute writing a loop that avoids the extra allocation, or save that value when it was readily available earlier, or even rethink their data structure. It seems to me a problem of bad programmers, not a bad API.
The algorithmic complexity is more a reason why last hasn’t been implemented generically thus far, rather than a justification why it should never be implemented. As others have shown, there are some simple ways to do last when you know the iterator is finite. Maybe it’s time to incorporate a solution.
One simple approach would be to implement something for the I/O case alone. A relatively conservative option would be lastline(io::IO...), perhaps borrowing from countlines, which already exists. And actually, it would be nice to have something like Unix tail, e.g. tail(io::IO, n).
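A sketch of what that conservative helper could look like (lastline is a hypothetical name, not an existing function):

```julia
# Hypothetical helper: scan the stream once, keeping only the latest line.
function lastline(io::IO)
    line = nothing
    for l in eachline(io)
        line = l
    end
    line === nothing && throw(ArgumentError("stream contained no lines"))
    return line
end
```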
Another approach is a generic fallback last for finite iterators (more general than I/O). Sorting out all the cases would take some thought, as @malacroi points out, but a conservative fallback could be implemented (e.g., only for HasLength()) with appropriate warnings and documentation. It may be fine for it to be worse than O(1) as long as it is not infinite.
A pull request might be entertained for either approach.
Yes and no. If you do not really care about breaking the API contract, then I do not see why you would care about a little type piracy (which, I would say, is not a greater evil if you know what you are doing). However, as with most problems solved by type piracy, you can just take the extra step and wrap the third-party type inside your own type, delegate all methods to the wrapped object, and then you do not have type piracy anymore.
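A minimal sketch of that wrapping approach (the type name and delegated methods are made up for illustration):

```julia
# Wrapper owned by my code, so extending Base methods for it is not type piracy.
struct LastAwareWrapper{T}
    data::T    # the wrapped third-party data structure
end

# Delegate the iteration interface to the wrapped object.
Base.iterate(w::LastAwareWrapper, state...) = iterate(w.data, state...)
Base.length(w::LastAwareWrapper) = length(w.data)
Base.eltype(::Type{LastAwareWrapper{T}}) where {T} = eltype(T)

# Explicitly opt in to `last`, accepting its O(n) cost for this wrapper.
Base.last(w::LastAwareWrapper) = foldl((_, y) -> y, w.data)
```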