Getting the last element of an iterator

collect isn’t ‘supported’ for infinite iterators. If you work according to the interface, you should be calling Iterators.IteratorSize() before collect, an infinite iterator will return IsInfinite() and you should abort. Or instead collect the composition of your iterator with take, to guarantee termination.

For SizeUnknown() iterators, the generation phase is no less efficient than in the known size case, except the output array cannot be preallocated, and must be grown by pushing.

Checking for HasLength() is too weak, sinch HasShape() also guarantees finite size, and SizeUnknown() actually suggests finite size by virtue of not being IsInfinite(). It seems like the interface is lacking a way to distinguish between SizeUnknown() but definitely finite, when collect is safe, and SizeUnknown() but possibly infinite.

should just use Base.iterate and it will be fine for arrays.

Except if it needs the last value without performing an iteration, then it shouldn’t because Base.iterate is not a random access interface, while last implicitly requires that.

3 Likes

Thanks for all the answers! I’ve thought about it for a while and I have to say it still looks quite wrong to me. Please consider the following points:

Generic function names such as last should be generally useful. It’s weird and inconsistent to reserve a name for some data structures because of performance characteristics. If we want a constraint like this, it should be a different function with a hint in the name.

Do algorithm writers really want to prevent uses that don’t meet some performance guarantee? I doubt it. Much better to document the assumptions for the complexity guarantee!

Now on the specifics. The main idea seems to be:

No generic fallback for last. This way people can use last with a guarantee of O(1).

But that’s misleading: there is no such guarantee. For example SLinkedList (from LinkedLists.jl) has an O(n) implementation of last. Do we really want to say package authors should not include such methods, even when it’s very useful? Conclusion: it makes things more surprising, not less.

And there is the added surprise that reasonable code such as eachline(cmd) |> last will fail. It gives the API a bad feeling, like there are random holes that can catch you when writing generic code.

Regarding Iterators: Algorithmic complexity seems like a bad reason to put a method there. For example we need Iterators.filter and Iterators.map (Julia 1.6) because they have different semantics than the Base version (returning iterators rather than arrays).

I think a real effect of the current behavior is that many people will do the equivalent of collect |> last, which has terrible performance. It’s unfortunate that the API nudges people to do that.

1 Like

I’m arguing that this should be changed (for the reasons given in my previous comment).

lastiterate(itr) = foldl((_, y) -> y, itr)

or

lastiterate(itr) = applicable(last, itr) ? last(itr) :  foldl((_, y) -> y, itr)
2 Likes

The most general method for iterators that should be O(1) is

first(Iterators.reverse(iterator))

which works for any iterator type that has implemented reverse iteration.

However, currently eachline has no implementation of reverse-order iteration, so this throws a MethodError. In principle, it would be straightforward to implement reverse-order eachline for files, but it doesn’t seem possible to do efficiently for pipes.

(There is an Iterators.last(iterator, n) iterator that uses this reverse approach.)

9 Likes

For future reference, yes, this is what I want, specifically. The author should not be extending a Base method if the base documentation is generic and goes against the behavior they want to implement. At very least, if they do this then they should create a documentation for the specific method (over their data structure) with a warning.

I want package authors to not ignore performance guarantees set by the Base, so I can get a datastructure of some package and pass it to some algorithm that I know the complexity without needing to check if the author has violated any of the performances guaranteed. If it works then it is the right complexity. Simple. If I really do not care about the performance degradation I can carefully extend Base.last myself for the third-party data structure.

You are basically saying it is misleading the fact that package authors are not perfect programmers and may make mistakes that compromise API guarantees… there is no way to guarantee many things except by believing package authors will try to be faithful to the API, and disregarding packages that you discover to fail to do so.

I opened a PR in LinkedLists.jl to get this corrected, if possible.

And when the programmer hits that point, it will be clear to them what is the expected overhead. They may decide to not support iterators, or spend an extra minute to do a loop that avoids the extra allocation, or save that value when it was readily available before, or even rethink their datastructure. It seems to me a problem of bad programmers, not bad API.

2 Likes

The algorithmic complexity is more a reason why last hasn’t been implemented generically thus far, rather than a justification why it should never be implemented. As others have shown, there are some simple ways to do last when you know the iterator is finite. Maybe it’s time to incorporate a solution.

One simple approach would be to implement something for I/O cases alone. Relatively conservative would be lastline(io::IO...) perhaps borrowing from countlines, which already exists. And actually, would be nice to have something like Unix, e.g. tail(io::IO, n).

Another approach is a generic fallback last for finite iterators (more general than I/O). To sort out all the cases would take some thought, as @malacroi points out. But a conservative fallback could be implemented (e.g., only for HasLength()) with appropriate warnings and documentation. It may be fine if it’s O(>1) as long as not infinite.

A pull request might be entertained for either approach.

2 Likes

Are you really recommending type piracy as a general solution for this?

Yes and no. If you do not really care about breaking the API contract, then I do not see why you would care about a little type piracy (which, I say, is not a greater evil if you know what you are doing). However, as most problems solved by type piracy, you can just take the extra step and wrap the third-party type inside your own type, delegate all methods to the wrapped object, and then you do not have type piracy anymore.

I wish this were a bit easier than it actually is.

Yes. Having programmed in Ruby, where doing so is very easy, I dread the possibility of needing to do it in Julia and not having the same ease (but probably should be possible to make a macro to do so, I just not know if there is something robust already in stock). However, depending on the function that you will pass the wrapped datastructure to, the number of methods that you need to delegate may be small.

A singly-linked list has O(n) implementation for any list[n], so why not just implement getindex, and make it an <:AbstractVector?

I don’t think that anyone is saying that random access methods should not be implemented when they make sense, just that iterate is the wrong interface for this, and stretching it is not the right solution here.

2 Likes

I think that’s what @Henrique_Becker is saying. If you implement getindex and make an AbstractVector, then you automatically get a working last. It will be O(n) and that is supposedly forbidden.

From what @apo383 says however it seems that I (and Henrique) misunderstood the documentation: it just says that last is implemented for the O(1) case (not that it shouldn’t be implemented for O(n) or whatever). That seems right: if O(n) for last was forbidden, it should also be forbidden for lastindex and getindex, but the documentation for those says nothing about complexity.

I’ll file a PR to implement one of @apo383’s propositions, that should also clarify what is meant in the documentation.

I am not sure where that is coming from, I can’t find it in the docs. My reading is that

function Base.getindex(v::SomeType, i::Int)
    sleep(i)
    ... # do something
end

is a conforming implementation. O(1) access is nice when you can get it, but not a requirement.

2 Likes

Here, Tamas. It seems that Base.last should only be implemented if it has O(1) access, but if someone inherit AbstractArray and implement lastindex as O(n) (because it does not require to be O(1) in its documentation), then immediately Base.last will work for the type but have the wrong complexity.

However, inheriting AbstractArray for LinkedList (the specific case in question) and implementing a lastindex for it (considering that in a single-linked list this will take n operations and will give a index which cannot be iterated in any way) seems to just be a case in which it is very clear the concrete object does not adhere to the respective abstract concept, and it is misleading to do this inheritance in this case.

1 Like

I’m a bit perplexed by the emphasis put on complexity guarantees here. I see that this is in the docs, but why? There are any number of functions that have highly variable complexities for different types. Look at the LinearAlgebra library for example, you could argue that sum should only be implemented for Diagonal matrices, because that has complexity O(N), while ordinary dense matrices have O(N^2), but that would be absurd.

I don’t see why last should not simply be expected to ‘return the last element of an iterator’, complexity be damned.

4 Likes
  1. sum documentation does not give a complexity guarantee. It is a different thing to defend the documentation of Base.last to not have O(1) as a requirement, and defend that this requirement is to be ignored before Julia 2.0, in which it can be changed.
  2. The expected complexity of sum is O(n-1 Base.+ operations for the eltype) where n is the number of elements in the container. sum(diagonal_mat) and sum(ordinary_dense_matrice) both follow such complexity. What is N in your example? The length of one dimension of the container? This does not seem standard.

I don’t see why last should not simply be expected to ‘return the last element of an iterator’, complexity be damned.

Well, I already give why I think so in this thread, and you are not addressing it. So I believe this is rhetorical.

Yeah, you’re saying “because the docs say so”, and that’s fine, but I’m asking “why is this a good idea?” I’m not even arguing that it must be changed here and now, just saying it’s weird to have this complexity guarantee for last. What makes last so special?

It shouldn’t be difficult to guess from context that I am talking about an NxN matrix, in which case Diagonal has O(N) complexity, and dense matrices have O(N^2).

No, diagonal_mat has N^2 elements; that there is an optimized implementation that compresses this to N is just that, an optimization.

I would expect last to have complexity O(N). O(1) in some cases is just an optimization, just like in a zillion other functions.

1 Like