As of Julia Version 0.7.0-DEV.2036, calling eltype() on even the simplest generator returns Any:
julia> eltype(x for x in 1:2)
Any
Unless the generator function is very complicated, It does not look like a hard problem to deduce the actual type. As a POC, I tried
julia> Base.eltype(g::Base.Generator) = Base.return_types(g.f, (eltype(g.iter),))[1]
julia> eltype(x for x in 1:2)
Int64
julia> eltype(x*1.0 for x in 1:2)
Float64
I don’t think this would be hard to generalize for the case when return_types() returns multiple values.
Would you mind elaborating on this some more? Not being able to get the element type of a generator seems to significantly limit generators’ usefulness. Is there some way to decorate a generator with the element type information? For example, I can probably write a macro @ET such that eltype(@ET Int (x for x in 1:2)) would return Int.
You could totally create a wrapper struct for generators that asserted the type of all elements must be the type of the first (or is Union{} if empty). Our iteration protocol makes that a bit tricky at the moment, but it’s totally doable. Grab and store the first element and the iteration state, and then use a “deferred” iteration strategy wherein the subsequent element (or Nullable()) is in the state.
The key is how collect uses that information. It’s only used in the empty case. Otherwise, collect just uses the types of the values that it extracts from the array, gradually widening the array type as needed.
I am new to this community, but you guys seem to like to communicate in riddles.
Why does it matter what exactly grow_to! does when it receives the inferred element type encoded in the type of its first argument? Looking closer at the two-argument grow_to! , I do realize that it discards the dest type that it receives replacing it with Union{}:
What I don’t understand is why this is necessary and if it is why it does not matter. From what I see grow_to! has to do a lot of extra work because it does not know the element type of the result.
Haha, sorry about that. This one decision has elicited a lot of discussion over the past few years… and you’re in the development channel, which is where you’ll tend to get much shorter and curt replies that assume quite a bit of knowledge.
In short, type inference is entirely an optimization. It should be unobservable. We should be able to turn off inference or improve its results nilly wily without any results changing. Performance will change, of course.
The return type of comprehensions used to depend on inference in all cases, and it could cause lots of surprising effects. If you changed something to be a global that wrecked type inference, you’d suddenly get an Array{Any} that can’t work with BLAS or other external (non-Julia) libraries.
Yes, there are three options for what to do in the [x for x in empty] case — use inference, return an Array{Union{}} or be an error. All three have been proposed, and I think we did the Union{} thing for a brief while.
That is a serious explanation and yes I am saying that an error is better than a random type. A Vector{Union{}}(0) would work too of course though it’ll cause type instability.
Thanks, @mbauman. This makes perfect sense. And thank you for being patient with me. I should have reformulated my question as a user question and posted it under “Usage”. This said, what is the proper etiquette here - should I open a new topic or change the category here?
My specific question relates to implementing an IterableTables “sink” functionality for my own table type. The approach taken by IterableTables is to wrap generators in an element type aware “GeneratorIterator”:
julia> eltype(TableTraits.getiterator(i for i in 1:0))
Int64
julia> eltype(TableTraits.getiterator(@NT(a=i, b=0.1i) for i in 1:0))
NamedTuples._NT_a_b{Int64,Float64}
The discussion in this topic so far suggests that this is not a good idea. What would be the right approach? I don’t think throwing an error on an empty query result is an option in my case.
If the user of a iterator wants a type when it’s not empty, that type should be supplied explicitly. Otherwise no answer can be given theoretically. In general though the user should not care about the type at all and just passing on the iterator does not require any knowledge of the return type of the iterator.
That’s true, but the problem that I am trying to solve does seem to require such knowledge. Moreover, all solutions I’ve seen so far (Base.collect() and TableTraits.getiterator()) do use inference in one way or another. I understand that inference may at times be fragile, but it is not in the case of f(x)::T for x in iter. A macro can easily be written to convert a generator expression with a type assert to an iterator type with iteratoreltype(iter) == HasEltype(). Why couldn’t that be the default?
It is, T is not necessarily known and that it is still meaningless in the empty case as well as being able to be Union{} instead.
It’s also wrong when T is not a leaftype.
Because it couldn’t be done as explained above.
And as I said, if you know the type, supply it explicitly to the user of the iterator. Having a eltype is not what a generator is for.
It was stated that it could not be done, but not explained. Having eltype() of a generator return the actual type can be done as I have shown in my first post. My solution may violate some theoretical principles, but in practice, the same solution is implemented in various places. @mbauman explained that type inference may be fragile and should be avoided, but what can be wrong with the following POC:
struct TypedGenerator{T}
g::Base.Generator
end
macro typed(ex)
if ex.args[1].head == :(::)
:(TypedGenerator{$(ex.args[1].args[2])}($ex))
else
ex
end
end
Base.iteratoreltype(::TypedGenerator{T}) where T = Base.HasEltype()
Base.eltype(::TypedGenerator{T}) where T = T
# Define iterator protocol methods delegating to `.g`.
julia> eltype(@typed (f(x) for x in 1:2))
Any
julia> eltype(@typed (f(x)::Int for x in 1:2))
Int64
If such code is acceptable in a user program, why would it be a bad idea for (f(x)::Int for x in 1:2) to return a variant of TypedGenerator{T} without the @typed macro?
How would the user that knows the type of f(x) and wants to provide that type in a generator expression (f(x) for x in iter) actually do it? I suggest that (f(x)::T for x in iter) is the first thing that comes to mind and if you subscribe to the principle of least surprise, the eltype() of this expression should be T.
T is not necessarily known (and it might not be a constant at all)
It is still meaningless in the empty case
Return value could be Union{}
T may not be a leaftype.
Again, because it’s wrong.
Don’t use generator and use some other type. Any wrapper around this iterator would do. For example, just give the type to your TableTraits.getiterator function.
Adding a ::T in an expression to change the behavior is about as much surprise as you can get. This kind of behavioral change does not exist anywhere else in the language.