Can `eltype()` deduce the element type of a generator?

data

#1

As of Julia Version 0.7.0-DEV.2036, calling eltype() on even the simplest generator returns Any:

julia> eltype(x for x in 1:2)
Any

Unless the generator function is very complicated, It does not look like a hard problem to deduce the actual type. As a POC, I tried

julia> Base.eltype(g::Base.Generator) = Base.return_types(g.f, (eltype(g.iter),))[1]
julia> eltype(x for x in 1:2)
Int64
julia> eltype(x*1.0 for x in 1:2)
Float64

I don’t think this would be hard to generalize for the case when return_types() returns multiple values.


#2

No you code should not rely on inference.


#3

Would you mind elaborating on this some more? Not being able to get the element type of a generator seems to significantly limit generators’ usefulness. Is there some way to decorate a generator with the element type information? For example, I can probably write a macro @ET such that eltype(@ET Int (x for x in 1:2)) would return Int.


#4

No that’s not supposed to matter at all.

No

You can certainly have your own wrapper to do that.


#5

You could totally create a wrapper struct for generators that asserted the type of all elements must be the type of the first (or is Union{} if empty). Our iteration protocol makes that a bit tricky at the moment, but it’s totally doable. Grab and store the first element and the iteration state, and then use a “deferred” iteration strategy wherein the subsequent element (or Nullable()) is in the state.


#6

Is that me being dumb or you being terse? :slight_smile:

I’ve just looked at how collect(::Generator) is implemented. The first thing it does is inferring the element type:

Where the macro @default_eltype expands to a call to Core.Inference.return_type.

So why is collect() allowed to use inference while eltype() is not?


#7

The key is how collect uses that information. It’s only used in the empty case. Otherwise, collect just uses the types of the values that it extracts from the array, gradually widening the array type as needed.


#8

Is that true? It looks like in the SizeUnknown branch it is used unconditionally.


#9

Have a look at what grow_to! does.


#10

Because people don’t like predictable behavior and forced the base to use the inference to give non-error in some cases that should really be error.

Yes, but it doesn’t/shouldn’t affect the final value. If it does, you should report it as a bug.


#11

I am new to this community, but you guys seem to like to communicate in riddles.

Why does it matter what exactly grow_to! does when it receives the inferred element type encoded in the type of its first argument? Looking closer at the two-argument grow_to!, I do realize that it discards the dest type that it receives replacing it with Union{}:

What I don’t understand is why this is necessary and if it is why it does not matter. From what I see grow_to! has to do a lot of extra work because it does not know the element type of the result.


#12

Is that sarcasm or a serious explanation? Are you suggesting that collecting from an empty generator should give an error?


#13

Haha, sorry about that. This one decision has elicited a lot of discussion over the past few years… and you’re in the development channel, which is where you’ll tend to get much shorter and curt replies that assume quite a bit of knowledge.

In short, type inference is entirely an optimization. It should be unobservable. We should be able to turn off inference or improve its results nilly wily without any results changing. Performance will change, of course.

The return type of comprehensions used to depend on inference in all cases, and it could cause lots of surprising effects. If you changed something to be a global that wrecked type inference, you’d suddenly get an Array{Any} that can’t work with BLAS or other external (non-Julia) libraries.

Yes, there are three options for what to do in the [x for x in empty] case — use inference, return an Array{Union{}} or be an error. All three have been proposed, and I think we did the Union{} thing for a brief while.


#14

That is a serious explanation and yes I am saying that an error is better than a random type. A Vector{Union{}}(0) would work too of course though it’ll cause type instability.


#15

Thanks, @mbauman. This makes perfect sense. And thank you for being patient with me. I should have reformulated my question as a user question and posted it under “Usage”. This said, what is the proper etiquette here - should I open a new topic or change the category here?

My specific question relates to implementing an IterableTables “sink” functionality for my own table type. The approach taken by IterableTables is to wrap generators in an element type aware “GeneratorIterator”:

julia> eltype(TableTraits.getiterator(i for i in 1:0))
Int64
julia> eltype(TableTraits.getiterator(@NT(a=i, b=0.1i) for i in 1:0))
NamedTuples._NT_a_b{Int64,Float64}

The discussion in this topic so far suggests that this is not a good idea. What would be the right approach? I don’t think throwing an error on an empty query result is an option in my case.


#16

That’s exactly why I said,

If the user of a iterator wants a type when it’s not empty, that type should be supplied explicitly. Otherwise no answer can be given theoretically. In general though the user should not care about the type at all and just passing on the iterator does not require any knowledge of the return type of the iterator.


#17

That’s true, but the problem that I am trying to solve does seem to require such knowledge. Moreover, all solutions I’ve seen so far (Base.collect() and TableTraits.getiterator()) do use inference in one way or another. I understand that inference may at times be fragile, but it is not in the case of f(x)::T for x in iter. A macro can easily be written to convert a generator expression with a type assert to an iterator type with iteratoreltype(iter) == HasEltype(). Why couldn’t that be the default?

The current behavior –

julia> eltype(x::Int for x in 1:0)
Any

– looks very counterintuitive.


#18

It is, T is not necessarily known and that it is still meaningless in the empty case as well as being able to be Union{} instead.
It’s also wrong when T is not a leaftype.

Because it couldn’t be done as explained above.

And as I said, if you know the type, supply it explicitly to the user of the iterator. Having a eltype is not what a generator is for.


#19

It was stated that it could not be done, but not explained. Having eltype() of a generator return the actual type can be done as I have shown in my first post. My solution may violate some theoretical principles, but in practice, the same solution is implemented in various places. @mbauman explained that type inference may be fragile and should be avoided, but what can be wrong with the following POC:

struct TypedGenerator{T}
    g::Base.Generator
end
macro typed(ex)
    if ex.args[1].head == :(::)
        :(TypedGenerator{$(ex.args[1].args[2])}($ex))
    else
        ex
    end
end
Base.iteratoreltype(::TypedGenerator{T}) where T = Base.HasEltype()
Base.eltype(::TypedGenerator{T}) where T = T
# Define iterator protocol methods delegating to `.g`.
julia> eltype(@typed (f(x) for x in 1:2))
Any

julia> eltype(@typed (f(x)::Int for x in 1:2))
Int64

If such code is acceptable in a user program, why would it be a bad idea for (f(x)::Int for x in 1:2) to return a variant of TypedGenerator{T} without the @typed macro?

How would the user that knows the type of f(x) and wants to provide that type in a generator expression (f(x) for x in iter) actually do it? I suggest that (f(x)::T for x in iter) is the first thing that comes to mind and if you subscribe to the principle of least surprise, the eltype() of this expression should be T.


#20

I did!

So to repeat,

Or to break it down,

  1. T is not necessarily known (and it might not be a constant at all)
  2. It is still meaningless in the empty case
  3. Return value could be Union{}
  4. T may not be a leaftype.

Again, because it’s wrong.

Don’t use generator and use some other type. Any wrapper around this iterator would do. For example, just give the type to your TableTraits.getiterator function.

Adding a ::T in an expression to change the behavior is about as much surprise as you can get. This kind of behavioral change does not exist anywhere else in the language.