Are generator expressions documented? Also other questions

Sorry, this might be a dumb question.

This search reveals nothing obviously relevant, at least in the top results: Search · The Julia Language

And yet I’ve noticed from some examples I saw online that Python-like generator expressions and list comprehensions do work:

julia> [i^2 for i = 1:5]
5-element Array{Int64,1}:
  1
  4
  9
 16
 25

This looks promising:

julia> (i^2 for i = 1:5)
Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##315#316"))}(getfield(Main, Symbol("##315#316"))(), 1:5)

And indeed [...] seems equivalent to collect((...)). However, those getfield and Symbol bits look… too meta to be compiled.
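That equivalence is easy to check directly (a quick sketch; variable names are mine):

```julia
# [expr for x in iter] is, modulo eltype bookkeeping, collect of the generator:
v1 = [i^2 for i in 1:5]
v2 = collect(i^2 for i in 1:5)
@assert v1 == v2 == [1, 4, 9, 16, 25]
@assert typeof(v1) == typeof(v2) == Vector{Int}
```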

Is there documentation of this feature, its implementation, limitations, extension points, etc?

Python has achieved a nice unification between generators, coroutines, and async/await. I wonder if something equivalent has already been done, or is in the works, for Julia.
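For the generator part specifically, Julia's closest analogue to a stateful Python generator is a Channel fed by a Task: the body suspends at each put! until a consumer takes the value. A minimal sketch (the name `squares` is mine; the `Channel{T}(f)` constructor needs a reasonably recent Julia):

```julia
# A Channel run by a Task behaves much like a Python generator:
squares() = Channel{Int}() do ch
    for i in 1:5
        put!(ch, i^2)   # suspends here until the value is consumed
    end
end

@assert collect(squares()) == [1, 4, 9, 16, 25]

# Channels can also be consumed one value at a time:
ch = squares()
@assert take!(ch) == 1
@assert take!(ch) == 4
```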

Last related question: is there a PEP system, or something like Rust’s RFCs? Is it all github issues based? What’s the best way to get visibility into the language’s evolving core and future roadmap?

Yes and yes, even though it’s a bit further down.

The best way to find out what’s going on right now is in the code on GitHub and here on Discourse. As far as I know, it’s all based on issues and PRs. I know @kristoffer.carlsson mentioned having better multithreading on the roadmap, but I’m not aware of a general collection of goals for any specific version. There are some issues under the “Milestones” section though: 1.1 and 1.0.x.

try searching for "gen*".

(the quotes and asterisk sometimes work to limit the frantic fuzzy searching that Documenter likes to do…)

There are Juleps (GitHub - JuliaLang/Juleps: Julia Enhancement Proposals), which I suspect will be used a lot more post-1.0; most of the technical discussion does indeed happen on GitHub.

Oh wow, that repo is old! Seems like most of the stuff has been implemented or is outdated though, right?

Well, Logging and Pkg3 made it into 1.0, GCExtensions is under development and I still dream of finishing RTLIB one day.

I thought it might be fun to try to use the reflection tools to figure out what’s going on:

julia> f() = (i^2 for i = 1:5)
f (generic function with 1 method)
julia> @code_lowered f()
CodeInfo(
1 1 ─      #5 = %new(Main.:(##5#6))
  │   %2 = #5
  │   %3 = 1:5
  │   %4 = (Base.Generator)(%2, %3)
  └──      return %4
)

OK, so in this case the documentation lives in Base.Generator:

help?> Base.Generator
  Generator(f, iter)

  Given a function f and an iterator iter, construct an iterator that yields the values of f applied to the elements of iter. The syntax f(x) for x in iter [if cond(x)::Bool] is syntax for constructing an instance of this type. The [if
  cond(x)::Bool] expression is optional and acts as a "guard", effectively filtering out values where the condition is false.
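A quick sketch of both halves of that docstring, the guard form and constructing a Base.Generator by hand:

```julia
# The `if` guard filters values before f is applied:
g = (x^2 for x in 1:10 if iseven(x))
@assert collect(g) == [4, 16, 36, 64, 100]

# The comprehension syntax is sugar for constructing the type directly:
h = Base.Generator(x -> x^2, 1:5)
@assert collect(h) == [1, 4, 9, 16, 25]
```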

But there are still a number of confusing things going on here. If we ask for all of the code that the expression desugars to (not just the content of a top-level function we defined), we can see what’s going on (it’s kind of ugly, and making it more interpretable is something that’s probably worth doing):

julia> Meta.@lower (x^2 for x in 1:5)
:($(Expr(:thunk, CodeInfo(
 1 ─      $(Expr(:thunk, CodeInfo(
 1 ─     global ##7#8
 │       const ##7#8
 │       $(Expr(:struct_type, Symbol("##7#8"), :((Core.svec)()), :((Core.svec)()), :(Core.Function), :((Core.svec)()), false, 0))
 └──     return
)))
 │   %2 = (Core.svec)(##7#8, Core.Any)
 │   %3 = (Core.svec)()
 │   %4 = (Core.svec)(%2, %3)
 │        $(Expr(:method, false, :(%4), CodeInfo(quote
    (Core.apply_type)(Base.Val, 2)
    (%1)()
    (Base.literal_pow)(^, x, %2)
    return %3
end)))
 │        #7 = %new(##7#8)
 │   %7 = #7
 │   %8 = 1:5
 │   %9 = (Base.Generator)(%7, %8)
 └──      return %9
))))

Here’s basically the same thing in a nicer format:

function
1 - %1 = function
│        1 ─      global MyClosure
│        │        const MyClosure
│        │        struct MyClosure <: Core.Function
│        └──      return
│        end
│   %2 = function (::MyClosure)(x::Core.Any)
│        1 - %1 = Core.apply_type(Base.Val, 2)
│        │   %2 = %1()
│        │   %3 = Base.literal_pow(^, x, %2)
│        └──      return %3
│        end
│   %3 = %new(MyClosure)
│   %4 = 1:5
│   %5 = Base.Generator(%3, %4)
└──      return %5
end

And in ordinary Julia code:

struct MyClosure <: Function end
(::MyClosure)(x) = Base.literal_pow(^, x, Val{2}())
Base.Generator(MyClosure(), 1:5)

Essentially what happened is that, for full generality, the generator comprehension syntax actually constructs a closure type (in this case, a function would have been fine since nothing is being closed over) and instantiates it, then passes it to Base.Generator along with the object being iterated over.
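You can see the “full generality” part by comparing a generator that closes over nothing with one that actually captures a local variable. A sketch (the accessor `g.f` relies on Generator’s public `f`/`iter` fields; `make` is my name):

```julia
# Nothing captured: the compiler-generated closure type has no fields.
g = (x^2 for x in 1:5)
@assert fieldcount(typeof(g.f)) == 0

# Here `n` is captured, so the closure type gets a field holding it:
make(n) = (x + n for x in 1:5)
h = make(10)
@assert fieldcount(typeof(h.f)) == 1
@assert collect(h) == [11, 12, 13, 14, 15]
```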

As you noticed, the array comprehension syntax [...] lowers to a call to Base.collect on the generator. Because the generator is specialized (essentially “templated”) on both the concrete types of the iteration object and its closure (i.e., the getfield stuff you saw is one of the type parameters of Base.Generator), the implementation of Base.collect below is JIT-compiled for the specific generator expression being collected. Here’s the code in generator.jl:

function collect(itr::Generator)
    isz = IteratorSize(itr.iter)
    et = @default_eltype(itr)
    if isa(isz, SizeUnknown)
        return grow_to!(Vector{et}(), itr)
    else
        y = iterate(itr)
        if y === nothing
            return _array_for(et, itr.iter, isz)
        end
        v1, st = y
        collect_to_with_first!(_array_for(typeof(v1), itr.iter, isz), v1, itr, st)
    end
end
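Both branches of that function are easy to trigger (a sketch):

```julia
# A plain range has a known shape, so collect can preallocate the output array:
g = (x^2 for x in 1:10)
@assert Base.IteratorSize(g) isa Base.HasShape
@assert collect(g) == [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# A guard wraps the range in Iterators.Filter, whose length is unknown in
# advance, so collect takes the grow_to! path instead:
h = (x^2 for x in 1:10 if x % 3 == 0)
@assert Base.IteratorSize(h) isa Base.SizeUnknown
@assert collect(h) == [9, 36, 81]
```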

While the actual function inside the closure is opaque to the compiler, as are the values of the beginning and ending of the range being iterated over, the compiler can still generate fairly fast code:

julia> using BenchmarkTools

julia> @btime collect(x^2 for x in 1:20)
  42.477 ns (1 allocation: 240 bytes)

We can make it faster by collecting onto a stack-allocated tuple with compiler-known length:

julia> @btime NTuple{20, Int64}(x^2 for x in 1:20)
  15.814 ns (0 allocations: 0 bytes)

And even faster by also specifying the beginning of the range at the type level:

julia> @btime NTuple{20, Int64}(x^2 for x in Base.OneTo(20))
  5.787 ns (0 allocations: 0 bytes)

Looking at the code_warntype, code_llvm, or code_native now shows that the Julia and LLVM compilers have basically inlined and unrolled absolutely everything.
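You can reproduce that inspection yourself; a sketch (the exact lowered output varies by Julia version, so I only assert the result here):

```julia
# Wrapping the expression in a function lets the compiler see everything:
f() = NTuple{4, Int}(x^2 for x in Base.OneTo(4))
@assert f() === (1, 4, 9, 16)

# Uncomment in the REPL to inspect the generated code:
# @code_typed f()    # the loop is gone; the tuple is built directly
# @code_native f()   # no calls, no allocations
```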

The role of literal_pow (which showed up earlier) is to allow the same kind of compiler specialization on the value of the exponent in an expression like x^2, so that it becomes effectively x*x, which is faster than an actual exponentiation call.
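A quick sketch of what literal_pow buys you (`pow` is an illustrative name):

```julia
# x^2 with a literal exponent lowers to Base.literal_pow(^, x, Val(2)),
# which for Val(2) is defined as x*x:
@assert Base.literal_pow(^, 7, Val(2)) == 7 * 7 == 49

# With a runtime exponent there is no literal to specialize on,
# so an ordinary power call happens instead:
pow(x, n) = x^n
@assert pow(7, 2) == 49
```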


If you really want people to take your Julep seriously,
it should be titled following the convention:

Taking foo seriously.

WRT coroutines etc
you might be interested in
https://white.ucc.asn.au/2017/11/18/Lazy-Sequences-in-Julia.html
and some of my other blog posts.

I am looking forward to PARTR making threads and tasks into the same kind of thing.


Thanks, what an in-depth reply!