PyGen - python style generators

package
announcement

#21

That’s exactly why they are very different. Since Python is an interpreter, essentially every function is a closure (it explicitly keeps its state), except that Python calls them frames. Julia tasks switch native stacks, which is very different.
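To make the closure point concrete on the Julia side: a closure captures its locals and keeps them alive across calls, which is the same kind of state-keeping that Python frames provide for every function. A minimal sketch (`make_counter` is an illustrative name, not something from this thread):

```julia
# A closure captures `n` and keeps it alive between calls,
# much like a Python frame keeps a function's locals alive.
function make_counter()
    n = 0
    () -> (n += 1)   # each call increments and returns the captured state
end

counter = make_counter()
counter()   # 1
counter()   # 2
```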


#22

With the do syntax how would you specify the type and size of the Channel?


#23

Specify the ctype and csize keyword arguments:

julia> f(x) = Channel(ctype=Int, csize=5) do c
           push!(c, ...)
       end

#24

And even better (in my opinion), combining it with short-form function syntax,

f(x) = Channel() do c
    ...
    push!(c, y)
    ...
end

Effectively a macro-free way of defining a python-like generator.
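As a complete, self-contained version of this pattern, here is a hypothetical Fibonacci generator (`fib` and its locals are illustrative names; this uses the current `Channel{T}(size) do c` constructor, whereas 0.6 spelled the type and size with the `ctype`/`csize` keywords shown earlier):

```julia
# `push!` plays the role of Python's `yield`; iterating the Channel
# resumes the producer task each time a value is consumed.
fib(n) = Channel{Int}(0) do c
    a, b = 0, 1
    for _ in 1:n
        push!(c, a)
        a, b = b, a + b
    end
end

collect(fib(8))   # [0, 1, 1, 2, 3, 5, 8, 13]
```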

This seems substantially simpler than what I’ve been doing, which follows the suggestions from this 2014 blog post:

function f(x)
    ...
    function _it()
        ...        
        produce(y)
    end
    Task(_it)
end

@nsmith - sounds like this is similar to what you’re doing in your macro? When I read the release notes, I was worried this would stop working - this is switching to channels now?


#25

@kevbonham Yeah, in 0.6, produce and consume are deprecated in favour of Channels. Some new Channel constructors have been made to emulate the produce, consume pattern more easily, which is what @fengyang.wang is using above.

Before v0.6 the equivalent clever idiom would be

f(x) = Task() do
    ...
    produce(y)
    ...
end

#26

Got it - thanks!

It’s a bit annoying from a learning standpoint that there’s nothing that clearly flags where the yield equivalent statement is. push! is pretty generic. So +1 for getting some dedicated syntax in Base.

Also @davidanthoff seemed to indicate there’s some overhead?


#27

Yeah, there is a price to context switching back and forth between the producer Task and consumer Task. The [C# implementation](https://blogs.msdn.microsoft.com/oldnewthing/20080812-00/?p=21273) expands the generator code in the caller's frame as a pile of goto statements, so there is no context-switching overhead. In 0.6 the new @label and @goto macros could be used to implement something similar, which would be pretty neat.
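As a rough sketch of that idea, one can hand-write the state machine such an expansion would produce: the generator's locals move into a mutable struct, and each call jumps back to just after the last "yield". Everything here (`CountdownGen`, the labels) is hypothetical, not PyGen's or C#'s actual output:

```julia
# Hand-rolled resumable generator: `state` records which resume point
# to jump to, `i` is the generator's only local variable. Each call is
# a plain function call, with no task switch at all.
mutable struct CountdownGen
    state::Int
    i::Int
end
CountdownGen(n::Int) = CountdownGen(0, n)

function (g::CountdownGen)()
    g.state == 1 && @goto resume1
    @label top
    g.i > 0 || @goto done
    g.state = 1
    return g.i           # "yield" the current value
    @label resume1       # execution re-enters here on the next call
    g.i -= 1
    @goto top
    @label done
    g.state = -1         # exhausted
    return nothing
end

g = CountdownGen(3)
g(), g(), g(), g()   # (3, 2, 1, nothing)
```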


#28

I did some measurements using the following code and that’s quite impressive.

using PyGen

N     = 10000000
Ndisp = 1000000

println("for loop")

function for_loop()
    for i in 0:N
        if i % Ndisp == 0
            println(i)
        end
    end
end


println(@elapsed for_loop())

println()
println("pygen generator")

@pygen function pygen_generator()
    for i in 0:N
        if i % Ndisp == 0
            yield(i)
        end
    end
end

function using_a_pygen_generator()
    for i in pygen_generator()
        println(i)
    end
end

println(@elapsed using_a_pygen_generator())

It would be great to have something like that included in Base.

If this can’t be added to Base and it remains a standalone package, I think it should be renamed.
The name PyGen made me think it was using Python!

Can someone post here a similar sample code with Channel?


#29

FYI, a loop over 10000000 iterations takes no time compared to printing 10 numbers and the dynamic dispatch caused by non-constant global variables.


#30

I previously tested Task produce/consume with older Julia versions and it was much slower (even for basic tasks like this).

I’m experimenting with Channel for the first time.

channel_generator() = Channel() do c
    for i in 0:N
        if i % Ndisp == 0
            push!(c, i)
        end
    end
end

println()
println("channel generator")
function using_a_channel_generator()
    for i in channel_generator()
        println(i)
    end
end

println(@elapsed using_a_channel_generator())

In your opinion, what kind of measurements should be done to ensure that it doesn’t slow things down too much?


#31

As a start, make N and Ndisp constant globals.


#32

Also, FYI, these are not new in 0.6. They exist in 0.3.


#33

That’s very impressive!
Why is there so much speed difference?


#34

With

using PyGen

const N     = 10000000
const Ndisp = 1000000

println("for loop")

function for_loop()
    for i in 0:N
        if i % Ndisp == 0
            println(i)
        end
    end
end


println(@elapsed for_loop())

# ===

println()
println("pygen generator")

@pygen function pygen_generator()
    for i in 0:N
        if i % Ndisp == 0
            yield(i)
        end
    end
end

function using_a_pygen_generator()
    for i in pygen_generator()
        println(i)
    end
end

println(@elapsed using_a_pygen_generator())

# ===

channel_generator() = Channel() do c
    for i in 0:N
        if i % Ndisp == 0
            push!(c, i)
        end
    end
end

println()
println("channel generator")
function using_a_channel_generator()
    for i in channel_generator()
        println(i)
    end
end

println(@elapsed using_a_channel_generator())

it seems that PyGen is nearly 4x slower!

for loop
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
8000000
9000000
10000000
0.075400854

pygen generator
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
8000000
9000000
10000000
0.204231019

channel generator
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
8000000
9000000
10000000
0.056583126

#35

Hi

The latest version of SimJulia has an implementation of C#-style generators, i.e. a function yielding values is transformed into a finite state machine.

@resumable function fib()
    a = 0
    b = 1
    while true
        @yield return a
        a, b = b, a+b
    end
end

fib_gen = fib()

for i in 1:10
    println(fib_gen())
end

This approach is a lot faster than produce/consume or the newer Channels when the output cannot be buffered (as it is when using channels).
If there is some interest, I can take this out of SimJulia and make it an independent package.


#36

It’s way more than 4x slower. The print is still way slower than anything else for the normal loop approach.

Yes, that’s the correct way to implement this.


#37

That’s excellent @BenLauwens! I think there is interest in an independent package for this. I was approached a few times about publishing PyGen, but I felt the C#-style approach was the right solution and didn’t have the time to implement it. There is also the naming issue (@resumable is a good name, by the way).

@FemtoTrader I’m not sure why your Channel implementation is faster than PyGen. Maybe there was a type instability or something in my implementation.


#38

I think the Channel implementation is faster because push!(c, i) buffers the results and the print loop reads from the buffer. This is a lot faster because no Task switching is done. However, as in SimJulia, the yielding of values and their consumption must be synchronised, so no buffering can be allowed. Doing the same benchmark with a Channel(0) object gives very different results.
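That distinction can be written out directly (modern `Channel{T}(size)` syntax; on 0.6 it is the `csize` keyword). Both versions produce the same values; only the synchronisation behaviour differs:

```julia
# size = 0: unbuffered. The producer blocks on every push! until the
# consumer takes the value, so each value costs a task switch.
unbuffered() = Channel{Int}(0) do c
    for i in 1:5
        push!(c, i)
    end
end

# size = 64: buffered. The producer can run ahead of the consumer,
# amortising task switches over many values.
buffered() = Channel{Int}(64) do c
    for i in 1:5
        push!(c, i)
    end
end

collect(unbuffered())   # [1, 2, 3, 4, 5]
collect(buffered())     # [1, 2, 3, 4, 5]
```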


#39

@BenLauwens I’m trying with SimJulia

using SimJulia

println()
println("simjulia generator")

@resumable function simjulia_gen()
    for i in 0:N
        if i % Ndisp == 0
            @yield return i
        end
    end
end

function using_a_simjulia_generator()
    simjulia_generator = simjulia_gen()
    for i in simjulia_generator()
        println(i)
    end
end

println(@elapsed using_a_simjulia_generator())

but it only displays

simjulia generator
0
0.026708957

Any idea what is going wrong?


#40

@FemtoTrader I have not yet implemented the iterator interface.

import Base.done, Base.next, Base.start

using SimJulia

start(fsm::T) where T<:FiniteStateMachine = fsm._state

next(fsm::T, state::UInt8) where T<:FiniteStateMachine = fsm(), fsm._state

done(fsm::T, state::UInt8) where T<:FiniteStateMachine = fsm._state == 0xff

@resumable function simjulia_gen()
    i = 0
    while true
        if i % Ndisp == 0
            if i + Ndisp < N
                @yield return i
            else
                return i
            end 
        end
        i += 1
    end
end

N = 100
Ndisp = 3

function using_a_simjulia_generator()
    for i in simjulia_gen()
        println(i)
    end
end

println(@elapsed using_a_simjulia_generator())

@yield inside a for loop is not yet possible… the for loop is rewritten with an internal variable #temp during the lowering process, which I can’t capture in the macro. This is one of the reasons that C#-style generators should be implemented in core Julia. They are a straightforward extension of closures…
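To see where that hidden variable comes from, here is what a for loop desugars to, written out by hand (`manual_for` is an illustrative name; shown with the current `iterate` protocol, whereas 0.6 lowered loops through start/next/done, printing the hidden state as `#temp`):

```julia
# Hand-written equivalent of `for i in itr`: the iteration state is an
# extra variable the surface syntax never names, which is why a macro
# that only sees the `for` loop cannot capture it.
function manual_for(itr)
    acc = Int[]
    next = iterate(itr)            # (element, hidden_state), or nothing
    while next !== nothing
        (i, state) = next
        push!(acc, i)              # the loop body
        next = iterate(itr, state)
    end
    acc
end

manual_for(0:2:10)   # [0, 2, 4, 6, 8, 10]
```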

To compare with the other generators, it is better not to print the results: println takes more time than the task switching or the function calls.

Another possibility is the use of LLVM coroutines. I have no idea whether someone has already tried to use them in Julia.