PyGen - python style generators


#36

It’s way more than 4x slower. The print is still by far the slowest part of the normal loop approach.

Yes, that’s the correct way to implement this.


#37

That’s excellent @BenLauwens! I think there is interest in an independent package for this. I was approached a few times about publishing PyGen, but I felt the C#-style approach was the right solution and didn’t have the time to implement it. There was also the naming question (@resumable is a good name, by the way).

@FemtoTrader I’m not sure why your Channel implementation is faster than PyGen. Maybe there was a type instability or something similar in my implementation.


#38

I think the Channel implementation is faster because the function push!(c, i) buffers the results and the print loop reads from the buffer. This is a lot faster because no task switching is done. However, as I do in SimJulia, the yielding of values and the use of those values have to be synchronised, so no buffering can be allowed. Doing the same benchmark with a Channel(0) object gives very different results.
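
To illustrate, here is a minimal sketch of that difference (the counter producer is made up for this example; the Channel(f; ctype, csize) constructor is the Julia 0.6-era one also used in the benchmark further down the thread):

using BenchmarkTools

function counter(ch::Channel)
  for i in 1:1000
    put!(ch, i)    # with csize=0 every put! blocks until the consumer takes the value
  end
end

function drain(ch::Channel)
  s = 0
  for v in ch      # iterating a Channel calls take! until the producer task finishes
    s += v
  end
  s
end

println("unbuffered (one task switch per value):")
@btime drain(Channel(counter; ctype=Int, csize=0))
println("buffered (the producer can run ahead):")
@btime drain(Channel(counter; ctype=Int, csize=100))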


#39

@BenLauwens I’m trying with SimJulia

using SimJulia

println()
println("simjulia generator")

@resumable function simjulia_gen()
    for i in 0:N
        if i % Ndisp == 0
            @yield return i
        end
    end
end

function using_a_simjulia_generator()
    simjulia_generator = simjulia_gen()
    for i in simjulia_generator()
        println(i)
    end
end

println(@elapsed using_a_simjulia_generator())

but it only displays

simjulia generator
0
0.026708957

Any idea what is going wrong?


#40

@FemtoTrader I have not yet implemented the iterator interface.

import Base.done, Base.next, Base.start

using SimJulia

start(fsm::T) where T<:FiniteStateMachine = fsm._state

next(fsm::T, state::UInt8) where T<:FiniteStateMachine = fsm(), fsm._state

done(fsm::T, state::UInt8) where T<:FiniteStateMachine = fsm._state == 0xff

@resumable function simjulia_gen()
    i = 0
    while true
        if i % Ndisp == 0
            if i + Ndisp < N
                @yield return i
            else
                return i
            end 
        end
    i += 1
    end
end

N = 100
Ndisp = 3

function using_a_simjulia_generator()
    for i in simjulia_gen()
        println(i)
    end
end

println(@elapsed using_a_simjulia_generator())

@yield inside a for loop is not yet possible… the for loop is rewritten with an internal variable #temp during lowering, and I can’t capture that variable in the macro. This is one of the reasons why C#-style generators should be implemented in core Julia; they are a straightforward extension of closures…
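
For context, this is roughly how a for i in 0:9 loop is lowered with the Julia 0.6 start/next/done protocol (written out by hand; the hidden variable the compiler introduces is spelled state here instead of #temp):

itr = 0:9
state = start(itr)               # the compiler keeps this in a hidden #temp variable
while !done(itr, state)
  i, state = next(itr, state)
  println(i)                     # a @yield here would also have to save and restore #temp
end

This is why the example above uses a manual while loop with its own counter instead of a for loop.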

To compare with the other generators, it is better not to print the results: println takes more time than the task switching or the function calls.

Another possibility is the use of LLVM coroutines. I have no idea whether someone has already tried to use them in Julia.


#41

An interesting but quite technical presentation about coroutines in LLVM: http://llvm.org/devmtg/2016-11/Slides/Nishanov-LLVMCoroutines.pdf


#42

@BenLauwens please do make it an independent package


#43

@nsmith, @JeffreySarnoff I have just finished a prerelease version of ResumableFunctions for Julia v0.6:

using ResumableFunctions

@resumable function fibonnaci(n::Int) :: Int
  a = 0
  b = 1
  for i in 1:n-1
    @yield a
    a, b = b, a+b
  end
  a
end

for v in fibonnaci(10)
  println(v)
end

Two-way communication between the master and the slave function is possible:

@resumable function conversation()
  println("Who are you?")
  name = @yield
  println("Hello ", name, "!")
  @yield "I am a resumable function."
  println("Nice to meet you!")
end

two_way = conversation()
two_way()
println(two_way("Ben"))
two_way("Hi!")
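
For reference, running the snippet above should print something along these lines:

Who are you?
Hello Ben!
I am a resumable function.
Nice to meet you!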

Yielding inside a try-catch-finally-end clause is supported (only for a top-level @yield statement):

@resumable function catch_me()
  try
    arg = @yield
    println(arg)
    arg = @yield
    println(arg)
    @yield 
  catch exc
    println(exc)
    arg = @yield
    println(arg)
  finally
    println("Cat: Next time ...!")
  end
end

struct MouseTooSlow <: Exception end

mouse = catch_me()
mouse()
mouse("Catch me if you can!")
mouse(MouseTooSlow())
mouse("I escaped!")

Comments and bug reports are welcome! I will start writing some formal documentation…
Have fun!

Ben


#44

super


#45

Great work @BenLauwens! Thanks for taking up the torch on this one. I’ll point users from my repository towards this one and am looking forward to browsing through the implementation.


#46

A small benchmark to compare ResumableFunctions with produce/consume and channels

using BenchmarkTools
using ResumableFunctions

const n = 93

@resumable function fibonnaci_resumable()
  a = zero(Int)
  b = a + one(a)
  while true
    @yield a
    a, b = b, a + b
  end
end

function test_resumable()
  fib_resumable = fibonnaci_resumable()
  for i in 1:n 
    fib_resumable() 
  end
end

println("ResumableFunctions: ")
@btime test_resumable()

function my_produce(v)
  ct = current_task()
  consumer = ct.consumers
  ct.consumers = nothing
  Base.schedule_and_wait(consumer, v)
  return consumer.result
end

function my_consume(producer::Task, values...)
  istaskdone(producer) && return producer.value
  ct = current_task()
  ct.result = length(values)==1 ? values[1] : values
  producer.consumers = ct
  Base.schedule_and_wait(producer)
end

function fibonnaci_produce()
  a = zero(Int)
  b = a + one(a)
  while true
    my_produce(a)
    a, b = b, a + b
  end
end

function test_produce()
  fib_produce = @task fibonnaci_produce()
  for i in 1:n 
    my_consume(fib_produce) 
  end
end

println("Produce/consume: ")
@btime test_produce()

function fibonnaci_channel(ch::Channel)
  a = zero(Int)
  b = a + one(a)
  while true
    put!(ch, a)
    a, b = b, a + b
  end
end

function test_channel(csize::Int)
  fib_channel = Channel(fibonnaci_channel; ctype=Int, csize=csize)
  for i in 1:n 
    take!(fib_channel) 
  end
end

println("Channels csize=0: ")
@btime test_channel(0)

println("Channels csize=20: ")
@btime test_channel(20)

println("Channels csize=100: ")
@btime test_channel(100)

… and the results on my laptop

ResumableFunctions:
  416.533 ns (1 allocation: 32 bytes)
Produce/consume:
  36.599 μs (158 allocations: 4.06 KiB)
Channels csize=0:
  95.691 μs (276 allocations: 7.47 KiB)
Channels csize=20:
  17.731 μs (116 allocations: 5.88 KiB)
Channels csize=100:
  13.157 μs (108 allocations: 6.77 KiB)

Two almost trivial conclusions:

  • context switching is expensive compared to function calls;
  • the performance of channels depends on the csize parameter.

If this benchmark is biased or if channels can be used in a more performant way, all input is welcome!


#47

Pretty amazing timings. You could also add a benchmark for a version that loops directly over the Fibonacci numbers, to see how much overhead there is behind the scenes of @resumable. I wonder whether something like this could end up in Base one day. It would be a powerful way to express complex iteration patterns, especially nested ones, that can sometimes be hard to express otherwise.


#48

I tried a loop and also a version with a closure. Due to #15276 (captured closure variables get boxed), the closure version isn’t as nice, though…


using BenchmarkTools, ResumableFunctions
const n = 93
function test_seq()
    a, b = 0, 1
    for i = 1:n-1
        a, b = b, a + b
    end
    a
end

function fibonnaci_closure()
    a, b = Ref(0), Ref(1)
    function ()
        al, bl = a[], b[]
        a[], b[] = bl, al + bl
        return al
    end
end

function test_closure()
    fib_closure = fibonnaci_closure()
    a = 0
    for i in 1:n
        a = fib_closure()
    end
    a
end
@resumable function fibonnaci_resumable()
  a = zero(Int)
  b = a + one(a)
  while true
    @yield a
    a, b = b, a + b
  end
end

function test_resumable()
    fib_resumable = fibonnaci_resumable()
    a = 0
    for i in 1:n
        a = fib_resumable()
    end
    a
end

@btime test_closure()
167.327 ns (3 allocations: 64 bytes)
@btime test_seq()
2.573 ns (0 allocations: 0 bytes)
@btime test_resumable()
292.989 ns (1 allocation: 32 bytes)

#49

I have tried to automate the closure approach before, but without the ugly Ref hack the performance is subpar:

using BenchmarkTools
using ResumableFunctions

const n = 93

@resumable function fibonnaci_resumable()
  a = zero(Int)
  b = a + one(a)
  while true
    @yield a
    a, b = b, a + b
  end
end

function test_resumable()
  fib_resumable = fibonnaci_resumable()
  for i in 1:n 
    fib_resumable() 
  end
end

println("ResumableFunctions: ")
@btime test_resumable()

function fibonnaci_closure()
  a = zero(Int)
  b = a + one(Int)
  function()
    tmp = a
    a, b = b, a + b
    tmp
  end
end

function test_closure()
  fib_closure = fibonnaci_closure()
  for i in 1:n 
    fib_closure() 
  end
end

println("Closure: ")
@btime test_closure()
ResumableFunctions:
  488.185 ns (1 allocation: 32 bytes)
Closure:
  8.839 μs (83 allocations: 1.31 KiB)

Another problem with closures is the use of variables that are not initialised at the start of the inner function but are needed to keep the state, e.g. an internal iterator:

using ResumableFunctions

const n = 93

@resumable function internal_iterator_resumable(n::Int)
  a = zero(Int)
  b = a + one(Int)
  for i in 1:n-1
    @yield a
    a, b = b, a + b
  end
  a
end
for val in internal_iterator_resumable(n)
  println(val)
end

I don’t see how I can make this work without initialising the iterator and its state before the inner function and using a while loop (not a working example, but it shows how it could be done):

import Base.start, Base.next, Base.done

const n = 93

function internal_iterator_closure(n::Int)
  a = Ref(zero(Int))
  b = Ref(a[] + one(Int))
  iter = Ref(1:n-1)
  state = Ref(start(iter[]))
  function()
    while !done(iter[], state[])
      i, state[] = next(iter[], state[])
      tmp = a[]
      a[], b[] = b[], a[] + b[]
      return tmp
    end
    a[]
  end
end

start(iter) = iter.state[]
next(iter, state) = iter(), iter.state[]
done(iter, state) = done(iter.iter[], iter.state[])

for val in  internal_iterator_closure(n)
  println(val)
end

I don’t think this is feasible for all possible corner cases with a macro-based approach. The approach I use for ResumableFunctions is a direct extension of the way closures are implemented: a closure is simply a callable object whose field names correspond to the captured variables. Not only the captured variables have to be stored in the callable object but also the internal ones, and a kind of dispatch mechanism has to be inserted to perform a non-local jump to the right instruction… and this explains the difference in performance between optimised closures and ResumableFunctions.
Personally I find it quite amazing that ResumableFunctions runs a lot faster than a closure implementation without the Ref hack and is less than two times slower than closures with the Ref hack.
I would really appreciate it if someone could find the bug in the last example. I am stuck and have other things to do ;)
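
To make that description concrete, here is a hand-written sketch (hypothetical; not the exact code that @resumable generates) of such a callable object for the Fibonacci generator: the captured and internal variables become fields, and a _state field drives the jump back to where execution last stopped.

mutable struct FibFSM
  _state::UInt8
  a::Int
  b::Int
end

FibFSM() = FibFSM(0x00, 0, 0)

function (fsm::FibFSM)()
  if fsm._state == 0x00          # first call: run up to the first @yield
    fsm.a, fsm.b = 0, 1
    fsm._state = 0x01
    return fsm.a
  elseif fsm._state == 0x01      # subsequent calls: resume just after the @yield
    fsm.a, fsm.b = fsm.b, fsm.a + fsm.b
    return fsm.a
  end
  error("@resumable function has stopped!")
end

fib = FibFSM()
println([fib() for _ in 1:10])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

A macro would generate something in this spirit but would also have to handle arguments, exceptions and the iterator interface.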


#50

This is a bit unfair:

julia> function test_seq()
           a, b = 0, 1
           for i = 1:n-1
               a, b = b, a + b
           end
           a
       end
test_seq (generic function with 1 method)

julia> @code_llvm test_seq()

define i64 @julia_test_seq_61059() #0 !dbg !5 {
top:
  ret i64 7540113804746346429
}

#51

Oh yeah of course… The ratio didn’t make sense at all! :wink:


#52

I could not resist implementing a closure-based finite state machine:

function fibonnaci_closure_stm()
  a = Ref(zero(Int))
  b = Ref(a[] + one(Int))
  _state = Ref(zero(UInt8))
  (_arg::Any=nothing)->begin
    _state[] == 0x00 && @goto _STATE_0
    _state[] == 0x01 && @goto _STATE_1
    error("@resumable function has stopped!")
    @label _STATE_0
    _state[] = 0xff
    _arg isa Exception && throw(_arg)
    while true
      _state[] = 0x01
      return a[]
      @label _STATE_1
      _state[] = 0xff
      _arg isa Exception && throw(_arg)
      a[], b[] = b[], a[] + b[]
    end
  end
end

function test_closure_stm()
  fib_closure = fibonnaci_closure_stm()
  for i in 1:n 
    fib_closure() 
  end
end

println("Closure statemachine: ")
@btime test_closure_stm()

The dispatch is identical to what ResumableFunctions generates automatically, so this is a fair comparison ;)

ResumableFunctions:
  414.055 ns (1 allocation: 32 bytes)
Closure statemachine:
  445.116 ns (4 allocations: 80 bytes)

I could implement a macro automating the transformation and the Ref trick, but I don’t see how I can implement the iterator interface for a specific closure object.


#53

… a fair direct implementation:

const n = 93

@noinline function fibonnaci_direct(a::Int, b::Int)
  b, a+b
end

function test_direct()
  a = zero(Int)
  b = a + one(a)
  for i in 1:n 
    a, b = fibonnaci_direct(a, b)
  end
end

println("Direct: ")
@btime test_direct()

Summary of all benchmarks:

Direct:
  154.587 ns (0 allocations: 0 bytes)
ResumableFunctions:
  414.040 ns (1 allocation: 32 bytes)
Produce/consume:
  37.495 μs (158 allocations: 4.06 KiB)
Channels csize=0:
  106.098 μs (276 allocations: 7.47 KiB)
Channels csize=20:
  23.371 μs (116 allocations: 5.88 KiB)
Channels csize=100:
  13.167 μs (108 allocations: 6.77 KiB)
Closure:
  6.966 μs (83 allocations: 1.31 KiB)
Closure optimised:
  190.277 ns (3 allocations: 64 bytes)
Closure statemachine:
  446.282 ns (4 allocations: 80 bytes)

#54

Maybe like this:


macro iterable(closure)
    quote
        f = $(esc(closure))
        Base.next(f::typeof(f), state::Int) = f(), state + 1
        Base.done(f::typeof(f), state::Int) = false
        Base.start(f::typeof(f)) = 1
        f
    end
end

function test()
    a, b = 0, 1
    @iterable function ()
        a, b = b, a + b
        return a
    end
end
cl = test()
for elem in cl
    println(elem)
    elem > 10^6 && break
end

I need to figure out a nice way to also specify a condition for done!


#55

You can implement done by including the state in the closure:

macro iterable(closure)
    quote
        f = $(esc(closure))
        Base.next(f::typeof(f), state::Int) = f(), state + 1
        Base.done(f::typeof(f), state::Int) = f.state[]
        Base.start(f::typeof(f)) = 1
        f
    end
end

function test(n)
    a, b = 0, 1
    i = 0
    state = Ref(false)
    @iterable function ()
        a, b = b, a + b
        i += 1
        i == n ? state[] = true : state[] = false
        return a
    end
end
cl = test(10)
for elem in cl
    println(elem)
end