PyGen - Python-style generators

An interesting but quite technical presentation about coroutines in LLVM: http://llvm.org/devmtg/2016-11/Slides/Nishanov-LLVMCoroutines.pdf

@BenLauwens please do make it an independent package

3 Likes

@nsmith, @JeffreySarnoff I have just finished a prerelease version of ResumableFunctions for Julia v0.6:

using ResumableFunctions

@resumable function fibonnaci(n::Int) :: Int
  a = 0
  b = 1
  for i in 1:n-1
    @yield a
    a, b = b, a+b
  end
  a
end

for v in fibonnaci(10)
  println(v)
end

Two-way communication between the master and the slave function is possible:

@resumable function conversation()
  println("Who are you?")
  name = @yield
  println("Hello ", name, "!")
  @yield "I am a resumable function."
  println("Nice to meet you!")
end

two_way = conversation()
two_way()
println(two_way("Ben"))
two_way("Hi!")

Yielding inside a try-catch-finally-end block is supported (only top-level @yield statements are allowed):

@resumable function catch_me()
  try
    arg = @yield
    println(arg)
    arg = @yield
    println(arg)
    @yield 
  catch exc
    println(exc)
    arg = @yield
    println(arg)
  finally
    println("Cat: Next time ...!")
  end
end

struct MouseTooSlow <: Exception end

mouse = catch_me()
mouse()
mouse("Catch me if you can!")
mouse(MouseTooSlow())
mouse("I escaped!")

Comments and bug reports are welcome! I will start writing some formal documentation…
Have fun!

Ben

8 Likes

super

Great work @BenLauwens! Thanks for taking up the torch on this one. I’ll point users from my repository towards this one and am looking forward to browsing through the implementation.

A small benchmark to compare ResumableFunctions with produce/consume and channels

using BenchmarkTools
using ResumableFunctions

const n = 93

@resumable function fibonnaci_resumable()
  a = zero(Int)
  b = a + one(a)
  while true
    @yield a
    a, b = b, a + b
  end
end

function test_resumable()
  fib_resumable = fibonnaci_resumable()
  for i in 1:n 
    fib_resumable() 
  end
end

println("ResumableFunctions: ")
@btime test_resumable()

function my_produce(v)
  ct = current_task()
  consumer = ct.consumers
  ct.consumers = nothing
  Base.schedule_and_wait(consumer, v)
  return consumer.result
end

function my_consume(producer::Task, values...)
  istaskdone(producer) && return producer.value
  ct = current_task()
  ct.result = length(values)==1 ? values[1] : values
  producer.consumers = ct
  Base.schedule_and_wait(producer)
end

function fibonnaci_produce()
  a = zero(Int)
  b = a + one(a)
  while true
    my_produce(a)
    a, b = b, a + b
  end
end

function test_produce()
  fib_produce = @task fibonnaci_produce()
  for i in 1:n 
    my_consume(fib_produce) 
  end
end

println("Produce/consume: ")
@btime test_produce()

function fibonnaci_channel(ch::Channel)
  a = zero(Int)
  b = a + one(a)
  while true
    put!(ch, a)
    a, b = b, a + b
  end
end

function test_channel(csize::Int)
  fib_channel = Channel(fibonnaci_channel; ctype=Int, csize=csize)
  for i in 1:n 
    take!(fib_channel) 
  end
end

println("Channels csize=0: ")
@btime test_channel(0)

println("Channels csize=20: ")
@btime test_channel(20)

println("Channels csize=100: ")
@btime test_channel(100)

… and the results on my laptop

ResumableFunctions:
  416.533 ns (1 allocation: 32 bytes)
Produce/consume:
  36.599 μs (158 allocations: 4.06 KiB)
Channels csize=0:
  95.691 μs (276 allocations: 7.47 KiB)
Channels csize=20:
  17.731 μs (116 allocations: 5.88 KiB)
Channels csize=100:
  13.157 μs (108 allocations: 6.77 KiB)

Two almost trivial conclusions:

  • context switching is expensive compared to function calls;
  • the performance of channels depends on the parameter csize.

If this benchmark is biased or if channels can be used in a more performant way, all input is welcome!

3 Likes

Pretty amazing timings. You could also add a benchmark for a version that loops directly over the Fibonacci numbers, to see how much overhead there is behind the scenes of @resumable. I wonder whether something like this could end up in Base one day. This would be a powerful way to express complex iteration patterns, especially nested ones, that can sometimes be hard to express otherwise.

I tried a loop and also a version with a closure. Due to #15276, the closure version isn’t as nice, though…


using BenchmarkTools, ResumableFunctions
const n = 93
function test_seq()
    a, b = 0, 1
    for i = 1:n-1
        a, b = b, a + b
    end
    a
end

function fibonnaci_closure()
    a, b = Ref(0), Ref(1)
    function ()
        al, bl = a[], b[]
        a[], b[] = bl, al + bl
        return al
    end
end

function test_closure()
    fib_closure = fibonnaci_closure()
    a = 0
    for i in 1:n
        a = fib_closure()
    end
    a
end

@resumable function fibonnaci_resumable()
  a = zero(Int)
  b = a + one(a)
  while true
    @yield a
    a, b = b, a + b
  end
end

function test_resumable()
    fib_resumable = fibonnaci_resumable()
    a = 0
    for i in 1:n
        a = fib_resumable()
    end
    a
end

@btime test_closure()
  167.327 ns (3 allocations: 64 bytes)
@btime test_seq()
  2.573 ns (0 allocations: 0 bytes)
@btime test_resumable()
  292.989 ns (1 allocation: 32 bytes)

1 Like

I have tried to automate the closure approach before, but without the ugly Ref hack the performance is subpar:

using BenchmarkTools
using ResumableFunctions

const n = 93

@resumable function fibonnaci_resumable()
  a = zero(Int)
  b = a + one(a)
  while true
    @yield a
    a, b = b, a + b
  end
end

function test_resumable()
  fib_resumable = fibonnaci_resumable()
  for i in 1:n 
    fib_resumable() 
  end
end

println("ResumableFunctions: ")
@btime test_resumable()

function fibonnaci_closure()
  a = zero(Int)
  b = a + one(Int)
  function()
    tmp = a
    a, b = b, a + b
    tmp
  end
end

function test_closure()
  fib_closure = fibonnaci_closure()
  for i in 1:n 
    fib_closure() 
  end
end

println("Closure: ")
@btime test_closure()

ResumableFunctions:
  488.185 ns (1 allocation: 32 bytes)
Closure:
  8.839 μs (83 allocations: 1.31 KiB)
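The gap between the plain closure and the Ref version is largely explained by #15276: a captured variable that is reassigned inside the closure is stored in a Core.Box, which makes every access type-unstable. A minimal sketch of the effect (the counter functions are illustrative, not from the package):

```julia
# Illustration of the boxing issue (Julia issue #15276): a captured
# variable that is reassigned inside the closure is stored in a
# Core.Box, so every access is type-unstable and allocates.
function boxed_counter()
    a = 0
    () -> (a += 1; a)       # `a` is reassigned after capture -> boxed
end

# The Ref "hack": `a` itself is never reassigned, only its contents,
# so the closure captures a concretely typed Base.RefValue{Int}.
function ref_counter()
    a = Ref(0)
    () -> (a[] += 1; a[])
end

c1 = boxed_counter()
c2 = ref_counter()
println(fieldtype(typeof(c1), :a))   # Core.Box
println(fieldtype(typeof(c2), :a))   # Base.RefValue{Int}
```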

Another problem with closures is the use of variables that are not initialised at the start of the inner function but are needed to keep the state, e.g. an internal iterator:

using ResumableFunctions

const n = 93

@resumable function internal_iterator_resumable(n::Int)
  a = zero(Int)
  b = a + one(Int)
  for i in 1:n-1
    @yield a
    a, b = b, a + b
  end
  a
end

for val in internal_iterator_resumable(n)
  println(val)
end

I don’t see how I can make this work without initialising the iterator and its state before the inner function and using a while loop (the following is not a working example, but it shows how it could be done):

import Base.start, Base.next, Base.done

const n = 93

function internal_iterator_closure(n::Int)
  a = Ref(zero(Int))
  b = Ref(a[] + one(Int))
  iter = Ref(1:n-1)
  state = Ref(start(iter[]))
  function()
    while !done(iter[], state[])
      i, state[] = next(iter[], state[])
      tmp = a[]
      a[], b[] = b[], a[] + b[]
      return tmp
    end
    a[]
  end
end

start(iter) = iter.state[]
next(iter, state) = iter(), iter.state[]
done(iter, state) = done(iter.iter[], iter.state[])

for val in internal_iterator_closure(n)
  println(val)
end

I don’t think this is feasible for all possible corner cases using a macro-based approach. The approach I use for ResumableFunctions is a direct extension of the way closures are implemented: a closure is simply a callable object whose field names correspond to the captured variables. Not only do the captured variables have to be stored in the callable object, but also the internal ones, and a dispatch mechanism has to be inserted to perform a non-local jump to the right instruction… and this explains the difference in performance between optimised closures and ResumableFunctions.
Personally, I find it quite amazing that ResumableFunctions runs a lot faster than a closure implementation without the Ref hack and is less than twice as slow as closures with the Ref hack.
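The transformation described above can be sketched by hand; the following is an illustrative state-machine object in the spirit of what @resumable produces (the struct and field names are hypothetical, not the package's actual output):

```julia
# Hand-written sketch of a resumable Fibonacci generator as a
# callable struct: the locals become fields, and `_state` drives a
# @goto dispatch to resume after the last yield point.
mutable struct FibStateMachine
    _state::UInt8   # which label to jump to on the next call
    a::Int          # formerly local variables, now stored fields
    b::Int
    FibStateMachine() = new(0x00, 0, 1)
end

function (fsm::FibStateMachine)()
    fsm._state == 0x00 && @goto STATE_0
    fsm._state == 0x01 && @goto STATE_1
    error("@resumable function has stopped!")
    @label STATE_0
    while true
        fsm._state = 0x01      # remember where to resume
        return fsm.a           # the @yield
        @label STATE_1
        fsm.a, fsm.b = fsm.b, fsm.a + fsm.b
    end
end

fib = FibStateMachine()
[fib() for _ in 1:8]   # 0, 1, 1, 2, 3, 5, 8, 13
```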
I would really appreciate it if someone could find the bug in the internal_iterator_closure example. I am stuck and have other things to do ;)

1 Like

This is a bit unfair:

julia> function test_seq()
           a, b = 0, 1
           for i = 1:n-1
               a, b = b, a + b
           end
           a
       end
test_seq (generic function with 1 method)

julia> @code_llvm test_seq()

define i64 @julia_test_seq_61059() #0 !dbg !5 {
top:
  ret i64 7540113804746346429
}

Oh yeah of course… The ratio didn’t make sense at all! :wink:

1 Like

I could not resist implementing a closure-based finite state machine:

function fibonnaci_closure_stm()
  a = Ref(zero(Int))
  b = Ref(a[] + one(Int))
  _state = Ref(zero(UInt8))
  (_arg::Any=nothing)->begin
    _state[] == 0x00 && @goto _STATE_0
    _state[] == 0x01 && @goto _STATE_1
    error("@resumable function has stopped!")
    @label _STATE_0
    _state[] = 0xff
    _arg isa Exception && throw(_arg)
    while true
      _state[] = 0x01
      return a[]
      @label _STATE_1
      _state[] = 0xff
      _arg isa Exception && throw(_arg)
      a[], b[] = b[], a[] + b[]
    end
  end
end

function test_closure_stm()
  fib_closure = fibonnaci_closure_stm()
  for i in 1:n 
    fib_closure() 
  end
end

println("Closure statemachine: ")
@btime test_closure_stm()

The dispatch is identical to what ResumableFunctions generates automatically, so this is a fair comparison ;)

ResumableFunctions:
  414.055 ns (1 allocation: 32 bytes)
Closure statemachine:
  445.116 ns (4 allocations: 80 bytes)

I could implement a macro automating the transformation and the Ref trick, but I don’t see how I can implement the iterator interface for a specific closure object.

… a fair direct implementation:

const n = 93

@noinline function fibonnaci_direct(a::Int, b::Int)
  b, a+b
end

function test_direct()
  a = zero(Int)
  b = a + one(a)
  for i in 1:n 
    a, b = fibonnaci_direct(a, b)
  end
end

println("Direct: ")
@btime test_direct()

Summary of all benchmarks:

Direct:
  154.587 ns (0 allocations: 0 bytes)
ResumableFunctions:
  414.040 ns (1 allocation: 32 bytes)
Produce/consume:
  37.495 μs (158 allocations: 4.06 KiB)
Channels csize=0:
  106.098 μs (276 allocations: 7.47 KiB)
Channels csize=20:
  23.371 μs (116 allocations: 5.88 KiB)
Channels csize=100:
  13.167 μs (108 allocations: 6.77 KiB)
Closure:
  6.966 μs (83 allocations: 1.31 KiB)
Closure optimised:
  190.277 ns (3 allocations: 64 bytes)
Closure statemachine:
  446.282 ns (4 allocations: 80 bytes)

Maybe like this:


macro iterable(closure)
    quote
        f = $(esc(closure))
        Base.next(f::typeof(f), state::Int) = f(), state + 1
        Base.done(f::typeof(f), state::Int) = false
        Base.start(f::typeof(f)) = 1
        f
    end
end

function test()
    a, b = 0, 1
    @iterable function ()
        a, b = b, a + b
        return a
    end
end
cl = test()
for elem in cl
    println(elem)
    elem > 10^6 && break
end

Need to figure out a nice way to also specify a condition for done!

2 Likes

You can implement done by including the state in the closure:

macro iterable(closure)
    quote
        f = $(esc(closure))
        Base.next(f::typeof(f), state::Int) = f(), state + 1
        Base.done(f::typeof(f), state::Int) = f.state[]
        Base.start(f::typeof(f)) = 1
        f
    end
end

function test(n)
    a, b = 0, 1
    i = 0
    state = Ref(false)
    @iterable function ()
        a, b = b, a + b
        i += 1
        i == n ? state[] = true : state[] = false
        return a
    end
end
cl = test(10)
for elem in cl
    println(elem)
end

@BenLauwens

Very great work!

However, I’m not so fond of solutions that poke at the runtime at too low a level with some kind of magical macro as a general approach.

So I have hand-coded my own yield:

struct PSerie
    p
end

function Base.iterate(ps::PSerie, itr=(0,1,0))
    s,n,y = itr
    if y == 1; @goto y1 end

    while true
        s += 1.0/(n^ps.p)
        return s,(s,n,1)
        @label y1
        n += 1
    end
end
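Iterating this yields the partial sums of the p-series Σ 1/nᵖ; for example, with p = 2 they converge towards π²/6. A standalone usage sketch (repeating the definitions above so the snippet runs on its own):

```julia
struct PSerie
    p
end

# Hand-coded "yield": the tuple state (s, n, y) records the partial
# sum, the current index, and the label to resume at.
function Base.iterate(ps::PSerie, itr=(0, 1, 0))
    s, n, y = itr
    if y == 1; @goto y1 end
    while true
        s += 1.0 / (n^ps.p)    # add the next term 1/n^p
        return s, (s, n, 1)    # "yield" the partial sum
        @label y1
        n += 1
    end
end

# The iterator is infinite, so bound it with Iterators.take:
for s in Iterators.take(PSerie(2), 4)
    println(s)    # 1.0, 1.25, ~1.3611, ~1.4236 -> pi^2/6
end
```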

And you can too

More info here: