Interesting but quite technical presentation about coroutines in llvm: http://llvm.org/devmtg/2016-11/Slides/Nishanov-LLVMCoroutines.pdf
@BenLauwens please do make it an independent package
@nsmith, @JeffreySarnoff I have just finished a prerelease version of ResumableFunctions for Julia v0.6:
```julia
using ResumableFunctions

@resumable function fibonnaci(n::Int) :: Int
    a = 0
    b = 1
    for i in 1:n-1
        @yield a
        a, b = b, a + b
    end
    a
end

for v in fibonnaci(10)
    println(v)
end
```
Two-way communication between the master and the slave function is possible:
```julia
@resumable function conversation()
    println("Who are you?")
    name = @yield
    println("Hello ", name, "!")
    @yield "I am a resumable function."
    println("Nice to meet you!")
end

two_way = conversation()
two_way()
println(two_way("Ben"))
two_way("Hi!")
```
Yielding inside a `try`-`catch`-`finally`-`end` clause is supported (only top-level `@yield` statements):
```julia
@resumable function catch_me()
    try
        arg = @yield
        println(arg)
        arg = @yield
        println(arg)
        @yield
    catch exc
        println(exc)
        arg = @yield
        println(arg)
    finally
        println("Cat: Next time ...!")
    end
end

struct MouseTooSlow <: Exception end

mouse = catch_me()
mouse()
mouse("Catch me if you can!")
mouse(MouseTooSlow())
mouse("I escaped!")
```
Comments and bug reports are welcome! I will start writing some formal documentation…
Have fun!
Ben
super
Great work @BenLauwens! Thanks for taking up the torch on this one. I’ll point users from my repository towards this one and am looking forward to browsing through the implementation.
A small benchmark to compare ResumableFunctions with produce/consume and with channels:
```julia
using BenchmarkTools
using ResumableFunctions

const n = 93

@resumable function fibonnaci_resumable()
    a = zero(Int)
    b = a + one(a)
    while true
        @yield a
        a, b = b, a + b
    end
end

function test_resumable()
    fib_resumable = fibonnaci_resumable()
    for i in 1:n
        fib_resumable()
    end
end

println("ResumableFunctions: ")
@btime test_resumable()

function my_produce(v)
    ct = current_task()
    consumer = ct.consumers
    ct.consumers = nothing
    Base.schedule_and_wait(consumer, v)
    return consumer.result
end

function my_consume(producer::Task, values...)
    istaskdone(producer) && return producer.value
    ct = current_task()
    ct.result = length(values) == 1 ? values[1] : values
    producer.consumers = ct
    Base.schedule_and_wait(producer)
end

function fibonnaci_produce()
    a = zero(Int)
    b = a + one(a)
    while true
        my_produce(a)
        a, b = b, a + b
    end
end

function test_produce()
    fib_produce = @task fibonnaci_produce()
    for i in 1:n
        my_consume(fib_produce)
    end
end

println("Produce/consume: ")
@btime test_produce()

function fibonnaci_channel(ch::Channel)
    a = zero(Int)
    b = a + one(a)
    while true
        put!(ch, a)
        a, b = b, a + b
    end
end

function test_channel(csize::Int)
    fib_channel = Channel(fibonnaci_channel; ctype=Int, csize=csize)
    for i in 1:n
        take!(fib_channel)
    end
end

println("Channels csize=0: ")
@btime test_channel(0)
println("Channels csize=20: ")
@btime test_channel(20)
println("Channels csize=100: ")
@btime test_channel(100)
```
… and the results on my laptop:

```
ResumableFunctions:
  416.533 ns (1 allocation: 32 bytes)
Produce/consume:
  36.599 μs (158 allocations: 4.06 KiB)
Channels csize=0:
  95.691 μs (276 allocations: 7.47 KiB)
Channels csize=20:
  17.731 μs (116 allocations: 5.88 KiB)
Channels csize=100:
  13.157 μs (108 allocations: 6.77 KiB)
```
Two almost trivial conclusions:

- context switching is expensive compared to function calls;
- the performance of channels depends on the parameter `csize`.

If this benchmark is biased or if channels can be used in a more performant way, all input is welcome!
Pretty amazing timings. You could also add a benchmark for a version that loops directly over the Fibonacci numbers, to see how much overhead there is behind the scenes of `@resumable`. I wonder whether something like this could end up in Base one day. This would be a powerful way to express complex iteration patterns, especially nested ones, that can sometimes be hard to express otherwise.
I tried a loop and also a version with a closure. Due to #15276, the closure version isn’t as nice, though…
```julia
using BenchmarkTools, ResumableFunctions

const n = 93

function test_seq()
    a, b = 0, 1
    for i = 1:n-1
        a, b = b, a + b
    end
    a
end

function fibonnaci_closure()
    a, b = Ref(0), Ref(1)
    function ()
        al, bl = a[], b[]
        a[], b[] = bl, al + bl
        return al
    end
end

function test_closure()
    fib_closure = fibonnaci_closure()
    a = 0
    for i in 1:n
        a = fib_closure()
    end
    a
end

@resumable function fibonnaci_resumable()
    a = zero(Int)
    b = a + one(a)
    while true
        @yield a
        a, b = b, a + b
    end
end

function test_resumable()
    fib_resumable = fibonnaci_resumable()
    a = 0
    for i in 1:n
        a = fib_resumable()
    end
    a
end
```
```
@btime test_closure()
  167.327 ns (3 allocations: 64 bytes)
@btime test_seq()
  2.573 ns (0 allocations: 0 bytes)
@btime test_resumable()
  292.989 ns (1 allocation: 32 bytes)
```
I have tried to automate the closure approach before, but without the ugly `Ref` hack the performance is subpar:
```julia
using BenchmarkTools
using ResumableFunctions

const n = 93

@resumable function fibonnaci_resumable()
    a = zero(Int)
    b = a + one(a)
    while true
        @yield a
        a, b = b, a + b
    end
end

function test_resumable()
    fib_resumable = fibonnaci_resumable()
    for i in 1:n
        fib_resumable()
    end
end

println("ResumableFunctions: ")
@btime test_resumable()

function fibonnaci_closure()
    a = zero(Int)
    b = a + one(Int)
    function ()
        tmp = a
        a, b = b, a + b
        tmp
    end
end

function test_closure()
    fib_closure = fibonnaci_closure()
    for i in 1:n
        fib_closure()
    end
end

println("Closure: ")
@btime test_closure()
```
```
ResumableFunctions:
  488.185 ns (1 allocation: 32 bytes)
Closure:
  8.839 μs (83 allocations: 1.31 KiB)
```
Another problem with closures is the use of variables that are not initialised at the start of the inner function but are needed to keep the state, e.g. an internal iterator:
```julia
using ResumableFunctions

const n = 93

@resumable function internal_iterator_resumable(n::Int)
    a = zero(Int)
    b = a + one(Int)
    for i in 1:n-1
        @yield a
        a, b = b, a + b
    end
    a
end

for val in internal_iterator_resumable(n)
    println(val)
end
```
I don’t see how I can make this work without initialising the iterator and its state before the inner function and using a while loop (a non-working example, but it shows how it could be done):
```julia
import Base.start, Base.next, Base.done

const n = 93

function internal_iterator_closure(n::Int)
    a = Ref(zero(Int))
    b = Ref(a[] + one(Int))
    iter = Ref(1:n-1)
    state = Ref(start(iter[]))
    function ()
        while !done(iter[], state[])
            i, state[] = next(iter[], state[])
            tmp = a[]
            a[], b[] = b[], a[] + b[]
            return tmp
        end
        a[]
    end
end

start(iter) = iter.state[]
next(iter, state) = iter(), iter.state[]
done(iter, state) = done(iter.iter[], iter.state[])

for val in internal_iterator_closure(n)
    println(val)
end
```
I don’t think this is feasible for all possible corner cases using a macro-based approach. The approach I use for ResumableFunctions is a direct extension of the way closures are implemented: a closure is simply a callable object with field names corresponding to the captured variables. Not only do the captured variables have to be stored in the callable object, but also the internal ones, and a kind of dispatch mechanism needs to be inserted to do a non-local jump to the right instruction… and this explains the difference in performance between optimised closures and ResumableFunctions.
Personally I find it quite amazing that ResumableFunctions runs a lot faster than a closure implementation without the `Ref` hack and is less than 2 times slower than closures with the `Ref` hack.
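To illustrate the idea, here is my own minimal sketch (the names `FibIter` and `_state` are made up, not what ResumableFunctions actually emits): the internal variables become fields of a callable struct, and an extra state field drives the dispatch to the right resume point.

```julia
# Minimal sketch of a resumable function as a callable struct
# (illustration only, not the actual code generated by @resumable).
mutable struct FibIter
    a::Int
    b::Int
    _state::UInt8   # which resume point to jump to on the next call
end
FibIter() = FibIter(0, 1, 0x00)

function (fib::FibIter)()
    # dispatch on the stored state: 0x00 = first call, 0x01 = resume after yield
    fib._state == 0x01 && @goto resume
    fib._state = 0x01
    @label resume
    tmp = fib.a
    fib.a, fib.b = fib.b, fib.a + fib.b
    return tmp   # plays the role of `@yield a`
end

fib = FibIter()
[fib() for _ in 1:6]   # → [0, 1, 1, 2, 3, 5]
```

With only one yield point the dispatch is trivial; with several, each `@yield` gets its own state value and label, which is exactly what the state-machine closure further down in this thread spells out.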
I would really appreciate it if someone finds the bug in the last example. I am stuck and have other things to do ;)
This is a bit unfair:
```julia
julia> function test_seq()
           a, b = 0, 1
           for i = 1:n-1
               a, b = b, a + b
           end
           a
       end
test_seq (generic function with 1 method)

julia> @code_llvm test_seq()

define i64 @julia_test_seq_61059() #0 !dbg !5 {
top:
  ret i64 7540113804746346429
}
```
Oh yeah of course… The ratio didn’t make sense at all!
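One way to keep the compiler from folding the whole loop into a single constant is to take `n` as a runtime argument instead of closing over a `const` (a sketch of the fix, not part of the original benchmark):

```julia
# Variant of test_seq that takes n at run time, so the result can no
# longer be constant-folded into a single `ret` instruction.
function test_seq(n)
    a, b = 0, 1
    for i = 1:n-1
        a, b = b, a + b
    end
    a
end

test_seq(93)   # → 7540113804746346429, the constant seen in the LLVM IR above
```

With BenchmarkTools, interpolating a hidden value (e.g. `@btime test_seq($(Ref(93))[])`) is a common way to make sure the loop is actually measured.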
I could not resist implementing a closure-based finite state machine:
```julia
function fibonnaci_closure_stm()
    a = Ref(zero(Int))
    b = Ref(a[] + one(Int))
    _state = Ref(zero(UInt8))
    (_arg::Any=nothing) -> begin
        _state[] == 0x00 && @goto _STATE_0
        _state[] == 0x01 && @goto _STATE_1
        error("@resumable function has stopped!")
        @label _STATE_0
        _state[] = 0xff
        _arg isa Exception && throw(_arg)
        while true
            _state[] = 0x01
            return a[]
            @label _STATE_1
            _state[] = 0xff
            _arg isa Exception && throw(_arg)
            a[], b[] = b[], a[] + b[]
        end
    end
end

function test_closure_stm()
    fib_closure = fibonnaci_closure_stm()
    for i in 1:n
        fib_closure()
    end
end

println("Closure statemachine: ")
@btime test_closure_stm()
```
The dispatch is identical to what ResumableFunctions generates automatically, so this is a fair comparison ;)
```
ResumableFunctions:
  414.055 ns (1 allocation: 32 bytes)
Closure statemachine:
  445.116 ns (4 allocations: 80 bytes)
```
I could implement a macro automating the transformation and the `Ref` trick, but I don’t see how I can implement the iterator interface for a specific closure object.
… a fair direct implementation:
```julia
const n = 93

@noinline function fibonnaci_direct(a::Int, b::Int)
    b, a + b
end

function test_direct()
    a = zero(Int)
    b = a + one(a)
    for i in 1:n
        a, b = fibonnaci_direct(a, b)
    end
end

println("Direct: ")
@btime test_direct()
```
Summary of all benchmarks:
```
Direct:
  154.587 ns (0 allocations: 0 bytes)
ResumableFunctions:
  414.040 ns (1 allocation: 32 bytes)
Produce/consume:
  37.495 μs (158 allocations: 4.06 KiB)
Channels csize=0:
  106.098 μs (276 allocations: 7.47 KiB)
Channels csize=20:
  23.371 μs (116 allocations: 5.88 KiB)
Channels csize=100:
  13.167 μs (108 allocations: 6.77 KiB)
Closure:
  6.966 μs (83 allocations: 1.31 KiB)
Closure optimised:
  190.277 ns (3 allocations: 64 bytes)
Closure statemachine:
  446.282 ns (4 allocations: 80 bytes)
```
Maybe like this:
```julia
macro iterable(closure)
    quote
        f = $(esc(closure))
        Base.next(f::typeof(f), state::Int) = f(), state + 1
        Base.done(f::typeof(f), state::Int) = false
        Base.start(f::typeof(f)) = 1
        f
    end
end

function test()
    a, b = 0, 1
    @iterable function ()
        a, b = b, a + b
        return a
    end
end

cl = test()
for elem in cl
    println(elem)
    elem > 10^6 && break
end
```
Need to figure out a nice way to also specify a condition for `done`!
You can implement `done` by including the state in the closure:
```julia
macro iterable(closure)
    quote
        f = $(esc(closure))
        Base.next(f::typeof(f), state::Int) = f(), state + 1
        Base.done(f::typeof(f), state::Int) = f.state[]
        Base.start(f::typeof(f)) = 1
        f
    end
end

function test(n)
    a, b = 0, 1
    i = 0
    state = Ref(false)
    @iterable function ()
        a, b = b, a + b
        i += 1
        i == n ? state[] = true : state[] = false
        return a
    end
end

cl = test(10)
for elem in cl
    println(elem)
end
```
Very great work!
However, I’m not so fond of solutions that shake up the runtime at too low a level with some kind of magical macro on a general basis.
So I have hand-coded my yield:
```julia
struct PSerie
    p
end

function Base.iterate(ps::PSerie, itr=(0, 1, 0))
    s, n, y = itr
    if y == 1; @goto y1 end
    while true
        s += 1.0 / (n^ps.p)
        return s, (s, n, 1)
        @label y1
        n += 1
    end
end
```
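Used as an iterator, this yields the successive partial sums of the p-series. A quick check (the type and `iterate` method are repeated so the snippet is self-contained):

```julia
# The PSerie iterator from above, repeated for a self-contained example.
struct PSerie
    p
end

function Base.iterate(ps::PSerie, itr=(0, 1, 0))
    s, n, y = itr
    if y == 1; @goto y1 end     # jump back to just after the "yield"
    while true
        s += 1.0 / (n^ps.p)
        return s, (s, n, 1)     # the hand-coded yield: value + saved state
        @label y1
        n += 1
    end
end

# First four partial sums of sum(1/n^2), converging towards pi^2/6:
collect(Iterators.take(PSerie(2), 4))   # → [1.0, 1.25, 1.3611..., 1.4236...]
```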
And you can too
More info here: