Performance of closures

I compiled the latest Julia v0.6-dev with and without #define COPY_STACKS and got the following results.
without #define COPY_STACKS:
yieldto
0.001367 seconds (10.11 k allocations: 1.168 MiB)
schedule_and_wait
0.001486 seconds (10.01 k allocations: 1.161 MiB)
channel
0.053305 seconds (60.03 k allocations: 2.841 MiB)
statemachine
0.000055 seconds (5 allocations: 192 bytes)

with #define COPY_STACKS:
yieldto
0.002546 seconds (10.11 k allocations: 164.750 KiB)
schedule_and_wait
0.002505 seconds (10.01 k allocations: 157.781 KiB)
channel
0.059846 seconds (60.03 k allocations: 1.835 MiB)
statemachine
0.000056 seconds (5 allocations: 192 bytes)

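We can observe a trade-off between runtime and memory usage: without COPY_STACKS, yieldto and schedule_and_wait run roughly 1.7 to 1.9 times faster (about 1.4 ms versus 2.5 ms) but allocate about seven times more memory (about 1.2 MiB versus 160 KiB). Either way, the results are still very different from the finite-state machine implementation, which takes about 0.055 ms with 5 allocations. I suppose that there is no such thing as a free lunch: symmetric coroutines have a certain cost, and Channels with no buffer (to have complete synchronisation between the Tasks) are very expensive, at roughly a thousand times the runtime of the state machine.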
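For readers who want to reproduce the general shape of the comparison, here is a minimal sketch of the two extremes, an unbuffered Channel versus a hand-written state machine. The benchmark code behind the numbers above is not shown in this post, so everything here is an assumption: the names count_channel, CountStateMachine and consume are hypothetical, n = 10_000 is a guess, and the syntax targets a recent Julia (1.3 or later) rather than v0.6-dev.

```julia
# Hypothetical reconstruction, not the benchmark that produced the numbers above.

# Producer running in its own Task; the Channel has size 0 (no buffer),
# so every put! has to wait for a matching take! by the consumer.
function count_channel(n::Int)
    Channel{Int}(0) do ch
        for i in 1:n
            put!(ch, i)
        end
    end
end

# The same sequence as a finite-state machine: the state (the next value)
# is carried explicitly through the iteration protocol, with no Task switch.
struct CountStateMachine
    n::Int
end

Base.iterate(c::CountStateMachine, i::Int=1) = i > c.n ? nothing : (i, i + 1)
Base.length(c::CountStateMachine) = c.n
Base.eltype(::Type{CountStateMachine}) = Int

# Consumer shared by both variants: sums everything the iterator yields.
function consume(itr)
    s = 0
    for x in itr
        s += x
    end
    return s
end

n = 10_000  # assumed problem size

# Warm-up calls so that @time below does not include compilation.
consume(count_channel(10))
consume(CountStateMachine(10))

@time consume(count_channel(n))
@time consume(CountStateMachine(n))
```

Both variants yield the same values; only the hand-over mechanism differs. Absolute timings will vary with the Julia version and machine, so they should not be expected to match the figures above.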