Creating Generators

I think yield is a very good construct, good reasons they have bloomed here and there in so many languages.
IMHO they weren’t tackled enough seriously in Julia v1, that has so many other great features too.
After all, they have landed in C# 2 not 1.

I’d add one more to the list.
|> extensible

In this case they’ve made the state machine explicit. Iterator protocol has an implicit state machine. This goto/while/return construct can be extended to handle more complex state transitions.

I guess this is mostly a matter of getting used to some pattern, which then feels natural (to a point where one even forgets that one had to learn the concept in the first place).

I remember the first time I saw a Python generator using the yield statement: I did not know what it was doing, and had trouble to understand how it worked and why/when one would want to use it. After some time however, it felt natural. When I started using Julia, which had the same kind of concepts at the time, I felt I was not in uncharted territories in this respect.

Then newer versions of Julia started using Channels to define coroutines, and this is not as efficient for generators. It did put me off-balance for some time, but now I came to get used to writing iterate functions.

In practice, I find that composing existing iterators (coming from Base.Iterators or built with the special comprehension-like syntax for generators) goes a long way. Composing Transducers is now another option. I have come to think that writing custom iterate methods for more specific things is not so hard (although it requires being in a different state of mind than writing coroutines).

Do you have an example of a complex generator implemented with a goto/while/return construct? Not trying to argue, I’m genuinely interested if there’s a clean, readable and extensible goto-based solution to this that I’ve missed. Having a concrete example to discuss around would be helpful.

1 Like

I personally wouldn’t use this style so I can’t think of an example off the top of my head.

The goto let’s you jump back to the code ( state ) you returned from. The while is one of those state transitions that goes back to itself ( can have nested states here ). Here’s an example of an iterator with some explicit state machine stuff going on: Shawn Hargreaves Blog Index

You could make iterators as arbitrarily complex as you like by making your state data as complex as you like. Or you can push some of that complexity into logic ( e.g. goto/while/return ) instead.

Usually I would run into problems where I need to explicitly model a state machine in code when there is UI involved. This can end up being very modal ( nested state machines ).

I guess if you wanted to implement some kind of FRP style with iterators you could end up needing some goto’s: Functional reactive programming - Wikipedia

My choice in that case would be to use Channels. CSP Communicating sequential processes - Wikipedia is great, and I’d love to see a select/choose/alts ( everyone has their own name! ) function for Julia’s channels.

https://github.com/JuliaLang/julia/issues/13763

I guess an (entirely made-up) example of a generator with multiple yield statements could be something like this:

Version using yield

using ResumableFunctions

@resumable function myIter()
    i = 1
    while true
        if iseven(i)
            @yield i
        else
            for j in 1:i
                @yield -j
            end
        end
        i += 1
    end
end
julia> Iterators.take(myIter(), 10) |> collect
10-element Array{Any,1}:
 -1
  2
 -1
 -2
 -3
  4
 -1
 -2
 -3
 -4

Version using goto / while / return

struct MyIterGoto end

function Base.iterate(::MyIterGoto, (i, j, l) = (0,0,0))
    if     l == 1;  @goto l1
    elseif l == 2;  @goto l2 
    end

    i = 1
    while true
        if iseven(i)
            return i, (i,0,1)
            @label l1
        else
            j = 1
            while j <= i
                return -j, (i,j,2)
                @label l2
                j += 1
            end
        end
        i += 1
    end
end

Base.IteratorSize(::Type{MyIterGoto}) = Base.SizeUnknown()

“unrolled” version

struct MyIterUnrolled end

function Base.iterate(::MyIterUnrolled, (i,j) = (1,1))
    if iseven(i)
        return i, (i+1, 1)
    elseif j == i
        return -j, (i+1, 1)
    else
        return -j, (i, j+1)
    end
end

Base.IteratorSize(::Type{MyIterUnrolled}) = Base.SizeUnknown()

Although this version is very concise, I guess it is also arguably much harder to read and understand.




[EDIT] Transducer-based version

using Transducers

t = MapCat(i -> iseven(i) ? (i,) : -(1:i))

collect(t |> Take(10), Iterators.countfrom(1))

This version based on Transducers.jl is also very compact, while (IMHO) remaining easy to understand.

3 Likes

Key thing to notice is that the last version duplicates the -j logic. A function call here would still result in two call sites vs one. The state transition logic is also complected with the computation for the value of j.
The state machine logic is more explicit in the first two.

So, I’d argue that it’s a more extensible pattern because it isolates the new behaviour in one place. Well there are two more places for the transition logic, but that is structured nicely.

That seems easily fixed:

function Base.iterate(::MyIterUnrolled, (i,j) = (1,1))
    if iseven(i)
        value = i
        new_state = (i=i+1, j=1)
    else
        value = -j
        new_state = j < i ? (i=i, j=j+1) : (i=i+1, j=1)
    end
    return value, new_state
end

Right. So the thing I see now that is different is that the continuation versions don’t need to recheck the state condition iseven(i) to deterimine which state it is in. They are explicitly tracking which state it’s in and directly jump to it via the gotos at the top. You could easily fix that as well without the gotos by tracking l ( which state machine node we are in )

function Base.iterate(::MyIterUnrolled, (i,j,l) = (1,1,0))
    if l == 0
        ...
    elseif l == 1
    ....
    else # l == 2
end

Then you end up with multiple call sites for the -j mapping again. The continuation style seems less error prone, but that’s likely a personal preference.

[EDIT]
personal preference would be to use channels, which allow you to write very linear looking logic.

ijchan(n) = Channel(c-> map( i->iseven(i) ? put!(c,i) : map(j->put!(c,-j),1:i), 1:n))

Especially if you use do-blocks and unroll it over several lines instead of lambdas & ternary. Using map for code with side effects is frowned upon, I would definitely use a do-block and for-loops here.

jchan(n) = Channel(ctype=Int) do c  
                    for i = 1:n 
                       if iseven(i)  
                             put!(c,i) 
                       else
                          for j = 1:i 
                               put!(c,-j) 
                          end
                       end
                    end
             end

Which indeed, is exactly like the code with yield in Python, except it’s more powerful since you have the full power of coroutines if you want to do something more general. It’s slow by Julia standards, but very fast by Python standards (coroutine yield in Julia is generally faster than a plain function call in Python, let alone a generator call) so that’s not an issue when comparing with Python.

4 Likes

I separated a very small and lightweight generator package out of one of my projects – this just uses tasks, not channels:

https://github.com/zot/Generators.jl

Hopefully the package will get registered soon…

5 Likes

ResumableFunctions.jl is really good, some day it should enter the Base package. It’s also fast, much faster than the methods using tasks and channels. I’ve tried Python generators implemented in Numba, they’re very slow while rest of Numba code is good performance. So this is not easy. ResumableFunctions is close to direct access performance, according to their tests.

2 Likes

This post isn’t very recent, but it showed up in my suggestions here, and I’m very interested in the topic. I’m a big fan of do-syntax and the ability of creating a generator/iterator using it. Unfortunately there are some gotchas with that, Channel will give you asynchronous behavior, but you don’t necessarily want that. The interface is great, though. That’s one big reason I proposed this PR. Still not merged, but check it out: https://github.com/JuliaLang/julia/pull/44873