An experiment with Python-style (but unidirectional) generators for Julia

I find it’s fairly trivial to implement Python-style generator functions in Julia, though only for unidirectional stream production.

I use yieldto, but the docs say “Its use is discouraged.” How idiomatic is my approach? I’m fairly new to Julia and eagerly learning.

Surface syntax:

julia> @generator function g(n::Int)
         m = n * 3
         @yield m
         m += 7
         @yield m
         m -= 9
         @yield m
       end
g (generic function with 1 method)

julia> for i in g(5)
         println(i)
       end
15
22
13

julia> collect(g(5))
3-element Vector{Int64}:
 15
 22
 13

julia>

Implementation source code here:
https://github.com/complyue/PyStyleButUnidirGenerators.jl/blob/main/src/PyStyleButUnidirGenerators.jl

2 Likes

I think the style developed in FGenerators.jl combined with FLoops.jl is a more performant and better-founded way to go about this than spawning tasks and trying to manage the yielding yourself. But it relies heavily on transducers, so that is a lot of infrastructure to digest and learn.

Here’s an example where I believe your use of yieldto fails:

julia> using .PyStyleButUnidirGenerators

julia> @generator function organpipe(n::Integer)
           i = 0
           while i != n
               i += 1
               @yield i
           end
           while true
               i -= 1
               i == 0 && return
               @yield i
           end
       end;

julia> collect(organpipe(2))

I left this on my computer for about 5 minutes and it never completed; I assume there is some sort of task deadlock. I think if you want to do this with tasks, you’re better off making a Channel and put!-ing data into it at each yield, rather than trying to manage the task switching yourself.
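For concreteness, a rough sketch of that Channel-based flavor for organpipe might look like the following (my own sketch, not code from either package); each put! plays the role of @yield, and the Channel constructor takes care of spawning the producing task and closing the channel when the body returns:

organpipe_chan(n::Integer) = Channel{Int}() do ch
    i = 0
    while i != n
        i += 1
        put!(ch, i)          # "yield" i to whoever iterates the channel
    end
    while true
        i -= 1
        i == 0 && return     # returning ends the task and closes the channel
        put!(ch, i)
    end
end

collect(organpipe_chan(2))   # expected: [1, 2, 1]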

Here’s how FGenerators.jl performs with that organpipe:

julia> using FGenerators

julia> @fgenerator function organpipe(n::Integer)
           i = 0
           while i != n
               i += 1
               @yield i
           end
           while true
               i -= 1
               i == 0 && return
               @yield i
           end
       end;

julia> using BenchmarkTools

julia> let n = Ref(100)
           @btime sum(organpipe($n[]))
           @btime collect(organpipe($n[]))
       end;
  13.867 ns (0 allocations: 0 bytes)
  1.456 μs (204 allocations: 13.80 KiB)
3 Likes

Nice to see that so many approaches have been attempted (the “See also” section of GeneratorsX.jl lists many of them)! Parallelism and sequential performance are also tackled well there.


I found a bug via your test case. It’s not related to the usage of yieldto, but to inappropriate handling of early return. It’s fixed in https://github.com/complyue/PyStyleButUnidirGenerators.jl/commit/a425289c95c79e77f443650f19a53ac4bdd1ae7c

It works after the fix:

julia> using PyStyleButUnidirGenerators

julia> @generator function organpipe(n::Integer)
         i = 0
         while i != n
           i += 1
           @yield i
         end
         while true
           i -= 1
           i == 0 && return
           @yield i
         end
       end
organpipe (generic function with 1 method)

julia> collect(organpipe(2))
3-element Vector{Int64}:
 1
 2
 1

julia> 

For cases not concerned with HPC (parallelism included), I still favor my cooperative-scheduling-based implementation over Channel-based ones (both in the low-performance camp); Channel communication is somewhat more demanding than cooperative scheduling. That is, just as with Python generators, production and consumption are interleaved and always synchronous, so with cooperative scheduling no mutex or memory barrier/fence cost is incurred, unless the procedures also perform async communication in the meantime (which can be opted into straightforwardly).

This somewhat aligns with C++’s philosophy of “pay only for what you use”.
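For instance, with a cooperative handoff the producing and consuming sides should strictly alternate. A tiny sketch using the same @generator/@yield surface syntax as above (the printed order is my expectation of the synchronous interleaving, not a captured session):

@generator function ticker()
    i = 0
    while i < 3
        i += 1
        println("producing ", i)
        @yield i
    end
end

for x in ticker()
    println("consuming ", x)
end
# expected: producing 1, consuming 1, producing 2, consuming 2, producing 3, consuming 3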


I’m surprised that none of those attempts addresses Python generators’ backward-communication facility, i.e. the .send() method of a generator instance. Or did I miss something?

Though that’s of limited usefulness, and Julia’s yieldto() works almost the same way: just record your caller Task as a yield target (my implementation simply leverages that).
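For illustration, the bare mechanism looks roughly like this (a minimal sketch of a yieldto handoff with values flowing both ways, not the actual package internals):

function send_demo()
    caller = current_task()
    gen = @task begin
        # hand a "question" to the caller; the caller's reply comes back
        # as the return value of this yieldto call
        answer = yieldto(caller, "How old are you?")
        yieldto(caller, "You said: $answer")
    end
    question = yieldto(gen)      # start the generator task, receive its first value
    reply = yieldto(gen, 21)     # send 21 back in, receive the next value
    (question, reply)            # expected: ("How old are you?", "You said: 21")
end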

I feel Julia’s macro system could give even better ergonomics for such use cases, but I don’t have a particular one in mind at the moment.

Julia’s Task is very different from the “stackless” coroutines/generators of Python/C++/Rust/etc. In Julia, a genuine call stack is allocated when starting a Task. Furthermore, the optimizer does not reason about code across multiple tasks. As such, a yieldto-based implementation of generators (at least currently) will have significant non-optimizable overhead.

Coroutines and generators are interesting programming devices and it’d be nice to see more uses of them in Julia. For example, they’re useful for writing composable parsing tools. But I think there are not many uses of yieldto in the Julia ecosystem because of its overhead, and because people tend to be crazy about performance.

3 Likes

Nice to know!

I guess each Task has its own dedicated native stack, so it runs at native machine speed until the next yieldto or other yield point. That’s pretty lovable, as is typical of Julia :slight_smile:

And with the majority of the high-performance parts well optimized, I’d suggest that once you need to yield, there is usually some not-so-performance-friendly situation to handle anyway, e.g. expanding the capacity of a buffer or of series storage backed by a database. Some slowdown is relatively forgivable in such cases, as long as full machine speed can be frictionlessly resumed once the situation settles. I see Julia as superior in achieving such a goal.

In handling low-frequency situations, especially complex ones (w.r.t. business rules etc.), ergonomics might be more valuable than raw speed, since code maintenance and other software-engineering efforts can cost rather more, in terms of business value, than a marginally shorter run time returns.


I’ve been longing to code in Julia for years, but could only embark on it recently. I love Julia as always, but to be honest it’s a bit of a pity that it feels somewhat like a DSL for the high-machine-performance programming domain. I’d expect esc, gensym, fieldcount, and the like to live only in Meta and require explicit qualification to use, but they are exposed from Base. Personally I’d regard them as pollution of the conceptual space of the business domain when I deliver business functionality to my org’s analytics team as a Julia API. Analysts should certainly learn Julia, but bloated interfaces and tools focused on performance optimization are not welcoming to citizen developers, many of whom are non-programmers deeply involved in the computer-powered business.

I’m experimenting with composable grammar (syntax + semantics + pragmatics) components based on Julia, and I’m delighted so far. :smiley: Hopefully I can share something in the near future.

Have you seen https://github.com/BenLauwens/ResumableFunctions.jl? AFAIK it’s the spiritually closest equivalent to Python-style generators.

Yes, I went over it later, indirectly, through links in @Mason’s reply.

I’m aware it’s Channel-based, and it seemingly doesn’t support communication back into the generator from the outside either. There are also the limitations listed in the Caveats section of its readme:

  • In a try block only top level @yield statements are allowed.
  • In a finally block a @yield statement is not allowed.
  • An anonymous function can not contain a @yield statement.

None of those limitations exist in my cooperative-scheduling implementation, I suspect.
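To illustrate the difference: yielding from inside a closure or a nested call is unproblematic for any stackful, task-backed approach, because the whole native call stack is suspended rather than a single macro-rewritten function body. A small sketch using a plain Channel (same principle, not my package):

deepyield(n) = Channel{Int}() do ch
    emit = x -> put!(ch, x)          # a closure that "yields"
    foreach(i -> emit(i^2), 1:n)     # yielding from inside an anonymous function
end

collect(deepyield(3))                # expected: [1, 4, 9]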

I’m pretty sure it’s not channel-based? The readme explicitly calls out channels as slower for this use case and even benchmarks against them. The same benchmarks also include a task-based implementation. WRT calling back into the generator, that’s supported via Manual · ResumableFunctions.
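Roughly, based on my reading of that manual page (the sketch below, including the exact call syntax, is my paraphrase and may be slightly off):

using ResumableFunctions

@resumable function ask()
    # the value passed into the next call of the generator object
    # becomes the result of this @yield
    name = @yield "Who are you?"
    @yield "Hello, $name!"
end

q = ask()
q()          # -> "Who are you?"
q("Ben")     # -> "Hello, Ben!"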

2 Likes

Ah, I was careless and misread its readme.

The benchmarks have good coverage, nice insights!


The two-way communication supported there is quite like Python’s .send() semantics too.

But with Julia’s extra macro-fu, I’d imagine something more ergonomic could be sugared on top, like:

@generator function questionnaire()
  name = @yield "Who are you?"
  from = @yield "Where are you from?"
  age = @yield "How old are you?"

  return "Hello, $name from $(from)!\n" *
         "You are born in $(year(today())-age)."
end

Then:

@consume questionnaire() do q
  if q === "Who are your?"
    @feedback "John"
  elseif q === "Where are you from?"
    @feedback "Chicago"
  elseif q === "How old are you?"
    @feedback 21
  end
end

Which evaluates to "Hello, John from Chicago!\nYou were born in 2001."

This use case is too contrived to be useful in real-world code, but it seems interesting, and maybe someone (including future me) will find a genuinely useful scenario for it.