[ANN] FGenerators.jl: Python-like generator syntax for high-performance iteration

FGenerators.jl is a package for defining the Transducers.jl-compatible iteration protocol (a.k.a. extended foldl) with a simple Python-like @yield-based syntax. Quoting the FGenerators.jl README, here are a few examples of creating ad-hoc “generators”:

julia> using FGenerators

julia> @fgenerator function generate123()
           @yield 1
           @yield 2
           @yield 3
       end;

julia> collect(generate123())
3-element Array{Int64,1}:
 1
 2
 3

julia> sum(generate123())
6

julia> @fgenerator function organpipe(n::Integer)
           i = 0
           while i != n
               i += 1
               @yield i
           end
           while true
               i -= 1
               i == 0 && return
               @yield i
           end
       end;

julia> collect(organpipe(3))
5-element Array{Int64,1}:
 1
 2
 3
 2
 1

julia> @fgenerator function organpipe2(n)
           @yieldfrom 1:n
           @yieldfrom n-1:-1:1
       end;

julia> collect(organpipe2(2))
3-element Array{Int64,1}:
 1
 2
 1

You can use FLoops.jl (see also: ANN: Parallel `for` loops in FLoops.jl with composable and extensible fold-based API) to iterate over the items yielded from the “generator”:

julia> using FLoops

julia> @floop for x in generate123()
           @show x
       end
x = 1
x = 2
x = 3

That is to say, FGenerators.jl is a framework for writing everything about a loop except the loop body, and FLoops.jl is a framework for filling in the loop body. Transducers.jl’s core provides the interface between them.
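To make the "loop without the body" idea concrete, here is a minimal sketch in plain Julia that does not use any of the packages. The name fold123 is hypothetical; it is a hand-written counterpart of generate123, where the caller supplies the loop body as a function rf of the accumulator and the item. The actual Transducers.jl protocol also handles things like early termination, which this sketch leaves out.

```julia
# Hypothetical hand-written counterpart of `generate123`:
# every `@yield x` becomes a call of the supplied loop body `rf(acc, x)`.
function fold123(rf, acc)
    acc = rf(acc, 1)
    acc = rf(acc, 2)
    acc = rf(acc, 3)
    return acc
end

# The "loop body" is filled in by the caller.
total = fold123(+, 0)          # adds up the yielded items
items = fold123(push!, Int[])  # collects the yielded items
```

The control flow (which items, in which order) lives in fold123, while what to do with each item lives in rf. That split is roughly what @fgenerator and @floop automate.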

FGenerators.jl is a spin-off of GeneratorsX.jl, which defines not only foldl but also iterate, using IRTools.jl. I am releasing the easier half (= foldl) as FGenerators.jl since IRTools.jl is a big dependency and inherently not the most robust package, because it relies on some Julia internals.

Why not iterate?

Note that @fgenerator does not create a generator in the sense of Base.Generator; that is, it does not define iterate. The reason for using a protocol other than iterate is that it gives you more control over the execution strategy of the loop (this is also discussed in the GeneratorsX.jl README). For example, here is an implementation of LoopRecipes.simdeachindex (documentation) that yields SIMD.VecRanges in the main part of the loop and normal integers in the remainder:

simdeachindex(xs) = simdeachindex(Val{pick_vector_width(eltype(xs))}(), xs)
simdeachindex(width::Val, xs) = SIMDEachIndex(width, firstindex(xs), lastindex(xs))

@fgenerator(foldable::SIMDEachIndex) do
    W = valof(foldable.width)
    i = foldable.firstindex
    n = foldable.lastindex - W
    lane = VecRange{W}(0)
    if i <= n
        @yield lane + i
        i += W
        while i <= n
            @yield lane + i
            i += W
        end
    end
    if i <= foldable.lastindex
        @yield i
        i += 1
        while i <= foldable.lastindex
            @yield i
            i += 1
        end
    end
end

This can be used to abstract out the handling of the main and remainder loops:

julia> using LoopRecipes, FLoops

julia> @floop for i in simdeachindex(ones(10))
           @show i
       end;
i = VecRange{4}(1)
i = VecRange{4}(5)
i = 9
i = 10

For a more practical example of this, see the sparse-dense dot product (simddot(xs::SparseVector, ys)) example in the LoopRecipes.simdstored documentation.

Unfortunately, it seems that Julia’s optimizer does not optimize loops over something like simdeachindex when it is implemented using iterate. Such loops have interesting progressions in the types of the accumulator and the element. Probably the compiler does not have enough machinery to extract this structure (I think the gap between iterate and foldl can be filled if someone like Keno takes the time to write more abstract interpretation code). But even if the compiler becomes wise enough, I’d argue that the equivalent iterate definition is rather hard to come up with, because you need to construct the state machine by hand:

@inline function Base.iterate(foldable::SIMDEachIndex)
    W = valof(foldable.width)
    i = foldable.firstindex
    n = foldable.lastindex - W
    return iterate(foldable, (i, n))
end

@inline function Base.iterate(foldable::SIMDEachIndex, (i, n)::Tuple{Int,Int})
    W = valof(foldable.width)
    if i <= n
        return (VecRange{W}(i), (i + W, n))
    else
        return iterate(foldable, i)
    end
end

@inline function Base.iterate(foldable::SIMDEachIndex, i::Int)
    i > foldable.lastindex && return nothing
    return (i, i + 1)
end
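The contrast between the two styles can be tried without SIMD.jl. The following is a sketch under stated assumptions: ChunkedEachIndex and foldchunks are hypothetical names, and plain ranges stand in for VecRange. The fold-style definition is two ordinary loops, while the iterate-style definition has to encode "which loop am I in" in the state.

```julia
# Hypothetical stand-in for SIMDEachIndex: yields ranges of `width` indices
# (instead of SIMD.VecRange) in the main part, then plain integers.
struct ChunkedEachIndex
    width::Int
    firstindex::Int
    lastindex::Int
end

# Fold-style definition: the control flow is written as two ordinary loops.
function foldchunks(rf, acc, c::ChunkedEachIndex)
    i = c.firstindex
    while i + c.width - 1 <= c.lastindex   # main part: full chunks
        acc = rf(acc, i:i+c.width-1)
        i += c.width
    end
    while i <= c.lastindex                 # remainder: single indices
        acc = rf(acc, i)
        i += 1
    end
    return acc
end

# iterate-style definition: the same control flow as a hand-built state machine.
# State `(i, true)` means "in the main part", `(i, false)` means "in the remainder".
Base.iterate(c::ChunkedEachIndex) = iterate(c, (c.firstindex, true))
function Base.iterate(c::ChunkedEachIndex, (i, inmain)::Tuple{Int,Bool})
    if inmain && i + c.width - 1 <= c.lastindex
        return (i:i+c.width-1, (i + c.width, true))
    elseif i <= c.lastindex
        return (i, (i + 1, false))
    else
        return nothing
    end
end
Base.IteratorSize(::Type{ChunkedEachIndex}) = Base.SizeUnknown()
Base.eltype(::Type{ChunkedEachIndex}) = Union{Int,UnitRange{Int}}

c = ChunkedEachIndex(4, 1, 10)
viafold = foldchunks(push!, Any[], c)  # items seen by the fold-style loop
viaiter = collect(c)                   # items seen by the iterate-style loop
```

Both definitions produce the same sequence of items here, but in the iterate version the element type changes from step to step behind a single method, which is the kind of structure the compiler has a hard time seeing through.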

Other yield syntax packages

For comparisons to other packages that provide “yield syntax”, such as Continuables.jl (JuliaCon 2020 talk), ResumableFunctions.jl, and PyGen, see the discussion in the GeneratorsX.jl README.


Hi,

This is such a fantastic package - thank you for creating it.

I was using it successfully for some time, including earlier today, but since a recent update to Julia 1.7.2 I now get the following error:

errors

ERROR: LoadError: Method Error: no method matching length(::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}) Closest candidates are: length(!Matched::Union{Base.KeySet, Base.ValueIterator}) at abstractdict.jl:58 length(!Matched::Union{Tables.AbstractColumns, Tables.AbstractRow}) at C:\Users\me\.julia\packages\Tables\M26tI\src\Tables.jl:175 length(!Matched::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at C:\Users\me\.julia\packages\DataStructures\vSp4s\src\ordered_robin_dict.jl:86 ... Stacktrace: [1] amount(xs::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}) @ SplittablesBase.Implementations ~\.julia\packages\SplittablesBase\gpREK\src\implementations.jl:65 [2] transduce_assoc(xform::Transducers.IdentityTransducer, step::Transducers.AdHocRF{Main.GetMetricService.QueryResolver.var"##oninit_function#361#81", typeof(identity), Main.GetMetricService.QueryResolver.var"##reducing_function#362#82"{Vector{String}, Arrow.Table, Dict{String, Vector{Tuple{Int64, Float64}}}, Arrow.Primitive{Union{Missing, Int64}, Vector{Int64}}}, typeof(identity), typeof(identity), Main.GetMetricService.QueryResolver.var"##combine_function#363#83"}, init::Transducers.InitOf{Transducers.DefaultInitOf}, coll0::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, 
Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}; simd::Val{false}, basesize::Nothing, stoppable::Nothing, nestlevel::Nothing) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\reduce.jl:108 [3] transduce(xf::Transducers.IdentityTransducer, rf::Function, init::Transducers.InitOf{Transducers.DefaultInitOf}, coll::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, exc::ThreadedEx{NamedTuple{(:simd,), Tuple{Val{false}}}}) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\executors.jl:152 [4] transduce(xf::Transducers.IdentityTransducer, rf::Function, init::Transducers.InitOf{Transducers.DefaultInitOf}, coll::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, exc::PreferParallel{NamedTuple{(:simd,), Tuple{Val{false}}}}) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\executors.jl:164 [5] _fold @ ~\.julia\packages\FLoops\3ZEuy\src\reduce.jl:851 [inlined] [6] _fold(rf::Transducers.AdHocRF{Main.GetMetricService.QueryResolver.var"##oninit_function#361#81", typeof(identity), Main.GetMetricService.QueryResolver.var"##reducing_function#362#82"{Vector{String}, Arrow.Table, Dict{String, Vector{Tuple{Int64, Float64}}}, Arrow.Primitive{Union{Missing, Int64}, Vector{Int64}}}, typeof(identity), typeof(identity), Main.GetMetricService.QueryResolver.var"##combine_function#363#83"}, coll::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), 
Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, #unused#::Nothing, simd::Val{false}) @ FLoops ~\.julia\packages\FLoops\3ZEuy\src\reduce.jl:849 [7] macro expansion @ ~\.julia\packages\FLoops\3ZEuy\src\reduce.jl:829 [inlined] [8] groupOverIndexSpace(column::String, colSet::Vector{String}, data::Arrow.Table, enumerable::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}) @ Main.GetMetricService.QueryResolver ~\source\repos\GetMetricEngineCore\src\QueryResolverIterable.jl:1983 [9] macro expansion @ ~\source\repos\GetMetricEngineCore\src\QueryResolverIterable.jl:1763 [inlined] [10] (::Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#323#53")(__rf__#257::Transducers.BottomRF{Transducers.SideEffect{Main.GetMetricService.QueryResolver.var"#2#5"{Set{Any}}}}, __acc__#258::Nothing, xs#324::NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.DataDominance, AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, Int64, Int64}}) @ Main.GetMetricService.QueryResolver ~\.julia\packages\FGenerators\N3bhz\src\FGenerators.jl:290 [11] __foldl__ @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:1339 [inlined] [12] #transduce#140 @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:523 [inlined] [13] transduce @ 
~\.julia\packages\Transducers\CnpYX\src\processes.jl:520 [inlined] [14] transduce(xform::Transducers.IdentityTransducer, f::Transducers.SideEffect{Main.GetMetricService.QueryResolver.var"#2#5"{Set{Any}}}, init::Nothing, coll::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#323#53", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.DataDominance, AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, Int64, Int64}}}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\processes.jl:506 [15] transduce(xform::Transducers.IdentityTransducer, f::Transducers.SideEffect{Main.GetMetricService.QueryResolver.var"#2#5"{Set{Any}}}, init::Nothing, coll::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#323#53", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.DataDominance, AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, Int64, Int64}}}) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\processes.jl:505 [16] foreach(eff::Main.GetMetricService.QueryResolver.var"#2#5"{Set{Any}}, 
reducible::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#323#53", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.DataDominance, AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, Int64, Int64}}}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\processes.jl:1128 [17] foreach(eff::Main.GetMetricService.QueryResolver.var"#2#5"{Set{Any}}, reducible::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#323#53", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.DataDominance, AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}}, Int64, Int64}}}) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\processes.jl:1127 [18] macro expansion @ ~\source\repos\GetMetricEngineCore\src\QueryResolverIterable.jl:236 [inlined] [19] (::Main.GetMetricService.QueryResolver.var"##getData#foldl#257#3")(__rf__#257::Transducers.BottomRF{Transducers.SideEffect{Main.GetMetricService.GetMetricBase.var"#1#2"{Vector{Main.GetMetricService.GetMetricModels.ComputeSetCollection}}}}, __acc__#258::Nothing, xs#258::NamedTuple{(:queryModel,), 
Tuple{Main.GetMetricService.QueryModels.QueryDataModel}}) @ Main.GetMetricService.QueryResolver ~\.julia\packages\FGenerators\N3bhz\src\FGenerators.jl:290 [20] __foldl__ @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:1339 [inlined] [21] #transduce#140 @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:523 [inlined] [22] transduce @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:520 [inlined] [23] transduce(xform::Transducers.IdentityTransducer, f::Transducers.SideEffect{Main.GetMetricService.GetMetricBase.var"#1#2"{Vector{Main.GetMetricService.GetMetricModels.ComputeSetCollection}}}, init::Nothing, coll::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getData#foldl#257#3", NamedTuple{(:queryModel,), Tuple{Main.GetMetricService.QueryModels.QueryDataModel}}}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}) @ Transducers ~\.julia\packages\Transducers\CnpYX\src\processes.jl:506 [24] transduce @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:505 [inlined] [25] #foreach#149 @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:1128 [inlined] [26] foreach @ ~\.julia\packages\Transducers\CnpYX\src\processes.jl:1127 [inlined] [27] getMetric(requestID::String, queryResultsSet::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getData#foldl#257#3", NamedTuple{(:queryModel,), Tuple{Main.GetMetricService.QueryModels.QueryDataModel}}}, analyticsRequests::Vector{Main.GetMetricService.GetMetricRequests.GetMetricRequestModels.IAnalyticsRequest}, resultSetSchema::Vector{String}, analyticsColumns::Vector{String}, subsetSchema::Vector{String}) @ Main.GetMetricService.GetMetricBase ~\source\repos\GetMetricEngineCore\src\GetMetricBase.jl:64 [28] getMetric(getMetricRequest::Main.GetMetricService.GetMetricRequests.GetMetricRequestModels.GetMetricRequestEndPoint) @ Main.GetMetricService ~\source\repos\GetMetricEngineCore\src\GetMetricService.jl:90 [29] top-level scope @ show.jl:955 in expression starting at 
c:\Users\me\source\repos\GetMetricEngineCore\src\runtests.jl:287

The manifest for the project is below (I’ve omitted non-relevant packages since I originally exceeded the character limit):

Manifest

`

[[BangBang]]
deps = ["Compat", "ConstructionBase", "Future", "InitialValues", "LinearAlgebra", "Requires", "Setfield", "Tables", "ZygoteRules"]
git-tree-sha1 = "d648adb5e01b77358511fb95ea2e4d384109fac9"
uuid = "198e06fe-97b7-11e9-32a5-e1d131e6ad66"
version = "0.3.35"

[[FGenerators]]
deps = ["AbstractYieldMacros", "ContextualMacros", "FLoopsBase", "MacroTools", "Transducers"]
git-tree-sha1 = "dd2f057dbda6ad499e82cd89489865c14452a5d0"
uuid = "4fd0377b-cfdc-4941-97f4-8d7ddbb8981e"
version = "0.1.4"

[[FLoops]]
deps = ["BangBang", "Compat", "FLoopsBase", "InitialValues", "JuliaVariables", "MLStyle", "Serialization", "Setfield", "Transducers"]
git-tree-sha1 = "4391d3ed58db9dc5a9883b23a0578316b4798b1f"
uuid = "cc61a311-1640-44b5-9fba-1b764f453329"
version = "0.2.0"

[[FLoopsBase]]
deps = ["ContextVariablesX"]
git-tree-sha1 = "656f7a6859be8673bf1f35da5670246b923964f7"
uuid = "b9860ae5-e623-471e-878b-f6a53c775ea6"
version = "0.1.1"

[[SplittablesBase]]
deps = ["Setfield", "Test"]
git-tree-sha1 = "39c9f91521de844bad65049efd4f9223e7ed43f9"
uuid = "171d559e-b47b-412a-8079-5efa626c420e"
version = "0.1.14"

[[Transducers]]
deps = ["Adapt", "ArgCheck", "BangBang", "Baselet", "CompositionsBase", "DefineSingletons", "Distributed", "InitialValues", "Logging", "Markdown", "MicroCollections", "Requires", "Setfield", "SplittablesBase", "Tables"]
git-tree-sha1 = "1cda71cc967e3ef78aa2593319f6c7379376f752"
uuid = "28d57a85-8fef-5791-bfe6-a80928e7c999"
version = "0.4.72"

`

I see this with Julia versions 1.6.0 (where it used to work but now fails after running Pkg.update()), 1.6.5 (where it didn’t work) and 1.7.2 (where it also didn’t).

Having tried and failed to write the length / amount method that the @fgenerator seems to require, I’m reaching out in case this is a version issue or something that is easily resolved or has been reported by others. Google searches haven’t turned up anything to suggest this is a widely reported problem.

I’m at a bit of a loss, as this was originally a great case of “it just works”, so I’m hoping it’s something straightforward.

Regards,

Pls mind the environment by using this discourse tip.


Thanks for the tip!


Looking at the stacktrace, I’m going to guess that this is due to FLoops.jl’s change in 0.2: https://github.com/JuliaFolds/FLoops.jl#update-notes. By default @floop for ... is (mostly) always parallel. However, FGenerators.jl is not parallelizable by default.

To avoid the error, you can get the FLoops.jl 0.1 behavior by passing SequentialEx() as the executor, as in @floop SequentialEx() for ....
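As a minimal sketch of what that looks like with the generate123 example from the first post (the helper name sum_sequential is hypothetical, and this assumes FGenerators.jl and FLoops.jl 0.2 are installed):

```julia
using FGenerators, FLoops

@fgenerator function generate123()
    @yield 1
    @yield 2
    @yield 3
end

# Hypothetical helper: sums the yielded items with an explicitly sequential loop,
# so no `length` or `halve` method is needed for the generator.
function sum_sequential(xs)
    @floop SequentialEx() for x in xs
        @reduce(s += x)
    end
    return s
end

result = sum_sequential(generate123())
```

Passing the executor explicitly also documents at the call site that the loop is not expected to run in parallel.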


Hi,

Thank you very much for reviewing the error and responding so quickly - that did indeed work in the sense that results are now being returned again!

It has made me think that extending this to work with the parallel form might be a bit of a challenge, given the error message. Would you advise writing this without the @fgenerator macro and instead using the Transducers.jl API directly? I did try a few combinations to work around the length error, but had no luck, as this signature from the error is a bit confusing:

length(::AdHocFoldable{Main.GetMetricService.QueryResolver.var"##getQueryIndexes#foldl#281#27", NamedTuple{(:tabularSource, :IQueryModel, :enumerable, :MinVal, :MaxVal), Tuple{Main.GetMetricService.QueryResolver.DataSetConverterModels.TabularSource, Main.GetMetricService.QueryModels.Between, Int64, Int64, Int64}}})

Anyway - thanks again - this package (and your Transducer ecosystem) is great and performant. I only had one challenge in composing multiple @fgenerators as per my link in the original post but worked around that via an initialiser of missing for the innermost loop:

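function composeFGenerators(fgenerators)
    # Start from `missing` so that the innermost generator has an input to wrap.
    initialIterator = missing

    # Feed each generator the result of the previous one.
    for fgenerator in fgenerators
        initialIterator = fgenerator(initialIterator)
    end
    return initialIterator
end

# used like this (inside an @fgenerator body, since it uses @yield)

@floop for r in composeFGenerators(fGenerators)
    @yield r
end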

Regards,

See https://github.com/JuliaFolds/FGenerators.jl#defining-parallelizable-collection for how to make @fgenerator parallelizable.

@fgenerator is a very simple macro that generates a (sequential) implementation of the Transducers.jl API. There’s only a limited set of usage patterns where writing the “raw” Transducers.jl API makes sense. For parallelization, you just have to add a couple of methods, as mentioned in the link above.
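A rough sketch of what "a couple of methods" can look like, going by the stack trace above (which fails in SplittablesBase's amount because length is missing). The Count type and sum_parallel are hypothetical names for illustration; the two methods added are Base.length and SplittablesBase.halve. This assumes SplittablesBase.jl has been added to the project so that it can be imported directly; the linked README section is the authoritative version.

```julia
using FGenerators, FLoops
import SplittablesBase

# Hypothetical foldable collection that yields start, start + 1, ..., stop.
struct Count
    start::Int
    stop::Int
end

# Sequential part: the usual @fgenerator definition.
@fgenerator(foldable::Count) do
    i = foldable.start
    while i <= foldable.stop
        @yield i
        i += 1
    end
end

# Extra method 1: how many items the collection will yield.
Base.length(foldable::Count) = max(0, foldable.stop - foldable.start + 1)

# Extra method 2: how to split the collection into a left and a right half.
function SplittablesBase.halve(foldable::Count)
    mid = (foldable.stop - foldable.start) ÷ 2 + foldable.start
    return (Count(foldable.start, mid), Count(mid + 1, foldable.stop))
end

# Hypothetical helper: a parallel sum over any such collection.
function sum_parallel(xs)
    @floop ThreadedEx() for x in xs
        @reduce(s += x)
    end
    return s
end

parallel_total = sum_parallel(Count(1, 10))
```

Each half returned by halve is again a Count, so the sequential @fgenerator body is reused unchanged on the leaves of the parallel reduction.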


Thanks - I’ll give it a go!

Regards