# I implemented a type for defining arbitrary lazy sequences that performs well, but the implementation is so trivial that I'm not sure if it's worth making a package

… so I thought I’d post it here and see what people thought.

Lazy sequences have been a point of frustration for me with Julia. Python is my main background and I've always liked generator functions; I also really enjoy playing with lazy types in various functional languages.

In Julia, channels can be used in a way similar to generator functions, but the performance is bad. Thunk-based solutions like Lazy.jl are super cool, but performance there is also problematic. Making an efficient lazy iterator in Julia requires at least defining a type and an `iterate` method for it, as well as `Base.eltype` and `Base.IteratorSize` for most practical cases. This seems boilerplate-y. The obligatory infinite Fibonacci example:

```julia
struct Fibs end
Base.iterate(f::Fibs) = iterate(f, (0, 1))
Base.iterate(::Fibs, (a, b)) = a, (b, a + b)
Base.IteratorSize(::Fibs) = Base.IsInfinite()
Base.eltype(::Fibs) = Int

foreach(println, Fibs())
```

This is actually fast!
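Despite `IsInfinite`, finite prefixes are easy to work with via `Iterators.take`. A quick sketch (repeating the `Fibs` definition above so it runs standalone):

```julia
# The `Fibs` iterator from above.
struct Fibs end
Base.iterate(f::Fibs) = iterate(f, (0, 1))
Base.iterate(::Fibs, (a, b)) = a, (b, a + b)
Base.IteratorSize(::Fibs) = Base.IsInfinite()
Base.eltype(::Fibs) = Int

# Take a finite prefix of the infinite sequence and collect it.
first10 = collect(Iterators.take(Fibs(), 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```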

My thought was to make a type that generalized this. This is what I came up with:

```julia
struct Sequence{F <: Function, I}
    f::F
    init::I
end

Base.iterate(s::Sequence) = iterate(s, s.init)
Base.iterate(s::Sequence, state) = s.f(state)
Base.IteratorSize(::Sequence) = Base.SizeUnknown()
```

Now, your infinite Fibonacci sequence looks like this:

```julia
fibs = Sequence((0, 1)) do (a, b)
    a, (b, a + b)
end
```

And it is just as fast as the other method.

If you want a sequence that terminates, it looks like this:

```julia
counter(stop) = Sequence(1) do n
    n > stop ? nothing : (n, n + 1)
end
```

This works well, but it's still tricky to `collect` because the type lacks an implementation of `eltype`. You can either give a type explicitly to the `collect` function, or you can use something like this:

```julia
_get_types(s::Sequence) =
    Base.return_types(s.f, (typeof(s.init),))[1] |> _get_types
_get_types(::Type{Union{Nothing, S}}) where S = _get_types(S)
_get_types(::Type{Tuple{El, State}}) where {El, State} = (el = El, state = State)
Base.eltype(s::Sequence) = _get_types(s).el
```

Note that this is an imperfect heuristic. It can fail if the type of the state can change in such a way that will change the return types. This seems to be pretty rare in practice.
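For comparison, the explicit-type route needs no `eltype` heuristic at all: `collect(T, itr)` grows a `Vector{T}` even with `SizeUnknown`. A self-contained sketch (redefining `Sequence` and `counter` from above):

```julia
struct Sequence{F <: Function, I}
    f::F
    init::I
end
Base.iterate(s::Sequence) = iterate(s, s.init)
Base.iterate(s::Sequence, state) = s.f(state)
Base.IteratorSize(::Sequence) = Base.SizeUnknown()

counter(stop) = Sequence(1) do n
    n > stop ? nothing : (n, n + 1)
end

# Passing the element type explicitly sidesteps the missing `eltype`.
xs = collect(Int, counter(5))  # [1, 2, 3, 4, 5]
```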

Is anyone else interested in this kind of thing? Would it be worth packaging it up and distributing it?

Edit: @tkf explained in a later post a better way to implement eltype:

```julia
Base.IteratorEltype(::Sequence) = Base.EltypeUnknown()
```

I think so, but it seems opening a PR somewhere would be better. See this topic, it looks similar: Julia equivalence of Haskell's unfoldr


ResumableFunctions is also relatively slow, though much faster than using a Channel.

> Is anyone else interested in this kind of thing? Would it be worth packaging it up and distributing it?

I’ve been using something like this quite often, but I never really thought of putting it into a module, since it’s really not a lot of code.


Super cool type! It is in the spirit of GitHub - francescoalemanno/KernelOps.jl: apply lazy kernel operations on AbstractArrays; they are composable and non-copying. Have fun!


I guess it’s more or less `IterTools.iterated` (with a post-hoc fix up using `imap`)?
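For reference, a sketch of the Fibonacci example in those terms (assumes IterTools is installed): `iterated` lazily yields `x, f(x), f(f(x)), …`, and `imap(first, …)` is the post-hoc fix-up that projects the element out of the state:

```julia
using IterTools: iterated, imap

# The state is the full (a, b) pair; imap(first, ...) strips it to the element.
fibs = imap(first, iterated(((a, b),) -> (b, a + b), (0, 1)))
collect(Iterators.take(fibs, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```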

I think the common consensus is that using `Base.return_types` like this is wrong (i.e., changes in the compiler heuristics should not affect your API). I think a better approach is to define `Base.IteratorEltype` to be `Base.EltypeUnknown()`. `collect` should be able to handle it by the type-widening strategy.
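A sketch of what that looks like with the `Sequence` type from the OP (self-contained; `collect` then infers a concrete `Vector{Int}` by widening):

```julia
struct Sequence{F <: Function, I}
    f::F
    init::I
end
Base.iterate(s::Sequence) = iterate(s, s.init)
Base.iterate(s::Sequence, state) = s.f(state)
Base.IteratorSize(::Sequence) = Base.SizeUnknown()
# The suggested trait: tell collect not to rely on eltype at all.
Base.IteratorEltype(::Sequence) = Base.EltypeUnknown()

counter(stop) = Sequence(1) do n
    n > stop ? nothing : (n, n + 1)
end

xs = collect(counter(5))  # a Vector{Int}: [1, 2, 3, 4, 5]
```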


Thanks for the insight. It seems to work well!

I’ll check in with IterTools and see if they are interested in a type like this, as others have suggested.

Well, come on then! Show us the benchmarks.


Good idea. Here’s the benchmark code:

```julia
#!/usr/bin/env julia
using BenchmarkTools
using Lazy
using ResumableFunctions
using Sequence

channel_counter(start::I, stop::I) where I<:Integer =
    Channel{I}(1) do chan
        foreach(n -> put!(chan, n), start:stop)
    end

lazy_counter(start) = @lazy start : lazy_counter(start + 1)
lazy_counter(start, stop) =
    takeuntil(n -> n > stop, lazy_counter(start))

@resumable function resumable_counter(start, stop)
    for n in start:stop
        @yield n
    end
end

sequence_counter(start, stop) = Sequence.new(start) do n
    n > stop ? nothing : (n, n + 1)
end

struct CustomCounter{I <: Integer}
    start::I
    stop::I
end
Base.iterate(cc::CustomCounter) = Base.iterate(cc, cc.start)
Base.iterate(cc::CustomCounter, n) = n > cc.stop ? nothing : (n, n + 1)

function main()
    println(:channel)
    @btime foldl(+, channel_counter(1, 100))
    println(:lazy)
    @btime foldl(+, lazy_counter(1, 100))
    println(:resumable)
    @btime foldl(+, resumable_counter(1, 100))
    println(:sequence)
    @btime foldl(+, sequence_counter(1, 100))
    println(:custom)
    @btime foldl(+, CustomCounter(1, 100))
end

main()
```

Here is the output:

```
channel
  388.460 μs (229 allocations: 4.92 KiB)
lazy
  174.170 μs (1589 allocations: 32.85 KiB)
resumable
  17.957 μs (373 allocations: 10.59 KiB)
sequence
  2.210 ns (0 allocations: 0 bytes)
custom
  2.210 ns (0 allocations: 0 bytes)
```

So, it's like ~~9,000~~ 8,000 times faster than resumable functions and the same speed as implementing a custom type.


2 ns for a loop of 100 iterations cannot be right: the compiler probably just precomputes the result and returns it. For me, any number of iterations gives the same timing, e.g.

```julia
using BenchmarkTools

struct CustomCounter{I <: Integer}
    start::I
    stop::I
end
Base.iterate(cc::CustomCounter) = Base.iterate(cc, cc.start)
Base.iterate(cc::CustomCounter, n) = n > cc.stop ? nothing : (n, n + 1)

function main()
    @btime foldl(+, CustomCounter(1, 1000000))
end

main()
```

shows 1.2 ns. A more reliable way to benchmark it is to put the number of steps into a variable and interpolate it with `$`:

```julia
function main()
    println(:inlined_100)
    @btime foldl(+, CustomCounter(1, 100))
    println(:inlined_1000000)
    @btime foldl(+, CustomCounter(1, 1000000))
    println(:var_100)
    n = 100
    @btime foldl(+, CustomCounter(1, $n))
    println(:var_1000000)
    n = 1000000
    @btime foldl(+, CustomCounter(1, $n))
end

main()
```

prints

```
inlined_100
  1.267 ns (0 allocations: 0 bytes)
inlined_1000000
  1.267 ns (0 allocations: 0 bytes)
var_100
  31.864 ns (0 allocations: 0 bytes)
var_1000000
  250.600 μs (0 allocations: 0 bytes)
```

Actually, it's probably using the closed-form solution for the sum of sequential integers (for non-negative `n`, the sum can be computed efficiently as `(n * (n + 1)) >>> 1`).
LLVM often makes that transformation, and `@code_llvm` confirms it here.

```julia
function sumints(N)
    s = zero(N)
    for n ∈ one(N):N
        s += n
    end
    s
end
```

This yields:

```llvm
; julia> @code_llvm debuginfo=:none sumints(100)

define i64 @julia_sumints_2688(i64) {
top:
  %1 = icmp sgt i64 %0, 0
  %2 = select i1 %1, i64 %0, i64 0
  br i1 %1, label %L12.preheader, label %L27

L12.preheader:
  %3 = shl nuw i64 %2, 1
  %4 = add nsw i64 %2, -1
  %5 = zext i64 %4 to i65
  %6 = add nsw i64 %2, -2
  %7 = zext i64 %6 to i65
  %8 = mul i65 %5, %7
  %9 = lshr i65 %8, 1
  %10 = trunc i65 %9 to i64
  %11 = add i64 %3, %10
  %12 = add i64 %11, -1
  br label %L27

L27:                                              ; preds = %L12.preheader, %top
  %value_phi9 = phi i64 [ 0, %top ], [ %12, %L12.preheader ]
  ret i64 %value_phi9
}
```

This isn't as clean as `(n * (n + 1)) >>> 1`, but you can see that it does not include any loops (`L12.preheader` cannot jump back to itself, only to the end via `br label %L27`); it contains only simple arithmetic and shifts.
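As a quick sanity check that the closed form and the loop agree (a sketch; `closed_form` is just an illustrative name):

```julia
closed_form(n) = (n * (n + 1)) >>> 1  # sum of 1..n for non-negative n

function sumints(N)  # the loop version from above
    s = zero(N)
    for n in one(N):N
        s += n
    end
    s
end

closed_form(100) == sumints(100)  # true; both give 5050
```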

The fact that ninjaaron's approach generates code simple enough for LLVM to figure this out shows that it is pretty compiler-friendly.


Thanks for pointing this out! I reran the benchmark with `start` and `stop` as variables in each case, which definitely made an impact! Here is the `main` function after the changes:

```julia
function main()
    start, stop = 1, 100
    println(:channel)
    @btime foldl(+, channel_counter($start, $stop))
    println(:lazy)
    @btime foldl(+, lazy_counter($start, $stop))
    println(:resumable)
    @btime foldl(+, resumable_counter($start, $stop))
    println(:sequence)
    @btime foldl(+, sequence_counter($start, $stop))
    println(:custom)
    @btime foldl(+, CustomCounter($start, $stop))
end
```

and here is the output:

```
channel
  232.484 μs (229 allocations: 4.92 KiB)
lazy
  109.011 μs (1590 allocations: 32.87 KiB)
resumable
  11.329 μs (373 allocations: 10.59 KiB)
sequence
  33.356 ns (0 allocations: 0 bytes)
custom
  31.860 ns (0 allocations: 0 bytes)
```

Interestingly, the slower benchmarks improved with this change and the faster benchmarks got slower. Now, Sequence is only about 340 times faster than ResumableFunctions.

While that is a huge change from before, and what you suggested is definitely a better benchmark, I would point out that the fact that Sequence is susceptible to this kind of optimization is also significant.


Where is the `Sequence` module defined?

I posted most of the code in the OP, but here's exactly what it is in the benchmarks I posted; it incorporates the change to the implementation of `eltype` suggested by @tkf:

```julia
module Sequence

struct T{F <: Function, S}
    f::F
    state::S
end
new(f, s) = T(f, s)

Base.iterate(s::T) = iterate(s, s.state)
Base.iterate(s::T, state) = s.f(state)
Base.IteratorSize(::T) = Base.SizeUnknown()
Base.IteratorEltype(::T) = Base.EltypeUnknown()

end # module
```

I also have some implementations of `map`, `filter`, `take`, `takewhile` and a few others, but they are unnecessary, since this composes fine with the functions in `Base.Iterators`.
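For example, composing with `Base.Iterators` just works (a self-contained sketch; `Seq` here stands in for the `Sequence.T` above):

```julia
struct Seq{F <: Function, S}
    f::F
    state::S
end
Base.iterate(s::Seq) = iterate(s, s.state)
Base.iterate(s::Seq, state) = s.f(state)
Base.IteratorSize(::Seq) = Base.SizeUnknown()
Base.IteratorEltype(::Seq) = Base.EltypeUnknown()

# Infinite Fibonacci sequence, filtered and truncated with Base.Iterators.
fibs  = Seq(((a, b),) -> (a, (b, a + b)), (0, 1))
evens = Iterators.filter(iseven, fibs)
collect(Iterators.take(evens, 5))  # [0, 2, 8, 34, 144]
```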

Apologies in advance for naming the struct `T`. I’ve been writing a lot of OCaml, which has had an impact on my coding style. The upshot is that I got the idea for this when I noticed the similarities between the iteration protocol in Julia and the API of `Base.Sequence` in OCaml (from Jane Street).


Thanks, I was missing `new`. I'd have to look into this more to find out why, but the code generates a loop for both `sequence_counter` and `CustomCounter`.

FYI, here is how you implement it as a "foldable" (a counterpart of "iterator" in Transducers.jl; see `Transducers.AdHocFoldable`):

```julia
using Transducers
using Transducers: AdHocFoldable, @next, complete

counter_foldable(start, stop) =
    AdHocFoldable() do rf, acc, _
        i = start
        acc = @next(rf, acc, i)
        while i < stop
            i += 1
            acc = @next(rf, acc, i)
        end
        return complete(rf, acc)
    end
```

Benchmark code:

```julia
function main()
    println(:var_100)
    start = 1
    stop = 100
    @btime foldl(+, CustomCounter($start, $stop))
    @btime foldl(+, counter_foldable($start, $stop))
    println(:var_1000000)
    stop = 1000000
    @btime foldl(+, CustomCounter($start, $stop))
    @btime foldl(+, counter_foldable($start, $stop))
end
```

Output:

```
var_100
  60.414 ns (0 allocations: 0 bytes)
  2.284 ns (0 allocations: 0 bytes)
var_1000000
  527.555 μs (0 allocations: 0 bytes)
  2.285 ns (0 allocations: 0 bytes)
```

As you can see, it looks like Julia+LLVM figured out the closed-form solution even with the `$` hack. I don't know how to work around this (I tried `start = Ref(1)` instead, but it didn't work).

Another example:

```julia
fib_foldable() =
    AdHocFoldable() do rf, acc, _
        a, b = 0, 1
        while true
            acc = @next(rf, acc, a)
            a, b = (b, a + b)
        end
        return complete(rf, acc)
    end
```

```julia
julia> collect(Take(10), fib_foldable())
10-element Array{Int64,1}:
  0
  1
  1
  2
  3
  5
  8
 13
 21
 34
```

As you can see, you can kind of think of `@next` as `yield` in Python.


Sequence (and presumably a type-based counter) seems to get into a similar optimization situation when implemented in a type-stable way:

```julia
_infinite_counter(start) = Sequence.new(n -> (n, n + 1), start)
type_stable_sequence_counter(start, stop) =
    take(_infinite_counter(start), stop - (start - 1))
```

```
stop = 100
channel
  232.768 μs (229 allocations: 4.92 KiB)
lazy
  106.801 μs (1590 allocations: 32.87 KiB)
resumable
  11.672 μs (373 allocations: 10.59 KiB)
sequence
  33.358 ns (0 allocations: 0 bytes)
custom
  31.859 ns (0 allocations: 0 bytes)
type_stable_sequence_counter
  3.812 ns (0 allocations: 0 bytes)
  1.901 ns (0 allocations: 0 bytes)

stop = 1,000,000
type_stable_sequence_counter
  3.817 ns (0 allocations: 0 bytes)
```

It's a bit strange that the iteration protocol is inherently non-type-stable when it has a terminal case (the return type is `Union{Nothing, Tuple{El, State}} where {El, State}`).

It could probably be improved, at least in terms of performance, by returning a type-stable tuple of `(el, state, completion_status)`, where `completion_status` is a Boolean. I wonder…
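A rough sketch of what such a three-valued protocol could look like (hypothetical `next3` helper, not part of any existing API): the step function always returns a concrete `Tuple{Int, Int, Bool}` instead of `Union{Nothing, Tuple{Int, Int}}`.

```julia
# Hypothetical type-stable step: (element, next_state, done).
# The element slot holds a dummy value once `done` is true.
next3(n, stop) = n > stop ? (0, n, true) : (n, n + 1, false)

function sum_counter(start, stop)
    s = 0
    n = start
    while true
        el, n, done = next3(n, stop)
        done && return s
        s += el
    end
end

sum_counter(1, 100)  # 5050
```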