ANN: Transducers.jl, efficient and composable algorithms for map- and reduce-like operations

Thanks! It’s good to know that transducer was on your radar :slight_smile:

Wow, this is really cool! It feels very Julian.

2 Likes

Sorry to ask for numbers, but I’m curious as to the advantages of Transducers, any bench-marks showing how much better this is than iterators?

To me the selling point of Transducers are composability and generality. I would not say that they are “faster” then iterators. They are not faster than a hand tuned for loop. But they avoid materializing intermediate results and thus they are faster then high level iterator code. Here is the example from @tkf’s SIMD test:

Edit: This snipped is misleading. I benchmark apples against oranges. See @tkf’s comment below.

julia> using Transducers, BenchmarkTools

julia> xs = randn(1000); ys = similar(xs);

julia> xf = Filter(x -> -0.5 < x < 0.5) |> Map(x -> 2x)
Filter(Main.λ❓) |>
    Map(Main.λ❓)

julia> @benchmark map!(xf,ys,xs)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     953.750 ns (0.00% GC)
  median time:      965.950 ns (0.00% GC)
  mean time:        977.053 ns (0.00% GC)
  maximum time:     1.890 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     20

julia> @benchmark map!(x -> 2x, ys, filter(x -> -0.5 < x < 0.5, xs))
BenchmarkTools.Trial: 
  memory estimate:  8.39 KiB
  allocs estimate:  9
  --------------
  minimum time:     4.883 μs (0.00% GC)
  median time:      5.311 μs (0.00% GC)
  mean time:        7.015 μs (16.79% GC)
  maximum time:     6.901 ms (99.78% GC)
  --------------
  samples:          10000
  evals/sample:     6
1 Like

Good point. Here is an example that is very “hostile” to iterator:

julia> xs = 1:2^15
1:32768

julia> @btime foldl(+, Iterators.flatten(1:x for x in xs))
  200.448 ms (10 allocations: 352 bytes)
5864598896640

julia> @btime mapfoldl(Map(x -> 1:x) |> Cat(), +, xs)
  45.532 μs (18 allocations: 880 bytes)
5864598896640

Piggybacking on Skipnothing is missing, here is a more reasonable case for comparison:

julia> xs = [abs(x) > 1 ? nothing : x for x in randn(2^20)];

julia> @btime foldl(+, Iterators.filter(!isnothing, xs); init=0.0)
  4.084 ms (4 allocations: 80 bytes)
779.0559461186579

julia> @btime mapfoldl(Filter(!isnothing), +, xs, init=0.0)
  2.913 ms (5 allocations: 80 bytes)
779.0559461186579
4 Likes

Thanks for bringing up that example. Let me note that the semantics of map! for transducers and iterators are bit different. Transducer Filter skips the destination elements as well while the iterator version “compress” all the output in a contiguous chunk. A function that does a similar thing in Transducers.jl is copy! (we may need copyto! which could be more efficient).

1 Like

@tkf cool! Do you have an explanation, why the skip nothing example if faster using Transducers?

I need to look at IR to be sure but some guesses:

  • foldl in Base actually does a lot more than what I’ve implemented in Transducers.jl, like detecting correct zero even when eltype(xs) is Union{Nothing, Float64}. This may give it some overhead to the iterator version (although I tried to be slightly fair by passing an init).

  • Maybe I put too many @inlines :slight_smile:

  • But it could also be a real effect; as I wrote in difference to iterators, the code “generated” by the iterator is a nested while loop with conditional break. Maybe Julia compiler can optimize structured for loop more easily? Although this means in the future there would probably be no difference in performance.

1 Like

I find something very bizarre: Transducer is actually faster than “equivalent” manual loop. I took sum_nonmissing from First-Class Statistical Missing Values Support in Julia 0.7 which noted:

replacing x !== missing by ismissing(x) in sum_nonmissing currently leads to a large performance drop

Indeed, sum_isnotmissing below is much slower than sum_nonmissing (as of 1.2.0-DEV.63).

Keno explained in the issue comment linked from the blog this is because:

=== is special in inference and inference happens before inlining, so even if they’re the same after inlining, Inference doesn’t know that.

However, using Filter(!ismissing) is just as fast as using x !== missing in the manual loop. If I look at @code_warntype output, I can see that output type is correctly inferred.

I’m not entirely sure what’s going on, but I’m guessing that transducers are compiler-friendly because they aggressively put function boundaries which may help inference.

using Transducers
using BenchmarkTools

# Taken from: https://julialang.org/blog/2018/06/missing
function sum_nonmissing(X::AbstractArray)
    s = zero(eltype(X))
    @inbounds @simd for x in X
        if x !== missing
            s += x
        end
    end
    s
end

function sum_isnotmissing(X::AbstractArray)
    s = zero(eltype(X))
    @inbounds @simd for x in X
        if !ismissing(x)
            s += x
        end
    end
    s
end

# Benchmark:

n = 2^10
xs = [abs(x) > 0.5 ? missing : x for x in randn(n)]

@btime sum_isnotmissing($xs)
# 2.908 μs (0 allocations: 0 bytes)
@btime sum_nonmissing($xs)
# 772.896 ns (0 allocations: 0 bytes)
@btime mapfoldl($(Filter(!ismissing)), +, $xs, init=0.0)
# 784.798 ns (0 allocations: 0 bytes)
VERSION
# v"1.2.0-DEV.63" (similar result with 1.0)
6 Likes

OK this is very cool, @tkf!

Just one question - it seems that Partition, Filter, Cat, Scan, etc behave like Functions - why is that they are piped into each other via |> (evaluate function with data operator) instead of composed with (function followed by another function operator)? Is it just to get the order of operations looking correct? (I’ve often wanted to have something like but writing the functions in the other order).

1 Like

Thanks for the comment!

First, just to clarify, I defined |>(::Transducer, ::Transducer) directly without making Transducer a callable (see below for why).

But this is a good question! And you already found the answer :slight_smile:

Yes, but just in case you haven’t gotten to the full explanation yet, it probably requires more words. A confusing (and interesting) part of transducers is that the flow of application is “flipped twice” (See also Glossary in my docs or the explanation in Clojure homepage).

A transducer is a function that maps a reducing function to a reducing function. A reducing function is the op argument you pass to normal foldl (e.g., + or *; a function of the form (Y, X) -> Y in general where Y is the “accumulator” and X is the “input”). Since transducer maps a function to a function, it is natural to define the composition by (and actually that’s how it works in Clojure). I could have used (and actually was using) Filter(isodd) ∘ Map(double) instead of Filter(isodd) |> Map(double). Note that, as evident when is used, the output (= a function) of the first transducer Filter(isodd) gets the input first. Maybe it becomes clear if you see how the application of this transducer to a reducing function (say) + works:

double(x) = 2x

# In hypothetical notation:
(Filter(isodd) |> Map(double))(+)(y, x)
== (Filter(isodd) ∘ Map(double))(+)(y, x)  # `|>` actually means `∘` for transducers
== (Filter(isodd)(Map(double)(+)))(y, x)   # expand `∘`
== if isodd(x)                             # expand `Filter`
     Map(double)(+)(y, x)
   else
     y
   end
== if isodd(x)
     (+)(y, double(x))                     # expand `Map`
   else
     y
   end

As you can see, input x is Filter’ed first and then Map’ed.

I thought it would be “doubly” confusing and it could be easier to start using transducers if I use some “arrow like” symbol that points in the direction that the data flows.

Another reason is to make the API accessible in ASCII (although I totally like coding in Unicode personally).

But at the same time, I’m not entirely sure that this is the right interface, as it changes the semantics of the operator |> for this particular type. Is it something frowned upon in Julia API design? It seems there are not much other examples that “violate” operator semantics.

FYI, I recently learned:

Some authors compose in the opposite order, writing fg or f ∘ g for g ∘ f. Computer scientists using category theory very commonly write f ; g for g ∘ fCategory theory - Wikipedia

(although sadly we can’t use it in Julia)

5 Likes

Transducers.jl is now registered :tada:. You can install it via ]add Transducers.

I also added a tutorial which shows how to use transducers while implementing various missing value handling.

16 Likes

The package documentation is really fantastic. Thanks for putting the work into it.

6 Likes

I should thank Documenter and Literate devs since those tools make writing docs super fun!

7 Likes

I coded up something similar - iterator processing functions that didn’t have the source bound in. I used the Julia nothing value to represent out-of-input and no-more-output states. I think this is closer to the Julia iteration protocol. This seems simpler than the early termination and completing process mentioned in the Rich Hickey video.

Interesting. Is it something like a framework for higher-order functions composing a function that maps an iterator to an iterator?

If you use iterator protocol or something similar, I suppose that’s still “pull”-based framework (i.e., the user of the iterator processing functions is the one driving the loop)? If so, my guess is that it has the same pros and cons of the iterators. For example, can it have the performance equivalent to the manual loops for flatten/Cat (iterating over vector-of-vector), zip, product, and BitArray? With transducers, you can do so simply because you can write foldl which can have for loops using whatever the best looping strategy is. Of course, the compiler may be able to re-construct such loops from Base.iterate. But that sounds hard. (But I don’t know much about compilers.)

I don’t know how does it work in your iterator processing functions, but I think termination for both iterators and transducers are cumbersome enough that you’d write a macro for it anyway (there is IterTools.@ifsomething). If you use macros, I don’t think there is much difference. But I’d say writing foldl is much easier than writing iterate. See this comparison: https://tkf.github.io/Transducers.jl/dev/examples/reducibles/

Another thought: I think the best “feature” of transducers for performance-oriented language like Julia is actually not transducers themselves but rather the foldl implementations specialized for container types. I guess that’s not news since Julia Base has very efficient foldl/mapfoldl. However, foldl (or mapfoldl) itself is not really composable — that’s where transducers come in. Transducers are composable “pre-processors” of the “reducing function” op you passed to foldl. I think one of the great observations by Rich Hickey is that this set of pre-processors can be as powerful as the usual iterator tool chain. But, as @arghhhh pointed out, it requires a generalization of foldl for supporting early termination and completion. I think it is worth doing so since it can be compiled away if you don’t use it.

6 Likes

First, thanks for writing this package! I read through your code, watched the video, and read through your code again and it is starting to make sense, mostly because you organized and documented everything so nicely.

I have some specific questions:

  1. My understanding is that the current implementation of Base.collect here uses type inference to infer the container type. Do you think it could use some strategy that has the same container type semantics as the default method, which widens on demand? Or does that not compose well with transducers?

  2. Similarly, did you run into the type explosion problem mentioned in the video (edit: corrected video link) in the Julia implementation?

  3. How should one go about implementing a custom container type that can accept output from transducer-mapped collections, what are the necessary primitives? Especially if a widening-on-demand strategy like collect is desired.

4 Likes

Thank you, I appreciate your feedback!

Yes, Transducers.jl’s collect uses inference API. Also, there are many other places (e.g., transducers like Partition which needs an internal buffer) that relies on type inference.

Great question! I’ve been pondering about how to avoid relying on the type inference. Note that, especially in transducers, this problem also applies when the output is not a container. For example, consider mapfoldl(Filter(isodd) |> Map(x -> x / 2), +, 1:10). It requires to choose a “good” initial zero unless you provide it with init keyword argument. Furthermore, the first resulting (mapped) item cannot be used as the initial guess since the first item may be filtered out. The problem you mention corresponds to the case you use push! instead of + (roughly speaking).

The direction I want to explore to solve it is to define “generic” left-identity type like

struct Identity end
Base.*(::Identity, x) = x
Base.+(::Identity, x) = x
# and so on...

and then use Identity() as the default init. (I am actually thinking to use struct Identity{OP} end with Base.*(::Identity{*}, x) = x to avoid confusion/bug due to miss-use of an identity for one reducing function with another reducing function. Also, above implementation like *(::Identity, x) can probably be implemented as a transducer.) It would require each foldl implementation to embrace widen-as-needed approach. This is because the accumulation variable can change type from Identity to, e.g., Float64 or Vector{Int} at any point. Also, this change of type can occur multiple times (e.g., IdentityVector{Int}Vector{Union{Missing,Int}}). It means that you need to write a somewhat non-trivial foldl for each container type (or any “reducible”). Although it is a downside in some sense, the upside is that you only need to write it once (well, if this plan works…).

Once this infrastructure is in place, I think widening-based collect can be implemented by just using

push_or_widen!(::Identity, x) = [x]
push_or_widen!(xs::Vector{T}, x::S) where {T, S <: T} = push!(xs, x)
push_or_widen!(xs::Vector, x) = push!(append!(Union{eltype(xs), typeof(x)}[], xs), x)

instead of push!. (Actually, you can already feed this function to mapfoldl. The only problem is that it is not going to be type-stable.)

I think Julia handles it nicely since it is a dynamic language. That is to say, the worst case scenario is the dynamic dispatch which should still result in a logically correct answer. One of the examples is the prime sieve. This is just a for-fun demo (don’t use it if you need speed!) but it does show that Julia can handle “type explosion.”

But I don’t know if it is possible to optimize such use-case unless the input container is a small container like tuples that can be completely unrolled.

For “sink” containers, I think it would be relatively easy as I think you only need to implement two methods:

push_or_widen!(xs::MyContainer{T}, x::S) where {T, S <: T} = push!(xs, x)
push_or_widen!(xs::MyContainer, x) = push!(append!(MyContainer{Union{eltype(xs), typeof(x)}}(), xs), x)

All the hard part would be in the foldl implementation for the “source” reducible container (as I mentioned in the answer to the first question).

6 Likes

Very interesting work! I’m especially curious about this as IndexedTables and StructArrays implement many tabular algorithms (groupreduce, groupby, filter, in the future join) as returning iterators which are then collected using collect_structarray which is a variant of collect in Base (it uses the same widening strategy) that collects an iterator of structs into a struct of arrays. The main drawback of this approach so far is that in some scenarios iterators lack performance: for example collect_structarray(flatten(itr)) performs quite poorly by default so we end up defining a collect_columns_flattened. I’d be happy to try and add a method to collect_structarray that works with transducers.

There are two concerns that I see with using transducers. First of all (as mentioned above), it’s important that collect doesn’t rely on type inference. What I don’t understand is whether the new solution you have in mind will have the same performance as the previous one in case the transducer being collected is actually type stable. Maybe the only thing that’s needed is to separate the first step of the foldl (until you get [x]) from the rest of the iteration with a function boundary (that’s StructArrays as well as Base.collect strategy AFAIU).

The second concern is that, while I think it’s natural to return an iterator, if the user calls the “lazy version” of join or groupby, it’d be a bit risky to return a transducer as the user would then be forced to use the transducers technology to do further work on it (which not everybody is familiar with, as it is very new). Even though it breaks all the nice composability, is it possible to somehow turn a (Transducer, Iterator) pair into a new iterator? This way the API could be that the user can get as an output of a tabular operation either the collected version (eager mode) or the (transducer, iterator) pair (advanced user that would know how to work with them) or simply an iterator (normal lazy mode).

3 Likes