Transducers.jl's `Filter` is slow in simple cases. When to use Transducers?

Transducers.jl has lots of functionality, but it’s not clear to me WHEN I should use it, as in: when can I expect a speedup, and which operations can be done in parallel?

See:

using Transducers
using BenchmarkTools

x = rand(Int, 100_000_000)

f(x) = x |> Filter(iseven) |> collect

@benchmark f($x)

g(x) = filter(iseven, x)

@benchmark g($x)

FYI, there are some improvements in the dev version [*1] of Transducers.jl that help with this case:

Baseline:

julia> @benchmark g($x)
BenchmarkTools.Trial:
  memory estimate:  762.94 MiB
  allocs estimate:  3
  --------------
  minimum time:     409.426 ms (0.00% GC)
  median time:      419.465 ms (0.16% GC)
  mean time:        441.335 ms (6.12% GC)
  maximum time:     554.023 ms (24.38% GC)
  --------------
  samples:          12
  evals/sample:     1

Transducers v0.4.47:

julia> @benchmark f($x)
BenchmarkTools.Trial:
  memory estimate:  1.74 GiB
  allocs estimate:  49995535
  --------------
  minimum time:     1.880 s (11.18% GC)
  median time:      1.994 s (11.01% GC)
  mean time:        1.959 s (12.86% GC)
  maximum time:     2.003 s (16.29% GC)
  --------------
  samples:          3
  evals/sample:     1

Transducers 3a574548b664d00d0d20419bd2581a79fcd5a904:

julia> @benchmark f($x)
BenchmarkTools.Trial:
  memory estimate:  513.00 MiB
  allocs estimate:  26
  --------------
  minimum time:     877.074 ms (0.00% GC)
  median time:      914.099 ms (0.10% GC)
  mean time:        957.058 ms (2.67% GC)
  maximum time:     1.141 s (0.08% GC)
  --------------
  samples:          6
  evals/sample:     1

Of course, even this version of Transducers.jl is not as fast as the specialized eager filter in Base. There is one optimization [*2] I’ve been wanting to do in Transducers that might help in this particular situation. I’m not sure if I can squeeze out another 2x speedup, but let’s see :slight_smile:

But as for the general “WHEN to use” question, the answer is “it depends,” especially if you want maximally efficient code. An eager specialized method like filter is easier to tune, so it tends to be fast. Also, if you know you want to do exactly what filter does, it’s arguably simpler to call the single function filter than x |> Filter(f) |> collect. So why not use filter?
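That said, a rough illustration of where transducers tend to pay off is a fused pipeline that is reduced without materializing intermediate arrays (this is a sketch using the public foldl/Filter/Map API; the exact speedup, if any, depends on the version and the workload):

```julia
using Transducers

x = collect(1:1000)

# Fused pipeline: filter, transform, and reduce in a single pass,
# with no intermediate array allocated.
s1 = foldl(+, Filter(iseven) |> Map(abs), x)

# Equivalent Base version: filter allocates a temporary array first.
s2 = sum(abs, filter(iseven, x))

@assert s1 == s2
```

For a one-off `filter`-then-`collect`, though, Base's eager filter remains the simpler and (currently) faster choice, as the benchmarks above show.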

I certainly hope I can reply “just always use Transducers.jl.” But there are just so many optimizations and improvements I can think of! So, unfortunately, it’ll take some time to reach that state.

[*1] But I’ve been agonizing over how to release this version because it has serious implications for compilation latency, especially on Julia 1.5 (https://github.com/JuliaFolds/Transducers.jl/pull/413 / https://github.com/JuliaFolds/Transducers.jl/pull/412).

[*2] One of the slowdowns in the Transducers.jl version comes from the fact that each iteration performs a bounds check to decide whether the output capacity is big enough. This can be avoided by chunking the input and doing one check per chunk. I built one of the basic facilities for this (the unsafe flag of BangBang.collector). I need to use it via the collect implementation by somehow detecting that the transducers are non-expanding (with some traits).
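To make the idea concrete, here is a minimal sketch in plain Julia (not the actual Transducers/BangBang internals): a non-expanding transducer like Filter emits at most one output per input, so the output buffer can be grown once per chunk and the inner loop can write without any per-element capacity check. The function name and chunk size below are made up for illustration:

```julia
# Sketch: one capacity check per chunk instead of one per element.
function filter_chunked(pred, xs::Vector{T}; chunk::Int = 1024) where {T}
    out = T[]
    k = 0                                  # number of outputs written so far
    for i in 1:chunk:length(xs)
        hi = min(i + chunk - 1, length(xs))
        # Non-expanding: this chunk emits at most (hi - i + 1) outputs,
        # so a single grow here guarantees capacity for the inner loop.
        need = k + (hi - i + 1)
        length(out) < need && resize!(out, need)
        @inbounds for j in i:hi            # no capacity checks inside
            if pred(xs[j])
                k += 1
                out[k] = xs[j]
            end
        end
    end
    resize!(out, k)                        # trim the unused tail
    return out
end

xs = collect(1:10_000)
@assert filter_chunked(iseven, xs) == filter(iseven, xs)
```

Whether this maps cleanly onto the real collect implementation depends on being able to prove, via traits, that every transducer in the pipeline is non-expanding, which is exactly the missing piece mentioned above.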