I’m trying to get a finer understanding of where Transducers.jl can help, especially by comparison to explicit loops and iterators.
Here is a small example (freely adapted from advent of code, day 6, part2).
A version fully implemented using loops looks like this:
manhattan(a, b) = (b .- a) .|> abs |> sum

function fun(points, N)
    count = 0
    for x in 1:N, y in 1:N
        s = 0
        for p in points
            s += manhattan((x, y), p)
        end
        if s < 50
            count += 1
        end
    end
    count
end
It does not allocate, and runs relatively fast:
julia> N = 1000;
julia> points = zip(rand(1:N, 50), rand(1:N, 50)) |> collect;
julia> @benchmark fun($points, $N)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 32.647 ms (0.00% GC)
median time: 33.373 ms (0.00% GC)
mean time: 33.620 ms (0.00% GC)
maximum time: 36.444 ms (0.00% GC)
--------------
samples: 149
evals/sample: 1
Here is a second version, using iterators and generators to replace the outer loop with a call to count:
function fun_iter1(points, N)
    count(Iterators.product(1:N, 1:N)) do p1
        s = 0
        for p2 in points
            s += manhattan(p1, p2)
        end
        s < 50
    end
end
The readability is (IMO) improved, and the performance drop is negligible: one allocation (probably to store the Iterators.product(...) iterator):
julia> @benchmark fun_iter1($points, $N)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 33.272 ms (0.00% GC)
median time: 33.629 ms (0.00% GC)
mean time: 33.738 ms (0.00% GC)
maximum time: 35.302 ms (0.00% GC)
--------------
samples: 149
evals/sample: 1
Now, if I also want to replace the inner loop with a call to sum, we get an implementation which is (IMO) even clearer:
function fun_iter2(points, N)
    count(Iterators.product(1:N, 1:N)) do p1
        sum(manhattan(p1, p2) for p2 in points) < 50
    end
end
… but this time the performance noticeably degrades. I attribute this to the allocation of a (small) generator object used as the argument of sum at each loop iteration:
julia> @benchmark fun_iter2($points, $N)
BenchmarkTools.Trial:
memory estimate: 30.52 MiB
allocs estimate: 1000001
--------------
minimum time: 49.331 ms (2.08% GC)
median time: 50.463 ms (2.68% GC)
mean time: 51.291 ms (4.34% GC)
maximum time: 95.284 ms (43.08% GC)
--------------
samples: 98
evals/sample: 1
So I was wondering: would this kind of algorithm be a good candidate for Transducers? So far, I have not managed to write any Transducers-based version that performs better than this last iterator-based version.
(Of course, if anyone thinks of something to make the iterator-based version perform as well as the loop-only one, I’m also interested!)
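For concreteness, here is the kind of Transducers-based formulation I have in mind — a minimal sketch, assuming Transducers.jl is installed and using its Map and foldxl (the function name fun_xf and the closure-based inner sum are my own choices, not from the original code):

```julia
using Transducers  # assumes the Transducers.jl package is available

# Self-contained copy of the helper from above.
manhattan(a, b) = (b .- a) .|> abs |> sum

# Sketch: map each grid point to a Bool predicate via the `Map`
# transducer, then fold with `+` (init = 0) to count the matches.
function fun_xf(points, N)
    foldxl(+,
           Iterators.product(1:N, 1:N) |> Map(p1 ->
               Int(sum(p2 -> manhattan(p1, p2), points) < 50));
           init = 0)
end
```

I would expect this to be at best on par with the iterator-based versions, since the inner sum is still an ordinary reduction; it is shown only as a starting point for comparison.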