Fixing the Piping/Chaining Issue

CameronBieganek · November 11, 2022, 12:04am

I think it’s a little different from that. A transducer is supposed to be a function from a reduction function to a reduction function. In other words, it should have a signature like this:

((acc, item) -> acc) -> ((acc, item) -> acc)

I was working on an example that I was hoping would demonstrate the advantage of transducers, but unfortunately Transducers.jl seems to perform quite poorly for this example. Here it is anyways:

If we compose a mapper and a filterer, the composed function will create an intermediate vector. However, I think in theory if you compose Map and Filter, there shouldn’t be an intermediate vector. Here’s the Base Julia composition approach:

mapper = xs -> map(x -> 2x, xs)
filterer = xs -> filter(>(100), xs)
f = filterer ∘ mapper

And here’s the Transducers.jl approach:

t = Map(x -> 2x) ⨟ Filter(>(100))

And here’s the benchmark where, unfortunately, Transducers.jl does quite poorly:

julia> @btime f(1:1000);
  1.124 μs (2 allocations: 15.88 KiB)

julia> @btime collect($t, 1:1000);
  10.261 μs (957 allocations: 51.61 KiB)

I could imagine Transducers.jl having more than 2 allocations if it is pushing the result into an initial empty vector, but 957 allocations is too many.

Mason · November 11, 2022, 12:09am

The problem here is collect. Generally, Transducers.jl performs very poorly with collect. What it really excels at is doing a reduction over the process. So compare these:

julia> xs = collect(1:100);

julia> @btime sum(Iterators.filter(>(100), Iterators.map(x -> 2x, $xs)))
  75.283 ns (0 allocations: 0 bytes)
7550

julia> @btime sum(filter(>(100), map(x -> 2x, $xs)))
  110.717 ns (2 allocations: 1.75 KiB)
7550

against

julia> using Transducers

julia> @btime sum($xs |> Map(x -> 2x) |> Filter(>(100)))
  33.505 ns (0 allocations: 0 bytes)
7550

collect could be improved with transducers, but doing so is pretty hard because Base.collect on iterators is pretty optimized.

CameronBieganek · November 11, 2022, 12:14am

Good to know. But do you have any idea why the collect(t, 1:1000) example has 957 allocations? Naively I would expect that collect would be pushing the results into an initial empty vector, which would result in ~6 allocations.

c42f · November 11, 2022, 12:37am

If I understand correctly, one of @tkf’s long term aims was to have Transducers (or generally, foldl-based implementations) as a target for lowing processing collections, but allow users to use normal piping syntax. This is exactly because the rules for composing transducers are “backward and confusing” to nearly everybody when they first see them. Again IIUC, there’s a duality between iterators and transducres so this makes some sense as a general goal.

One of the problems with collect is inferring the element type of the output container. IIRC iteration has special support in type inference and Base.collect uses Core.Compiler.return_type, so there’s been a lot of effort put into making that work. Transducers invert some of the control flow relative to iterators and it may be that inference just doesn’t work as well here. (Speculation based on some previous contact with these things, I may be out of date or wrong )

uniment · November 11, 2022, 1:35am

I knew using map and filter for a bunch of the examples would be a mistake

I originally used them for a bunch of the examples just to show how a backfix operator could be useful using functions that people are familiar with. But the point is to address the generic problem of partial functions, and to use that as a language-level solution to chaining and motivating autocomplete.

For fun though, let’s compare compile times for Mapping on a lambda vs. a partial application functor:


julia> const Fix2 = Fix{(2,)};

julia> @btime eval(:(  Map(x->x^2)  ))
  4.092 ms (2767 allocations: 161.78 KiB)
Map(Main.λ❓)

julia> @btime eval(:(  Map(Fix2(^,2))  ))
  57.700 μs (44 allocations: 2.14 KiB)
Map(Fix{typeof(^), (2,), 2, Tuple{Int64}, NamedTuple{(), Tuple{}}}(^, (2,), NamedTuple()))

Edit: there’s more nuance to measuring compile time, see below.

I enjoy the conversation about Transducers though. Very much out of my depth, something for me to learn more about.

CameronBieganek · November 11, 2022, 3:55am

It’s been a long time since I’ve looked at the docs for BenchmarkTools, but it seems to me like your benchmark is only measuring the time to construct the Map object. For measuring compilation times, the @time macro might be adequate, since it tells you the percent of the recorded time that was due to compilation:

t1 = Map(x->x^2)
t2 = Map(Fix{(2,)}(^, (2,)))

julia> @time collect(t1, 1:10);
  0.326903 seconds (1.20 M allocations: 65.878 MiB, 3.10% gc time, 99.97% compilation time)

julia> @time collect(t2, 1:10);
  0.127600 seconds (236.53 k allocations: 12.387 MiB, 7.50% gc time, 99.98% compilation time)

Note that these times are much higher than the times in your benchmark.

uniment · November 11, 2022, 5:16am

You’re right, I forgot to include the time taken to execute the function (and thus finish the compilation). Also, @btime has been making the Fix functors appear artificially good compared to lambda due to running multiple times:

Subsequent runs are faster after the first time

julia> @time eval(:(  Map(Fix2(*,2))([1,2,3])  ))
  0.036643 seconds (24.55 k allocations: 1.414 MiB, 93.90% compilation time)
3-element Vector{Int64} |>
    Map(Fix{typeof(*), (2,), 2, Tuple{Int64}, NamedTuple{(), Tuple{}}}(*, (2,), NamedTuple()))

julia> @time eval(:(  Map(Fix2(*,2))([1,2,3])  ))
  0.000170 seconds (102 allocations: 5.578 KiB)
3-element Vector{Int64} |>
    Map(Fix{typeof(*), (2,), 2, Tuple{Int64}, NamedTuple{(), Tuple{}}}(*, (2,), NamedTuple()))

julia> @time eval(:(  Map(Fix2(*,2))([1,2,3])  ))
  0.000207 seconds (102 allocations: 5.578 KiB)
3-element Vector{Int64} |>
    Map(Fix{typeof(*), (2,), 2, Tuple{Int64}, NamedTuple{(), Tuple{}}}(*, (2,), NamedTuple()))

julia> @btime eval(:(  Map(Fix2(*,2))([1,2,3])  ))
  82.500 μs (93 allocations: 5.23 KiB)
3-element Vector{Int64} |>
    Map(Fix{typeof(*), (2,), 2, Tuple{Int64}, NamedTuple{(), Tuple{}}}(*, (2,), NamedTuple()))

I don’t know why I thought quoting the expression would cause it to recompile from scratch. Brain fart.

Meanwhile, constructing on a lambda is consistently slow

julia> @time eval(:(  Map(x->2*x)([1,2,3])  ))
  0.024207 seconds (11.55 k allocations: 672.857 KiB, 92.00% compilation time)
3-element Vector{Int64} |>
    Map(Main.λ❓)

julia> @time eval(:(  Map(x->2*x)([1,2,3])  ))
  0.021667 seconds (11.55 k allocations: 672.857 KiB, 91.32% compilation time)
3-element Vector{Int64} |>
    Map(Main.λ❓)

julia> @time eval(:(  Map(x->2*x)([1,2,3])  ))
  0.035034 seconds (11.55 k allocations: 672.857 KiB, 86.82% compilation time)
3-element Vector{Int64} |>
    Map(Main.λ❓)

julia> @btime eval(:(  Map(x->2*x)([1,2,3])  ))
  18.818 ms (11541 allocations: 672.51 KiB)
3-element Vector{Int64} |>
    Map(Main.λ❓)

This is because every time, a new lambda is created from scratch and given a new type

julia> typeof(x->2*x)
var"#9489#9490"

julia> typeof(x->2*x)
var"#9491#9492"

julia> typeof(x->2*x)
var"#9493#9494"

julia> typeof(x->2*x)
var"#9495#9496"

I need to do benchmarking less cavalierly!

Less ambitious compile-time testing will just focus on creation of the functor object or lambda, use @time instead of @btime, and focus on only the first run calling a fresh function on fresh types:

`Fix` type

julia> @time FixFirst(+, 1.0)(2.0)
  0.000003 seconds
3.0

julia> @time FixFirst(+, 1.0)(2.0)
  0.000001 seconds
3.0

julia> @time FixFirst(+, 1.0)(2.0)
  0.000001 seconds
3.0

julia> @btime FixFirst(+, 1.0)(2.0)
  1.000 ns (0 allocations: 0 bytes)
3.0

julia>

julia> @time FixFirst(+, 1im)(2)
  0.000003 seconds
2 + 1im

julia> @time FixFirst(+, 1im)(2)
  0.000001 seconds
2 + 1im

julia> @time FixFirst(+, 1im)(2)
  0.000001 seconds
2 + 1im

julia> @btime FixFirst(+, 1im)(2)
  1.000 ns (0 allocations: 0 bytes)
2 + 1im

lambda

julia> @time (x->1.0+x)(2.0)
  0.002635 seconds (679 allocations: 41.471 KiB, 98.27% compilation time)
3.0

julia> @time (x->1.0+x)(2.0)
  0.002937 seconds (679 allocations: 41.471 KiB, 98.22% compilation time)
3.0

julia> @time (x->1.0+x)(2.0)
  0.002534 seconds (679 allocations: 41.471 KiB, 98.21% compilation time)
3.0

julia>

julia> @time (x->1im+x)(2)
  0.016275 seconds (3.41 k allocations: 196.206 KiB, 99.59% compilation time)
2 + 1im

julia> @time (x->1im+x)(2)
  0.005604 seconds (3.37 k allocations: 194.331 KiB, 98.65% compilation time)
2 + 1im

julia> @time (x->1im+x)(2)
  0.005470 seconds (3.37 k allocations: 194.253 KiB, 98.83% compilation time)
2 + 1im

julia> @btime (x->1im+x)(2) # of course after it's compiled it's fast
  1.500 ns (0 allocations: 0 bytes)
2 + 1im

~~The Fix functor is still better at compile-time than a lambda, by about three orders of magnitude. It’s just harder and less convenient to measure compile time with precision than I had hoped.~~ No it’s not, see below.

Is there a good way to “un-compile” a method for the sake of compile-time benchmarking?

uniment · November 11, 2022, 7:02am

So I'm revisiting this, and I thought the extra time was from compiling but it's not.

Sukera:

Literally the only reason the compiler was able to fold your computation was because the input to your Fix thing was part of the benchmarking expression. Here, let me help you out:

julia> @benchmark Fix{(1,2,-3,-1)}(f, vals, z=5)(args...; kw...) setup=(f=(args...;kwargs...)->(args...,(;kwargs...));vals=(:a,:b,:getting_close,:END); args=(:y, 1, 2); kw=(:k=>2,))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  52.667 μs … 176.066 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     53.412 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   55.087 μs ±   5.945 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▅▂▁                 ▃                                      ▁
  █████▆▅█▆▆▇▆▄▅█▆▅▆▆▄▄██▇▅▅▇▆▆▇▆▅▅▆▅▅▆▅▅▅▄▄▄▄▄▂▄▄▄▄▄▅▂▃▄▄▅▅▅▆ █
  52.7 μs       Histogram: log(frequency) by time      84.9 μs <

 Memory estimate: 2.59 KiB, allocs estimate: 45.

julia> @btime Fix{(1,2,-3,-1)}(f, vals, z=5)(args...; kw...) setup=(f=(args...;kwargs...)->(args...,(;kwargs...));vals=(:a,:b,:getting_close,:END); args=(:y, 1, 2); kw=(:k=>2,))
  52.683 μs (45 allocations: 2.59 KiB)
(:a, :b, :y, 1, :getting_close, 2, :END, (k = 2, z = 5))

And ok, you might argue “but the function is fixed!”, then here, have this benchmark where only the input is not known and thus can’t be folded:

julia> @benchmark Fix{(1,2,-3,-1)}((args...;kwargs...)->(args...,(;kwargs...)), (:a,:b,:getting_close,:END), z=5)(args...; kw...) setup=(args=(:y, 1, 2); kw=(:k=>2,))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  51.705 μs … 203.229 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     52.434 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   54.838 μs ±   7.742 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▄▂  ▁        ▁  ▂▂  ▁                                      ▂
  ███████▇▇█▁▇█▆▆█▅▅██▇▆█████▇▇██▇▇▇▇▇▆▆▆▇▆▇▆▆▆▆▅▆▆▆▇▇▆▆▅▆▆▆▄▅ █
  51.7 μs       Histogram: log(frequency) by time        89 μs <

 Memory estimate: 2.55 KiB, allocs estimate: 44.


julia> @btime Fix{(1,2,-3,-1)}((args...;kwargs...)->(args...,(;kwargs...)), (:a,:b,:getting_close,:END), z=5)(args...; kw...) setup=(args=(:y, 1, 2); kw=(:k=>2,))
  51.934 μs (44 allocations: 2.55 KiB)
(:a, :b, :y, 1, :getting_close, 2, :END, (k = 2, z = 5))

It’s not better. The only case where it could be better is if everything is tuple, always, which is just not a realistic use case.

To me your reply just shows that you’re ignorant about what you’re actually defining and how that actually works. This is not convincing.

I’m not sure why, but when I run the exact same benchmark it’s quite a bit faster on my machine:

julia> @btime Fix{(1,2,-3,-1)}(f, vals, z=5)(args...; kw...) setup=(f=(args...;kwargs...)->(args...,(;kwargs...));vals=(:a,:b,:getting_close,:END); args=(:y, 1, 2); kw=(:k=>2,))
  753.913 ns (18 allocations: 1.06 KiB)
(:a, :b, :y, 1, :getting_close, 2, :END, (k = 2, z = 5))

Still, it could be faster. The setup argument kw=(:k=>2,) leads to type-instability (making namedtuples out of pairs, dicts, etc. is type-unstable). So let’s change that to kw=(k=2,):

julia> @btime Fix{(1,2,-3,-1)}(f, vals, z=5)(args...; kw...) setup=(f=(args...;kwargs...)->(args...,(;kwargs...));vals=(:a,:b,:getting_close,:END); args=(:y, 1, 2); kw=(k=2,))
  1.500 ns (0 allocations: 0 bytes)
(:a, :b, :y, 1, :getting_close, 2, :END, (k = 2, z = 5))

Ah, that’s better.

c42f · November 12, 2022, 1:08am

You can probably redefine the method again by evaling the same source code. You might want to put some part of that into the @benchmark ... setup=eval(some_code) phase depending on which part of compilation you’re trying to measure. (hmm, and there might need to be an invokelatest in the benchmark after that.)

It might be more reliable overall to just generate a really big source file with lots of lambdas inside it and see how long that takes to compile/execute. Sometimes I find it’s better to just “be dumb” with benchmarks and write the batching (N-repetitions) code yourself, then divide by N. This can help a lot when it’s unclear whether the benchmark is measuring the right thing.

uniment · November 13, 2022, 4:31pm

Per my usual, I’ve been overly optimistic about the compile-time performance of the partial application functors due to poor measurement technique. They have the same compile time as comparable lambdas.

Measuring the compile time has some gotchas. For example:

julia> @time ((f,x)->(a...)->f(x,a...))(*, 1)(2)
  0.016421 seconds (1.84 k allocations: 103.518 KiB, 98.97% compilation time)
2

julia> struct FixFirst{F,X} f::F; x::X end

julia> (f::FixFirst)(a...) = f.f(f.x,a...); @time FixFirst(*, 1)(2)
  0.000002 seconds
2

In this test, the FixFirst functor appears to construct and call instantly. What if we use eval?

julia> (f::FixFirst)(a...) = f.f(f.x,a...); quote @time FixFirst(*, 1)(2) end |> eval
  0.000008 seconds
2

No change. It looks like the partial application functor is far superior to a lambda: single-digit microseconds versus 16ms.

But, if we put the functor definition inside the quote …

julia> quote (f::FixFirst)(a...) = f.f(f.x,a...); @time FixFirst(*, 1)(2) end |> eval
  0.006961 seconds (2.18 k allocations: 129.516 KiB, 98.40% compilation time)
2

Oop! It suddenly takes much longer! Now, what if we re-run the struct definition?

julia> struct FixFirst{F,X} f::F; x::X end

julia> quote (f::FixFirst)(a...) = f.f(f.x,a...); @time FixFirst(*, 1)(2) end |> eval
  0.015734 seconds (5.47 k allocations: 320.235 KiB, 96.34% compilation time)
2

Whoops! When forcing both the constructor and the functor to recompile (and ensuring that the functor definition and constructor call are simultaneously evaled), it takes just as long as the comparable lambda. The similarity shouldn’t be a surprise if you read the section on closures, but it took a surprising amount of experimentation to make it show up. We can break it out by piece:

julia> struct FixFirst{F,X} f::F; x::X end

julia> quote (f::FixFirst)(a...) = f.f(f.x,a...); @time f=FixFirst(*, 1) end |> eval; @time f(2)
  0.009330 seconds (4.06 k allocations: 238.813 KiB, 93.76% compilation time)
  0.006834 seconds (2.21 k allocations: 131.328 KiB, 97.76% compilation time)
2

Forcing the constructor and functor to recompile every time can also be done by partially applying a lambda, as the lambda is a new function each time (and so requires new type specialization every time).

julia> f=(x,y)->x*y; f(1,2); @time FixFirst(f,1)(2)
  0.016133 seconds (7.19 k allocations: 430.585 KiB, 95.98% compilation time)
2

While there are still benefits to dedicated typed partial application functors (namely, much better performance when partially applying methods of argument types that have previously been partially applied, as well as having types that other methods can specialize on), they don’t have the benefit of compiling faster in all situations: worst being partial application of anonymous functions, which will require recompiling anew every time a new anonymous function is defined.

Unless…

What happens if we remove the type parameterizations and specialization on the struct and its functor? I define FixF like so:

julia> @nospecialize struct FixF f; x end

julia> @nospecialize (f::FixF)(a...) = f.f(f.x,a...)

julia> f=(x,y)->x+y; f(1,2); @time h=FixF(f,1); @time h(2)
  0.000003 seconds (3 allocations: 96 bytes)
  0.000014 seconds
3

julia> u=(x,y)->x*y; u(1.0,2.0); @time h=FixF(u,1.0); @time h(2.0)
  0.000012 seconds (2 allocations: 48 bytes)
  0.000023 seconds (2 allocations: 32 bytes)
2.0

julia> @btime FixF((x,y)->x+y, 1)
  1.000 ns (0 allocations: 0 bytes)
FixF(var"#4041#4042"(), 1)

julia> @btime FixF((x,y)->x+y, 1)(2)
  1.000 ns (0 allocations: 0 bytes)
3

Looks really good! Runs fast. Until you realize that it’s not type stable.

julia> @btime map(FixFirst(+,1), [1,2,3])
  68.731 ns (2 allocations: 160 bytes)
3-element Vector{Int64}:
 2
 3
 4

julia> @btime map(FixF(+,1), [1,2,3])
  504.945 ns (5 allocations: 240 bytes)
3-element Vector{Int64}:
 2
 3
 4

Yeah, that’s not what you want for a partial applicator. Nice to see though there’s a trade-off between compile time and run time.

All this suggests that, for the task of piping, a syntax transform is indeed preferable to avoid compiling things that needn’t be compiled as @dlakelan has been adamant about. The differences in requirements between partial application and piping are sufficiently different to warrant different computational approaches, even if the syntax is similar.

It does feel like there ought to be some sort of optimization that could be done to make partial applicators compile faster, knowing how simple their operation is.

Sukera · November 13, 2022, 6:45pm

Now the only thing missing is the third piece of the triangle of tradeoffs: space!

Fast runtime - Low space/memory usage - Fast compilation: pick 2.

uniment · November 14, 2022, 5:43pm

To minimize confusion (since I changed my mind!) and filter some of the signal from the noise, I’ve created a new thread; see here. Please refrain from continuing this conversation in this thread.

merlin · January 16, 2023, 5:45am

i want this. was using python to orchestrate data jobs and its just si darned convenient to press a ‘.’ and get a list of all the possible things.

Topic		Replies	Views
Fixing the Piping/Chaining/Partial Application Issue (Rev 2) Internals & Design proposal , piping , chaining , partial-evaluation , threading	40	4080	November 26, 2022
Summary of piping/chaining proposal? Internals & Design	23	3289	August 13, 2024
Fixing the Piping/Chaining Issue (Rev 3) Internals & Design multithreading , syntax , piping , chaining , threading	89	7992	April 5, 2024
Partial Application brackets without underscores Internals & Design proposal , currying , partial-evaluation	5	1648	November 21, 2022
Allowing the object.method(args...) syntax as an alias for method(object, args ...) Internals & Design question , design	216	12831	March 14, 2023

Fixing the Piping/Chaining Issue

Related topics