Fixing the Piping/Chaining Issue

Per my usual, I’ve been overly optimistic about the compile-time performance of the partial application functors due to poor measurement technique. They have the same compile time as comparable lambdas.

Measuring the compile time has some gotchas. For example:

julia> @time ((f,x)->(a...)->f(x,a...))(*, 1)(2)
  0.016421 seconds (1.84 k allocations: 103.518 KiB, 98.97% compilation time)
2

julia> struct FixFirst{F,X} f::F; x::X end

julia> (f::FixFirst)(a...) = f.f(f.x,a...); @time FixFirst(*, 1)(2)
  0.000002 seconds
2

In this test, the FixFirst functor appears to construct and call instantly. What if we use eval?

julia> (f::FixFirst)(a...) = f.f(f.x,a...); quote @time FixFirst(*, 1)(2) end |> eval
  0.000008 seconds
2

No change. It looks like the partial application functor is far superior to a lambda: single-digit microseconds versus 16ms.

But, if we put the functor definition inside the quote …

julia> quote (f::FixFirst)(a...) = f.f(f.x,a...); @time FixFirst(*, 1)(2) end |> eval
  0.006961 seconds (2.18 k allocations: 129.516 KiB, 98.40% compilation time)
2

Oop! It suddenly takes much longer! Now, what if we re-run the struct definition?

julia> struct FixFirst{F,X} f::F; x::X end

julia> quote (f::FixFirst)(a...) = f.f(f.x,a...); @time FixFirst(*, 1)(2) end |> eval
  0.015734 seconds (5.47 k allocations: 320.235 KiB, 96.34% compilation time)
2

Whoops! When forcing both the constructor and the functor to recompile (and ensuring that the functor definition and constructor call are simultaneously evaled), it takes just as long as the comparable lambda. The similarity shouldn’t be a surprise if you read the section on closures, but it took a surprising amount of experimentation to make it show up. We can break it out by piece:

julia> struct FixFirst{F,X} f::F; x::X end

julia> quote (f::FixFirst)(a...) = f.f(f.x,a...); @time f=FixFirst(*, 1) end |> eval; @time f(2)
  0.009330 seconds (4.06 k allocations: 238.813 KiB, 93.76% compilation time)
  0.006834 seconds (2.21 k allocations: 131.328 KiB, 97.76% compilation time)
2

Forcing the constructor and functor to recompile every time can also be done by partially applying a lambda, as the lambda is a new function each time (and so requires new type specialization every time).

julia> f=(x,y)->x*y; f(1,2); @time FixFirst(f,1)(2)
  0.016133 seconds (7.19 k allocations: 430.585 KiB, 95.98% compilation time)
2

While there are still benefits to dedicated typed partial application functors (namely, much better performance when partially applying methods of argument types that have previously been partially applied, as well as having types that other methods can specialize on), they don’t have the benefit of compiling faster in all situations: worst being partial application of anonymous functions, which will require recompiling anew every time a new anonymous function is defined.

Unless…

What happens if we remove the type parameterizations and specialization on the struct and its functor? I define FixF like so:

julia> @nospecialize struct FixF f; x end

julia> @nospecialize (f::FixF)(a...) = f.f(f.x,a...)

julia> f=(x,y)->x+y; f(1,2); @time h=FixF(f,1); @time h(2)
  0.000003 seconds (3 allocations: 96 bytes)
  0.000014 seconds
3

julia> u=(x,y)->x*y; u(1.0,2.0); @time h=FixF(u,1.0); @time h(2.0)
  0.000012 seconds (2 allocations: 48 bytes)
  0.000023 seconds (2 allocations: 32 bytes)
2.0

julia> @btime FixF((x,y)->x+y, 1)
  1.000 ns (0 allocations: 0 bytes)
FixF(var"#4041#4042"(), 1)

julia> @btime FixF((x,y)->x+y, 1)(2)
  1.000 ns (0 allocations: 0 bytes)
3

Looks really good! Runs fast. Until you realize that it’s not type stable.

julia> @btime map(FixFirst(+,1), [1,2,3])
  68.731 ns (2 allocations: 160 bytes)
3-element Vector{Int64}:
 2
 3
 4

julia> @btime map(FixF(+,1), [1,2,3])
  504.945 ns (5 allocations: 240 bytes)
3-element Vector{Int64}:
 2
 3
 4

Yeah, that’s not what you want for a partial applicator. Nice to see though there’s a trade-off between compile time and run time.

All this suggests that, for the task of piping, a syntax transform is indeed preferable to avoid compiling things that needn’t be compiled as @dlakelan has been adamant about. The differences in requirements between partial application and piping are sufficiently different to warrant different computational approaches, even if the syntax is similar.

It does feel like there ought to be some sort of optimization that could be done to make partial applicators compile faster, knowing how simple their operation is.

2 Likes