Fixing the Piping/Chaining Issue

I’ve listed a few downsides to creating functions on the fly as a form of piping.

  1. Hard to reason about
  2. Could change behavior (definitely changes scoping)
  3. Might cause slower interactive use

You are free to continue advocating for Fix-style function creation; how to weigh the costs and benefits of each method is up to you.

Meanwhile, the downsides to not creating functions on the fly are

  1. Creating dedicated syntax that happens to be perfect for creating partial functions, but not using it to create partial functions.

If there’s enough benefit to it, it could be interesting to add functionality to, say, the do keyword to force a syntax transformation when chaining:


[1, 2, 3] do filter(isodd,_); map(_^2, _); sum; sqrt end

# Acts like

[1, 2, 3] |> filter(isodd,_) |> map(_^2,_) |> sum |> sqrt

# But through syntax transformation instead of function creation

so we could get partial functions and have it compile fast when we don’t want them.

There is no reason you can’t use the syntax to create partial functions too.

@foo a |> b(c,_,d)

could evaluate to a value, while

@foo b(c,_,d)

evaluates to a partially applied function.
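To make the distinction concrete, here is a minimal sketch of such a macro. The names `rewrite_underscore` and `@foo_sketch` are hypothetical, and the sketch only handles `_` appearing directly as a call argument (so `map(_^2, _)` from the example above is out of scope). With a pipe, the call is rewritten in place and no function object is created; without a pipe, a one-argument function is returned.

```julia
# Hypothetical helper: rewrite an expression that uses `_` as a placeholder.
function rewrite_underscore(ex)
    ex isa Expr && ex.head === :call || return ex
    if ex.args[1] === :|> && length(ex.args) == 3
        lhs = rewrite_underscore(ex.args[2])   # rewrite the chain to the left first
        rhs = ex.args[3]
        if rhs isa Expr && rhs.head === :call && any(==(:_), rhs.args)
            # `a |> b(c, _, d)` becomes `b(c, a, d)`; no function is created.
            # Simplification: with several `_`, the piped expression is duplicated.
            return Expr(:call, (arg === :_ ? lhs : arg for arg in rhs.args)...)
        end
        return Expr(:call, rhs, lhs)           # `a |> f` becomes `f(a)`
    elseif any(==(:_), ex.args)
        # No pipe: `b(c, _, d)` becomes `x -> b(c, x, d)`.
        # Simplification: every `_` maps to the same argument.
        x = gensym("x")
        return Expr(:->, x, Expr(:call, (arg === :_ ? x : arg for arg in ex.args)...))
    end
    return ex
end

macro foo_sketch(ex)
    return esc(rewrite_underscore(ex))
end

# Rewritten to sum(filter(isodd, [1, 2, 3])) before anything runs.
total = @foo_sketch([1, 2, 3] |> filter(isodd, _) |> sum)

# No pipe, so this yields a one-argument function.
add1 = @foo_sketch(+(1, _))
```

Inspecting `rewrite_underscore(:(a |> b(c, _, d)))` directly shows the rewritten call `b(c, a, d)`, which is a quick way to confirm that the piped form involves no closure.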


I apologize if I’m being obtuse, but is a couple hundred milliseconds unacceptable for interactive use?

And I don’t know the package, but from a quick glance it seems like the call to clean(df) is creating three anonymous functions. The time it takes to do such a thing doesn’t seem to dominate (assuming this is a legitimate way to determine compile time, I need validation):


julia> @btime eval(:(  θ -> cos(θ)+im*sin(θ)  ))
  442.700 μs (191 allocations: 10.39 KiB)
#113347 (generic function with 1 method)

julia> @btime eval(:(  (θ -> cos(θ)+im*sin(θ))(1.)  ))
  3.351 ms (7790 allocations: 435.34 KiB)
0.5403023058681398 + 0.8414709848078965im

Using this approach, it appears that creating a Fix object takes extra time compared with simply calling the function it partially evaluates, but far less than creating an anonymous function:

julia> @btime eval(:(  w(x, y) = x + y  ))
  177.800 μs (137 allocations: 7.39 KiB)
w (generic function with 1 method)

julia> @btime eval(:(  w(1, 2)  ))
  46.800 μs (35 allocations: 1.77 KiB)
3

julia> @btime eval(:(  Base.Fix1(w,1)(2)  ))
  68.900 μs (49 allocations: 2.42 KiB)
3

julia> @btime eval(:(  FixFirst(w,1)(2)  ))
  57.000 μs (42 allocations: 2.11 KiB)
3

julia> @btime eval(:(  Fix{(1,)}(w,(1,))(2)  ))
  86.900 μs (68 allocations: 3.27 KiB)
3

julia> @btime eval(:(  (y->w(1, y))(2)  ))
  2.135 ms (1464 allocations: 88.44 KiB)
3

Note that this is the same Fix functor defined above; it has not yet been optimized further.
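For readers who skipped the definition, a functor of this general shape can be sketched as follows. This is a hypothetical reconstruction, named `SketchFix` so it is not mistaken for the exact code benchmarked here: the type parameter holds the fixed argument positions, and the call fills the remaining positions in order.

```julia
# Hypothetical stand-in for a general-purpose Fix functor (an assumption, not the thread's code).
struct SketchFix{P,F,A<:Tuple}
    f::F      # function being partially applied
    args::A   # values for the fixed positions, in the same order as P
end

# `SketchFix{(1,)}(w, (1,))` fixes position 1 of `w` to the value 1.
SketchFix{P}(f::F, args::A) where {P,F,A<:Tuple} = SketchFix{P,F,A}(f, args)

function (p::SketchFix{P})(xs...) where {P}
    n = length(P) + length(xs)
    all(<=(n), P) || throw(ArgumentError("fixed position exceeds argument count"))
    full = Any[]
    next_free = 1
    for i in 1:n
        k = findfirst(==(i), P)        # is position i one of the fixed ones?
        if k === nothing
            push!(full, xs[next_free]) # take the next call-time argument
            next_free += 1
        else
            push!(full, p.args[k])     # take the stored argument
        end
    end
    return p.f(full...)
end

SketchFix{(1,)}(-, (10,))(3)   # calls -(10, 3)
SketchFix{(2,)}(^, (2,))(3)    # calls ^(3, 2)
```

The `Any[]` buffer keeps the sketch short at the cost of type stability; an optimized version would presumably build the argument tuple at compile time instead.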

If I’m testing this incorrectly, please let me know, as I have not before tried to benchmark compile times.

Edit: there’s more nuance to compile time, see below.


Sorry, I got to this late and can’t read all 158 messages to know if this is redundant, but FWIW, I can never remember by looking at it which is the back or forward slash. And my personal opinion is that \> visually looks like it is leaning toward the first argument and /> looks like it’s leaning toward the last.

I’d use this feature 1000% either way, but it would definitely be easier for me to remember if we went for a visual mnemonic instead of an aural one!


A bit more compile-time benchmarking…

# State of the art, no piping:
julia> @btime eval(:(  map(Base.Fix2(^,2), filter(isodd, [1,2,3]))  )) 
  91.500 μs (65 allocations: 3.42 KiB)
2-element Vector{Int64}:
 1
 9

# OP proposal:
julia> @btime eval(:(  FixFirst(map, FixLast(^,2))(FixFirst(filter, isodd)([1,2,3]))  )) 
  93.600 μs (70 allocations: 3.70 KiB)
2-element Vector{Int64}:
 1
 9

# SOTA + piping
julia> @btime eval(:(  [1,2,3] |> Base.Fix1(filter, isodd) |> Base.Fix1(map, Base.Fix2(^,2))  )) 
  125.600 μs (91 allocations: 4.77 KiB)
2-element Vector{Int64}:
 1
 9

# OP proposal for partial functions, but using piping
julia> @btime eval(:(  [1,2,3] |> FixFirst(filter, isodd) |> FixFirst(map, FixLast(^,2))  )) 
  95.700 μs (70 allocations: 3.80 KiB)
2-element Vector{Int64}:
 1
 9

# New proposal for general-purpose functor w/ piping (à la #24990)
julia> @btime eval(:(  [1,2,3] |> Fix{(1,)}(filter,(isodd,)) |> Fix{(1,)}(map,(Fix{(2,)}(^,(2,)),))  )) 
  170.400 μs (146 allocations: 7.31 KiB)
2-element Vector{Int64}:
 1
 9

# piping into anonymous functions
julia> @btime eval(:(  [1,2,3] |> x->filter(isodd, x) |> x->map(Base.Fix2(^,2), x)  )) 
  3.439 ms (6780 allocations: 387.51 KiB)
2-element Vector{Int64}:
 1
 9

# one anonymous function, fed to `filter`, no pipes
julia> @btime eval(:(  map(Base.Fix2(^,2), filter(x->x%2==1, [1,2,3]))  ))
  8.452 ms (11000 allocations: 567.00 KiB)
2-element Vector{Int64}:
 1
 9

# one anonymous function, fed to `map`, no pipes
julia> @btime eval(:(  map(x->x^2, filter(isodd, [1,2,3]))  )) 
  19.296 ms (48195 allocations: 2.47 MiB)
2-element Vector{Int64}:
 1
 9

# `map` and `filter` on anonymous functions, no pipes
julia> @btime eval(:(  map(x->x^2, filter(x->x%2==1, [1,2,3]))  ))
  30.295 ms (59130 allocations: 3.02 MiB)
2-element Vector{Int64}:
 1
 9

# all anonymous functions w/ pipes
julia> @btime eval(:(  [1,2,3] |> x->filter(x->x%2==1, x) |> x->map(x->x^2, x)  ))
  29.688 ms (64845 allocations: 3.34 MiB)
2-element Vector{Int64}:
 1
 9

Look at those last four! :scream: Why’s it so slow!?

@dlakelan, it looks like I owe you an apology: it appears that anonymous functions can sometimes take a glacial amount of time to compile (not sure when or why yet). That gives huge motivation to use partial application functors (e.g. Fix) instead where possible.

Edit: there’s more nuance to compile time, see below.


Any time you can get the effect by syntax rearrangement, it will have zero overhead at runtime, so I think it’s definitely preferable.

Notice that the occasions when anonymous functions were slowest to compile were exactly the occasions when they were most likely to be used. Namely,

julia> @btime eval(:(  map(x->x^2, [1, 2, 3])  ));
  19.555 ms (48188 allocations: 2.47 MiB)

With the proposal of #24990, which I am now promoting, the expression x->x^2 would be replaced with _^2, and the compile time would be in the range of 100–200μs instead of ~20ms, e.g.

julia> @btime eval(:(  map(Fix{(2,)}(^,(2,)), [1, 2, 3])  ));
  97.900 μs (76 allocations: 3.84 KiB)

Edit: there’s more nuance to compile time, see below.

Good question.

First, some misgivings:

  • The generic misgiving: Julia already has a lot of special syntax. So accepting that we need syntax here is hard. (And yet. This issue keeps coming up again and again.)
  • I’m not entirely convinced arbitrary combinations of /> and \> are easy to read. Needs experimentation / experience I guess. My rough feeling is that … they’re not worse than do, and in fact very comparable.
  • Underscores still seem desirable to express simple functions which are passed to higher order functions, as in [(a=1,b=2), (a=3,b=4)] \> map(_.a) \> filter(_ > 1).
  • I’m not sure /> will address the problems with autocomplete in practice. So many methods in the Julia ecosystem are completely generic (i.e., take Any) that the autocomplete system inherently can’t know which of them are applicable. This seems more fundamental than any problem with syntax.

Some positive thoughts:

  • /> and \> seem like a neat and very visually clean way of threading the table argument through a chain of calls when doing tabular data processing. It’s strictly less visual noise than using underscore for this purpose and avoids the _ meaning two different things in expressions like [1, 2, 3, 4] |> filter(_>2, _), which is very awkward.
  • I feel it’s sufficient that these operators support only the first and last position, at least for tabular data processing purposes this seems fine.
  • Allowing /> syntax without a left hand side can naturally provide first class data pipelines independent of the data argument. It also makes this do double duty as a standalone currying syntax.
  • I see what you did with /> being “like object oriented .” and I think it’s neat for those cases where it applies. (I wonder how many of these there are, with the Julia ecosystem not focussed on the leftmost argument being special?) Syntax influences conventions about argument placement, as we see for functions designed for use with do blocks. Currently there’s little need for arguments to be in the first or last position other than taste, but the existence of /> may make the ecosystem more consistent over time.

Please don’t tell me this is the result: :sweat_smile:

[image: standards]


Because you’re also timing the lowering and compilation of new anonymous functions in every evaluation of the benchmark, which is (once again) not a representative use case.
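A small check illustrates why the anonymous-function benchmarks pay this cost on every iteration: each evaluation of an anonymous-function expression defines a new function type, so code that specializes on it (such as `map`) has to be compiled again, whereas a functor like `Base.Fix2` has the same type every time. The variable names are only for illustration.

```julia
# Evaluating the same anonymous-function expression twice yields two distinct types.
f1 = eval(:(x -> x^2))
f2 = eval(:(x -> x^2))
same_lambda_type = typeof(f1) == typeof(f2)    # false

# Constructing the same functor twice yields one and the same concrete type.
same_functor_type = typeof(Base.Fix2(^, 2)) == typeof(Base.Fix2(^, 2))   # true

# The functor still behaves like `x -> x^2`.
squares = map(Base.Fix2(^, 2), [1, 2, 3])      # [1, 4, 9]
```

In ordinary code a lambda is lowered once where it is written, so the repeated cost seen under `@btime eval(...)` does not apply to a function body that is called many times.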

This is the context that motivated that testing:

I implemented this in my demo code. So you can now do:

julia> f = \> filter(isodd);

julia> f(1:10)
5-element Vector{Int64}:
 1
 3
 5
 7
 9

Or more usefully, reuse parts of pipelines, as in

julia> g = \> filter(isodd) \> map(_^2);

julia> 1:10 \> g() \> filter(_>10)
3-element Vector{Int64}:
 25
 49
 81

Ohh, I see now. That’s slick. :wink::thinking:

Edit:
Without that syntax, but with #24990, I might do it like this:

julia> g = map(_^2,_) ∘ filter(isodd,_)

julia> 1:10 |> g |> filter(_>10,_)
3-element Vector{Int64}:
 25
 49
 81
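For comparison, roughly the same reusable pipeline can be written in current Julia without any new syntax, using `Base.Fix1`, `Base.Fix2`, and `∘`. This is only a sketch of the equivalent spelling; the name `square_odds` is made up for the example.

```julia
# Compose right to left: filter the odd values first, then square them.
square_odds = Base.Fix1(map, Base.Fix2(^, 2)) ∘ Base.Fix1(filter, isodd)

# `>(10)` is the built-in partial application of `>` with 10 as the second argument.
result = 1:10 |> square_odds |> Base.Fix1(filter, >(10))
# result == [25, 49, 81]
```

The noise of spelling out `Base.Fix1` and `Base.Fix2` by hand is what the underscore and `\>` proposals both aim to remove.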

After I get some sleep I might be better able to think through the ramifications.

I don’t know… looking through DataFrames.jl, I see quite a few methods specialized to such types as ::AbstractDataFrame, ::DataFrameRows, ::DataFrameColumns, and the like. It seems to me that a well-designed package will have thought through the highest-level abstract type imaginable that ought to accept methods it is defining (which, frequently enough, is not Any), and then specialize the methods to that type. For big packages with complex types and many specialized methods, autocomplete can be a lifesaver.

Even such star examples of function genericism as automatic differentiation could have been just as well-served by functions specialized on type ::Number instead of ::Any.

And for those functions that really are so generic they can act on objects of type ::Any … it seems a decent bet that such a generic function would be so frequently used that the programmer would have already encountered it and committed its name to memory, and so doesn’t need assistance in recalling it (e.g. I don’t need autocomplete for map or reduce).

It’s true that we get a lot of this with a tight-binding underscore. Just unfortunate that the _ will need to appear in every function along the chain - this is a lot of visual overhead. And the fact that the _ means two separate things in the same parentheses in filter(_>10,_) - I can’t love that, though I imagine I could accept it :slight_smile:


A package would specialize methods of existing functions to its types - yes, but:

  • Many functions have general methods applying to Any type
  • And even specialized methods typically specialize for a single argument, not all of them

Again, actual real-life examples showing that this autocompletion is potentially useful (or not, due to too much noise) would be nice!
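This is not a real-life example, but a toy sketch shows the mechanism both sides are describing. The type `ToyTable` and the functions `nrows` and `summarize` are hypothetical; the filter stands in for what a type-driven autocomplete might do with the piped value's type.

```julia
struct ToyTable              # hypothetical table type
    rows::Vector{Int}
end

nrows(t::ToyTable) = length(t.rows)    # specialized on ToyTable
summarize(x) = string(typeof(x))       # fully generic: accepts Any

candidates = [nrows, summarize, sqrt]

# Keep only the functions that accept a ToyTable as their single argument.
matches = filter(f -> hasmethod(f, Tuple{ToyTable}), candidates)
# matches == [nrows, summarize]
```

The specialized method is found and `sqrt` is excluded, but the generic `summarize` also matches, so how useful the completion list is depends on how many `Any`-typed methods are in scope.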


I continue to think \> leans to the left and most obviously fixes the leftmost argument… While /> leans right and fixes the rightmost argument. I see I wasn’t the only one.


Yeah, @Chris_Foster’s JuliaSyntax PR is a bit different from your original proposal. So it is effectively a new proposal. But that’s ok, it’s a drop in the bucket compared to all the proposals in PR #24990. :slight_smile:


It appears to me that the main difference is left-associativity rather than right-associativity?

This seems ok to me, since one of the main motivations for right associativity in the original thread was to allow chains like obj1 /> func2 /> func3() without intermediate function calls; now it has to be written more like obj1 /> func2() /> func3(), which is very much not the end of the world. Also I want to thank again @Chris_Foster for providing a nice proof of concept to play with!

By the way, someone has raised a question on the Slack which seems related here, basically asking if there is a slick way to pipe a constant argument into the back (or presumably front) of each function in a chain of calls.
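Under one reading of that question (every function in the chain takes the running value plus the same constant), a fold already does the job in current Julia. The helper names `pipe_with_last` and `pipe_with_first` are hypothetical.

```julia
# Thread `x` through `fs`, passing the constant `c` as the last argument of each call.
pipe_with_last(x, c, fs...) = foldl((acc, f) -> f(acc, c), fs; init = x)

# Same idea with `c` as the first argument of each call.
pipe_with_first(x, c, fs...) = foldl((acc, f) -> f(c, acc), fs; init = x)

a = pipe_with_last(10, 2, +, *, -)   # ((10 + 2) * 2) - 2 == 22
b = pipe_with_first(10, 2, -, *)     # 2 * (2 - 10) == -16
```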


I feel the same way. I do not like the visual noise.

My thought, however, is that since the two approaches are redundant, both require parsing changes, and both have their own learning curves, it’s likely difficult to advocate for both simultaneously.

So if I have to pick one, I am compelled to choose the one that’s more generally useful and whose behavior is more obvious.

I was more referring to the fact that my OP was being kept alive at all, as I had abandoned it :sweat_smile:

The motivation for right-associativity was not this; it was for being able to fix multiple arguments. For example, arr \> filter(isodd) could also be arr /> isodd /> filter(). This would enable things like myfilt = isodd /> filter and then arr /> myfilt().

But with underscores, it would be arr |> filter(isodd, _) and myfilt=filter(isodd,_), then arr |> myfilt.
