Fixing the Piping/Chaining Issue

Notice that the occasions when anonymous functions were slowest to compile were exactly the occasions when they were most likely to be used. Namely:

julia> @btime eval(:(  map(x->x^2, [1, 2, 3])  ));
  19.555 ms (48188 allocations: 2.47 MiB)

With the proposal of #24990, which I am now promoting, the expression x->x^2 would be replaced with _^2, and the compile time would be in the range of 100–200μs instead of ~20ms, e.g.

julia> @btime eval(:(  map(Fix{(2,)}(^,(2,)), [1, 2, 3])  ));
  97.900 μs (76 allocations: 3.84 KiB)

Edit: there’s more nuance to compile time, see below.
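For context, Base already ships the two-argument special cases of this idea as `Base.Fix1` and `Base.Fix2`; the `Fix{(2,)}` form above generalizes them to arbitrary argument positions. A quick sketch of why a fixed-type functor avoids the recompilation cost:

```julia
# Base.Fix2(f, x) is a callable struct with Base.Fix2(f, x)(y) == f(y, x).
# Its type depends only on f and x, so repeated `eval` of an expression
# using it can reuse already-compiled code instead of lowering a fresh
# anonymous function each time.
square = Base.Fix2(^, 2)        # plays the role of the proposed `_^2`
map(square, [1, 2, 3])          # returns [1, 4, 9]
```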

Good question.

First, some misgivings:

  • The generic misgiving: Julia already has a lot of special syntax. So accepting that we need syntax here is hard. (And yet. This issue keeps coming up again and again.)
  • I’m not entirely convinced arbitrary combinations of /> and \> are easy to read. Needs experimentation / experience I guess. My rough feeling is that … they’re not worse than do, and in fact very comparable.
  • Underscores still seem desirable to express simple functions which are passed to higher order functions, as in [(a=1,b=2), (a=3,b=4)] \> map(_.a) \> filter(_ > 1).
  • I’m not sure /> will address the problems with autocomplete in practice. In particular, so many methods in the Julia ecosystem are completely generic (i.e., take Any) that the autocomplete system inherently can’t know which of these are applicable. This seems more fundamental than any problem with syntax.

Some positive thoughts:

  • /> and \> seem like a neat and very visually clean way of threading the table argument through a chain of calls when doing tabular data processing. It’s strictly less visual noise than using underscore for this purpose and avoids the _ meaning two different things in expressions like [1, 2, 3, 4] |> filter(_>2, _), which is very awkward.
  • I feel it’s sufficient that these operators support only the first and last position, at least for tabular data processing purposes this seems fine.
  • Allowing /> syntax without a left hand side can naturally provide first class data pipelines independent of the data argument. It also makes this do double duty as a standalone currying syntax.
  • I see what you did with /> being “like object oriented .” and I think it’s neat for those cases where it applies. (I wonder how many of these there are, with the Julia ecosystem not focussed on the leftmost argument being special?) Syntax influences conventions about argument placement, as we see for functions designed for use with do blocks. Currently there’s little need for arguments to be in the first or last position other than taste, but the existence of /> may make the ecosystem more consistent over time.
8 Likes

Please don’t tell me this is the result: :sweat_smile:

standards

2 Likes

Because you’re also timing the lowering & compilation of new anonymous functions in every evaluation of the benchmark, which is (once again) not a representative use case.

This is the context that motivated that testing:

I implemented this in my demo code. So you can now do:

julia> f = \> filter(isodd);

julia> f(1:10)
5-element Vector{Int64}:
 1
 3
 5
 7
 9

Or more usefully, reuse parts of pipelines, as in

julia> g = \> filter(isodd) \> map(_^2);

julia> 1:10 \> g() \> filter(_>10)
3-element Vector{Int64}:
 25
 49
 81
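As a point of comparison (my own sketch, not part of the proposal), a reusable pipeline like the one above can be approximated in today's Julia with `Base.Fix1` and function composition:

```julia
# filter(isodd, xs) with its first argument fixed, then map(x -> x^2, xs),
# composed right-to-left with ∘:
g = Base.Fix1(map, x -> x^2) ∘ Base.Fix1(filter, isodd)
h = Base.Fix1(filter, >(10)) ∘ g
h(1:10)   # returns [25, 49, 81]
```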
2 Likes

Ohh, I see now. That’s slick. :wink::thinking:

Edit:
Without that syntax, but with #24990, I might do it like this:

julia> g = map(_^2,_) ∘ filter(isodd,_)

julia> 1:10 |> g |> filter(_>10,_)
3-element Vector{Int64}:
 25
 49
 81

After I get some sleep I might be better able to think through the ramifications.

I don’t know… looking through DataFrames.jl, I see quite a few methods specialized to such types as ::AbstractDataFrame, ::DataFrameRows, ::DataFrameColumns, and the like. It seems to me that a well-designed package will have thought through the highest-level abstract type imaginable that ought to accept methods it is defining (which, frequently enough, is not Any), and then specialize the methods to that type. For big packages with complex types and many specialized methods, autocomplete can be a lifesaver.

Even such star examples of function genericism as automatic differentiation could have been just as well-served by functions specialized on type ::Number instead of ::Any.

And for those functions that really are so generic they can act on objects of type ::Any … it seems a decent bet that such a generic function would be so frequently used that the programmer would have already encountered it and committed its name to memory, and so doesn’t need assistance in recalling it (e.g. I don’t need autocomplete for map or reduce).

It’s true that we get a lot of this with a tight-binding underscore. Just unfortunate that the _ will need to appear in every function along the chain - this is a lot of visual overhead. And the fact that the _ means two separate things in the same parentheses in filter(_>10,_) - I can’t love that, though I imagine I could accept it :slight_smile:

1 Like

A package would specialize methods of existing functions to its types - yes, but:

  • Many functions have general methods applying to Any type
  • And even specialized methods typically specialize for a single argument, not all of them

Again, actual real-life examples showing that this autocompletion is potentially useful (or not, due to too much noise) would be nice!

1 Like

I continue to think \> leans to the left and most obviously fixes the leftmost argument… while /> leans right and fixes the rightmost argument. I see I wasn’t the only one.

5 Likes

Yeah, @c42f’s JuliaSyntax PR is a bit different from your original proposal. So it is effectively a new proposal. But that’s ok, it’s a drop in the bucket compared to all the proposals in PR #24990. :slight_smile:

2 Likes

It appears to me that the main difference is left-associativity rather than right-associativity?

This seems ok to me, since one of the main motivations for right associativity in the original thread was to allow chains like obj1 /> func2 /> func3() without intermediate function calls; now it has to be written more like obj1 /> func2() /> func3(), which is very much not the end of the world. Also, I want to thank again @c42f for providing a nice proof of concept to play with!

By the way, someone has raised a question on the Slack which seems related here, basically asking if there is a slick way to pipe a constant argument into the back (or presumably front) of each function in a chain of calls.

3 Likes

I feel the same way. I do not like the visual noise.

My thought, however, is that since the two approaches are redundant, both require parsing changes, and both have their own learning curves, it’s likely difficult to advocate for both simultaneously.

So if I have to pick one, I am compelled to choose the one that’s more generally useful and whose behavior is more obvious.

I was more referring to the fact that my OP was being kept alive at all, as I had abandoned it :sweat_smile:

The motivation for right-associativity was not this; it was for being able to fix multiple arguments. For example, arr \> filter(isodd) could also be arr /> isodd /> filter(). This would enable things like myfilt = isodd /> filter and then arr /> myfilt().

But with underscores, it would be arr |> filter(isodd, _) and myfilt=filter(isodd,_), then arr |> myfilt.

1 Like

I had previously shared these thoughts about more sophisticated autocomplete.

However, it looks like even those ideas don’t perfectly cover the use case I currently have in mind!

The Problem:

At the moment, I’m working with a Julia package that somebody else has developed. It’s a homebrew Julia implementation of a decent-sized API that has been professionally developed for other languages (C#, Java, Python). I had considered either using the jcall library, or writing such an API myself, until I found this package; the package has over 3,000 lines so I’d really like to leverage what has already been done.

Even though it’s homebrew, it’s surprisingly well-maintained. However, like anything homebrew, it’s poorly documented. The author trusts that you will be familiar with the existing API. Unfortunately, I am not.

That said, the author appreciates Julia’s functional style, so some things are slightly different from the official API. For example, the member methods provided by the official API’s EClient object are instead globally-accessible methods specializing on a ::Connection object. Some of these methods retain their original camelCase, while some became snake_case.

All this means that, to use it, I must refer to the official API documentation (in a different language), while acknowledging that some things will be different, and while having no method discoverability in my IDE. One must be a diehard [and irrational] Julia enthusiast not to jump ship for Python.

So would an autocomplete work with it?

Using the ideas I proposed above, mostly yes. The methods that had previously been members of EClient in the official API are now specialized to take a ::Connection object as their first argument in the Julia package, so those should be discoverable.

However, quite a few functions have been written without type specialization, presumably in part because the package author didn’t export them and can rely on module encapsulation. These will prove more difficult. And as this is not the sort of library that will have a large presence of users on GitHub, there won’t be good data to draw from for statistical inference.

Is there a way to improve that?

I think there’s a way to solve that problem too, even if imperfectly. When you have an object of type MyType which has been defined in package MyPackage, it’s fairly likely that at least some methods that have been defined in MyPackage will have been written to operate on MyType objects (even if the package author hasn’t given them type annotations).

As a result, when searching for methods, it seems a fairly sensible starting point to look first in the package where the type you are working with was defined. It could be an option in the autocomplete whether to include non-exported methods or not.

My IDE is [usually] able to bring me to the line of code where a particular function or type is defined, so this seems doable.

In short, it seems that once you tell your IDE what objects you’re working with, and you’ve communicated to it that you’re about to call a function on them, it should be able to help you find methods, searching through some combination of:

  1. Type specialization
  2. Statistical inference
  3. Package co-location / code proximity

On the surface, there is quite a bit of difference between the original proposal (OP) and the JuliaSyntax PR proposal (JS). Perhaps it works out that the overall behavior is roughly the same—I haven’t figured that out yet.

Let’s take a look at an example from JS. This expression,

x  />  f(y)  \>  g(z)

is parsed as

# S-expression pseudo-code:
(chain x (/> f y) (\> g z))

which gets lowered to

chain(x, fixbutfirst(f, y), fixbutlast(g, z))

So, in JS, /> and \> are effectively unary operators that take in a function call on the right-hand side (RHS). They do not operate on the code on their left-hand side (LHS). This is in contrast to OP, where the front/back fix operators are binary operators that operate on the LHS and the RHS.

Another difference to note is that JS has an implicit piping operation built in (piping is not currying!), which is expressed by the chain function in the parsed and lowered code. OP, on the other hand, does not exactly have a piping semantic, although it kind of sneaks in by the way that function calls and front/back fix operators are parsed.
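To make the described lowering concrete, here is one guess at definitions for chain, fixbutfirst, and fixbutlast that match the behavior above (illustrative only; the actual JuliaSyntax PR may define them differently):

```julia
# Guessed semantics: fix trailing/leading arguments, leave one slot free.
fixbutfirst(f, args...) = x -> f(x, args...)   # free slot is first
fixbutlast(f, args...)  = x -> f(args..., x)   # free slot is last
chain(x, fs...) = foldl(|>, fs; init = x)      # thread x through each functor

f(x, y) = (:f, x, y)
g(z, w) = (:g, z, w)
# `x /> f(y) \> g(z)` with x = 1, y = 2, z = 3:
chain(1, fixbutfirst(f, 2), fixbutlast(g, 3))  # returns (:g, 3, (:f, 1, 2))
```

That result is g(z, f(x, y)), which matches the evaluation discussed further down.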

The above expression is parsed rather differently in OP. OP would produce the following after parsing:

(((x  />  f)(y))  \>  g)(z)

Let’s re-write that using S-expressions, for easier comparison with JS:

((((/> x f) y) \> g) z)

That’s quite a bit different from what JS is doing. Although we should be careful to note that /> is different in OP and JS (more on that later).

Another important difference is that when you are creating a Fix* type, OP produces nested types, whereas JS produces just one type. Consider this function:

foo(x, y, z) = x, y, z

If we want to fix the first two arguments, we would do this in OP:

:b /> :a /> foo

which returns this:

FixFirst(FixFirst(foo, :a), :b)

If we want to fix the first two arguments in JS, we would do this:

\> foo(:a, :b)

which calls this:

fixbutlast(foo, :a, :b)

which produces a single (non-nested) FixButLast object.

Actually, this example shows that, in JS, \> is treated basically opposite to the way it is treated in OP. In JS, (/> f x) means “fix every argument except the first”, whereas in OP, (/> x f) means “fix only the first argument, and leave all the others free”. So the JS implementation actually behaves a lot more like “front pipe” and “back pipe” than like “front fix” and “back fix”. I think it would be semantically cleaner to just have “front pipe” and “back pipe” than to have “fix every argument but the first” and “fix every argument but the last”, with an implicit chain thrown in.
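The nested-versus-flat distinction can be sketched with toy definitions (names borrowed from the proposals, implementations mine):

```julia
# Toy versions; the real proposals differ in details like type parameters.
struct FixFirst{F,X} <: Function
    f::F
    x::X
end
(c::FixFirst)(args...) = c.f(c.x, args...)

struct FixButLast{F,A} <: Function
    f::F
    args::A
end
(c::FixButLast)(last) = c.f(c.args..., last)

foo(x, y, z) = (x, y, z)

nested = FixFirst(FixFirst(foo, :a), :b)  # OP style: one wrapper per fixed argument
flat   = FixButLast(foo, (:a, :b))        # JS style: a single wrapper
nested(:c) == flat(:c) == (:a, :b, :c)    # true
```

The flat form needs only one wrapper regardless of how many arguments are fixed, which is part of why the JS version behaves more like a pipe than a fix.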

3 Likes

That’s great and all, but I find it quite hard to guess the usefulness of potential autocomplete without actual examples (“for objects of this type in the last position, autocompleted functions would be: …”) or even better, a basic implementation. Something simple like propose_autocompletions(func, obj, narg)::Vector{Method} would help everyone to judge how useful autocomplete could potentially be.
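As a starting point for experimentation, InteractiveUtils.methodswith already does most of the lookup. Here is a rough sketch of such a helper (propose_autocompletions is the hypothetical name from the post above, simplified to ignore the func/narg filtering; the ranking heuristic is my own guess):

```julia
using InteractiveUtils  # provides methodswith

# Sketch: list methods applicable to `obj` (including via supertypes), with
# a crude co-location heuristic so methods defined in the type's home
# module sort first.
function propose_autocompletions(obj)::Vector{Method}
    T = typeof(obj)
    ms = methodswith(T; supertypes = true)
    sort(ms; by = m -> m.module != parentmodule(T))
end

propose_autocompletions([1, 2, 3])  # candidate methods for a Vector{Int}
```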

Thanks for the exposition; this makes sense. And to be honest, the JS version seems a little simpler to me for just about the same effect.

I just want to be clear though, am I reading it correctly that

chain(x, fixbutfirst(f, y), fixbutlast(g, z))

and

((((/> x f) y) \> g) z)

are both going to evaluate to the following?

g(z, f(x,y))

That is, when evaluated “all the way” to a concrete function call, the OP and JS proposals will return the same value for any sequence of \> and /> pipes, and the main differences are in which intermediate functors get constructed?

I think that’s correct. At least I haven’t found a counter-example yet. (Aside from the different intermediate functors that you mentioned, which can of course be concretely realized if you explicitly want to save a partially applied function.)

1 Like

Okay I guess I misunderstood what you meant by “actual real-life example.”

The best examples will not be of methods or types from Base, but from packages that create complicated objects with many tightly-specialized methods, for which method discovery will be most appreciated.

I’ll use DataFrames.jl as an example. Keep in mind, this is just a simple example of how autocomplete could behave, solely by acting on types.

Example Possible Autocomplete Behavior

Here's a walked-through example of how an autocomplete *could* work with underscore syntax.

Let’s create an object df = DataFrame(a=1, b=2). When I type

df |> 

I should see a (very very long) list of methods appear: those which specialize on typeof(df), followed by methods which specialize on supertype(typeof(df)), followed by supertype(supertype(typeof(df))), etc., sorted by degree of specialization. The list is about two thousand entries long, something like this:

df |>
  append!(df1::DataFrame, df2::AbstractDataFrame; cols, promote)
  append!(df::DataFrame, table; cols, promote)
  copy(df::DataFrame; copycols)

  ⋮ other methods of `DataFrame`
  ⋮ (vdots here to shorten my explanation)

  Array(df::AbstractDataFrame)
  ==(df1::AbstractDataFrame, df2::AbstractDataFrame)
  (Matrix)(df::AbstractDataFrame)

  ⋮ other methods of `AbstractDataFrame`

  ArgumentError(msg)
  AssertionError(msg)
  BoundsError(a)

  ⋮ other methods of `Any`

The fact that we have underscore syntax in the language means I can call any of these methods conveniently using the pipe operator. The list was simply created by calling methodswith of the type and its supertypes, with no attention paid to argument position.

Pressing CTRL+B (or something, some hotkey combination) might change settings. For example, maybe I want to see only methods that act on abstract types, in which case pressing CTRL+B could bring up:

df |>
  Array(df::AbstractDataFrame)
  ==(df1::AbstractDataFrame, df2::AbstractDataFrame)
  (Matrix)(df::AbstractDataFrame)

  ⋮ other methods of `AbstractDataFrame`

  ArgumentError(msg)
  AssertionError(msg)
  BoundsError(a)

  ⋮ other methods of `Any`

But for now, I decide I want to see methods specialized to strictly this concrete type. So I press CTRL+B again and I see:

df |>
  append!(df1::DataFrame, df2::AbstractDataFrame; cols, promote)
  append!(df::DataFrame, table; cols, promote)
  copy(df::DataFrame; copycols)
  delete!(df::DataFrame, inds)
  deleteat!(df::DataFrame, inds::InvertedIndex)
  deleteat!(df::DataFrame, inds::AbstractVector{Bool})

  ⋮

And then I can scroll down the list to find what I’m looking for. One neuron fires in my brain and I remember that the first character is a p. So I type p and I see:

df |> p
  pop!(df::DataFrame)
  popat!(df::DataFrame, i::Integer)
  popfirst!(df::DataFrame)
  prepend!(df1::DataFrame, df2::AbstractDataFrame; cols, promote)
  prepend!(df::DataFrame, table; cols, promote)
  push!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  push!(df::DataFrame, row::DataFrameRow; cols, promote)
  push!(df::DataFrame, row; promote)
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

The list is now sufficiently short that I can see the whole thing and remind myself that the function I wanted to call was pushfirst!, and I type u. Now I see:

df |> pu
  push!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  push!(df::DataFrame, row::DataFrameRow; cols, promote)
  push!(df::DataFrame, row; promote)
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

I hit <tab> and it autocompletes to push:

df |> push
  push!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  push!(df::DataFrame, row::DataFrameRow; cols, promote)
  push!(df::DataFrame, row; promote)
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

Now I type f:

df |> pushf
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

I press <tab> again and the name fully fills out, including the unfixed argument and placing the cursor after the comma. Now I see:

df |> pushfirst!(_, )
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

and the list all but disappears as I fill out the rest of the arguments.

df |> pushfirst!(_, [1, 2])
  pushfirst!(df::DataFrame, row; promote)

The autocomplete has assisted me in finding the method I was looking for, enabling me to search for methods which specialize on its concrete type.

Where m is one of the methods returned by calling methodswith, this simple example uses only m.sig and doesn’t sort by argument position at all. However, it could be imagined to do so.

In addition, it could be imagined to use m.module to sort methods by what module defined them, showing first the methods defined in the same module as this object (using parentmodule(MyType)); m.file to find only the methods which were defined in the same file as this object; or any of the other properties of a Method to return better search results. (and the autocomplete could have a suite of hotkeys, or a settings panel, or some settings popup dialog, to determine how it searches.) It could also use statistical inference based on function call data from GitHub, or even your personal use data, to return better search results.
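For instance, grouping the methodswith results by defining module, as suggested, is only a few lines (an illustrative sketch; a real IDE backend would cache and rank these):

```julia
using InteractiveUtils  # provides methodswith

# Group candidate methods by the module that defines them, so an IDE could
# show "same package as the type" results first.
candidates = methodswith(AbstractDict; supertypes = true)
bymodule = Dict{Module,Vector{Method}}()
for m in candidates
    push!(get!(bymodule, m.module, Method[]), m)
end
haskey(bymodule, Base)   # Base defines methods on AbstractDict, so true
```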

I hope I don’t have to write my own autocomplete, and I’m totally incompetent, but if I’m pushed hard enough…

1 Like

Thanks for the detailed comparison!

I’ll admit that I looked at the OP and thought “oh more chaining/anon function stuff” and jumped to conclusions, apparently without reading the OP properly :sweat_smile:

So the significant underlying differences here weren’t exactly intentional! But might be an interesting alternative take.

2 Likes