Summary of piping/chaining proposal?

There have been a lot of posts about piping/chaining syntax, including:

The syntax is looking quite odd to my eyes and I’m really not following the thinking, but I’m wondering if someone—probably @uniment—could give as concise a summary of the current proposal as possible, especially separating the core idea from additional elaborations. The original idea of introducing \> and /> seemed pretty reasonable to me and now it’s veered into something I don’t get at all with curly braces and dots.


There have been a ton of ideas thrown around across those threads.

I believe the \> and /> thread (heh) of thought more or less culminated in the experimental JuliaSyntax PR here. It became very difficult to treat these two as generic partial application operators, but they seem to function pretty nicely as front-pipes and back-pipes, especially when they are allowed to be ‘headless’ for currying in simple situations like filter(/> foo, arr).

The other line of proposals (in my own words; possibly @uniment disagrees) is essentially exploring syntax solely for composition, in a way hopefully compatible with whatever #24990 eventually settles on, if anything, for partial application.

That is, {f, g, h} is more or less synonymous with h ∘ g ∘ f, but with some ability to have intermediate let blocks or to pass into a specific argument position via the keyword it or an _. I think the dot syntax for calling the chain was chosen largely because it was available and matches OOP languages, but I kind of prefer still using the pipe: x |> {f, g, h}.
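A minimal sketch of the composition that {f, g, h} is compared to above, in today's Julia. Here f, g and h are placeholder functions chosen only for illustration. Note that ∘ reads right to left, while the proposed braces would read left to right.

```julia
# Placeholder functions, chosen only for illustration
f(x) = x + 1
g(x) = 2x
h(x) = x^2

# ∘ composes right to left: (h ∘ g ∘ f)(x) == h(g(f(x))), so f runs first,
# which is the left-to-right order the proposed {f, g, h} would read in
composed = h ∘ g ∘ f

println(composed(3))     # h(g(f(3))) = (2 * (3 + 1))^2 = 64
println(3 |> composed)   # the same value, passed through the existing pipe
```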

Along those lines, I think a possibly more Julian / less controversial syntax would be simply to take the best parts of Chain.jl and DataPipes.jl and fold them into a built-in block type:

chain
   ...
end

To be more or less synonymous with

x -> @chain x begin
    identity
    ...
end
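As a rough sketch of that equivalence, a small macro can emulate such a block today. The name @chainfn is hypothetical, and it deliberately handles only the simplest case: every line is a function that gets called on the previous result. None of Chain.jl's _ placement is supported.

```julia
# Hypothetical macro approximating the proposed `chain ... end` block.
# Simplification: every line must evaluate to a function, which is called on
# the previous result (no `_` placement as in Chain.jl).
macro chainfn(block)
    steps = filter(ex -> !(ex isa LineNumberNode), block.args)
    x = gensym(:x)
    body = x
    for step in steps
        body = :($(esc(step))($body))   # wrap: step(previous result)
    end
    return :($x -> $body)               # a one-argument anonymous function
end

inc(v) = v + 1
dbl(v) = 2v

pipeline = @chainfn begin
    inc
    dbl
    string
end

println(pipeline(3))   # inc(3) = 4, dbl(4) = 8, string(8) = "8"
```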

And this is not incompatible with the aforementioned front/back pipes, nor with most of the proposed partial application solutions.


It’s been quite a journey (for me, anyway)! I’ll summarize from my perspective, as the instigator and closest observer.

My intent from the start has been to find a chaining syntax which would be worthy of adoption into Julia, in large part to get better method discovery and autocomplete, but also because sometimes it’s simply more natural to express things this way (e.g., “the baby’s length” instead of “the length of the baby”). To me, the |> pipe operator fails at this primarily for four reasons: 1.) inability to specify more than one argument, 2.) low operator precedence, forcing the chain to be inconvenient as anything but a final operation, 3.) requirement to construct lambdas, hurting compile time, and 4.) terrible to type. What I have arrived at through this Odyssey is likely one of the most general chaining syntaxes in human history :sweat_smile:.
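The first three complaints can be seen with the existing |> operator. Here xs is arbitrary example data.

```julia
xs = [3, 1, 2]

# (1) and (3): |> passes exactly one argument, so a call that needs more
# arguments must be wrapped in an anonymous function (a new function to compile)
gt1 = xs |> (v -> filter(>(1), v))
println(gt1)     # [3, 2]

# (2): |> binds more loosely than arithmetic, so using a piped result inside a
# larger expression needs explicit parentheses
total = (xs |> sum) + 1
println(total)   # 7
# Without the parentheses, `xs |> sum + 1` parses as `xs |> (sum + 1)`, and
# evaluating `sum + 1` throws a MethodError.
```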

History: how the proposal evolved into its current form

First proposal:

I was hoping to kill two birds with one stone: to use partial application for chaining. (Also, I thought this would be easy :sweat_smile:)

I was a proponent of /> and \> (as syntax sugar for constructing FixFirst and FixLast partial applicator types) until @CameronBieganek helped me realize that it didn’t quite work, at least not for partial application in the way I had imagined. So, after learning more about PR#24990, I jumped ship to it as a more general partial application syntax, to the point of creating a generalized Fix partial applicator type for it (and running benchmarks that showed favorable performance in comparison to Base.Fix1 and Base.Fix2). Sure, you’d have to live with some extra underscores, but the generality and transparency make up for it imo (and autocomplete would eventually make it a non-issue).
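A minimal sketch of what FixFirst and FixLast types might look like, assuming they simply prepend or append the fixed value. These are illustrative definitions, not an existing API; Base provides only Fix1 and Fix2.

```julia
# Hypothetical partial applicators (not part of Base):
#   FixFirst(f, a)(args...) == f(a, args...)
#   FixLast(f, z)(args...)  == f(args..., z)
struct FixFirst{F,T}
    f::F
    x::T
end

struct FixLast{F,T}
    f::F
    x::T
end

(p::FixFirst)(args...; kw...) = p.f(p.x, args...; kw...)
(p::FixLast)(args...; kw...)  = p.f(args..., p.x; kw...)

halve = FixLast(/, 2)
println(halve(10))                     # 10 / 2 = 5.0
println(FixFirst(map, abs)([-1, 2]))   # map(abs, [-1, 2]) = [1, 2]
```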

@Chris_Foster offered a JuliaSyntax demo showing how /> and \> could operate as partial applicators in a mirror form to how I had imagined (namely, to fix all-but-one argument), but by this point I had fallen out of love with them; I wanted chaining syntax which would work well with PR#24990 due to its greater generality. (Use of PR#24990 for chaining is essentially fixing all-but-one too, but without the constraint to first- or last- argument.)

Second proposal:

I pondered the issue, trying to understand what it was that people liked so much about Chain.jl, and I realized that its meaning for underscores, to be the result of the previous operation, is the exact definition of the English pronoun “it.” People love the concept of “it” because it allows us to do little tweaks here and there, allowing us to compose tasks which weren’t built to be composed. So I asked myself: Can I think of an unclaimed syntax which could work with PR#24990, and incorporate this meaning of “it” for more generalized function composition (the way our natural language affords us)?

So in the second proposal, I introduced the local keyword (unsurprisingly) it. I didn’t want its name to clash with _ underscore partial application, because they’re meaningfully different. But I really liked the extra flexibility it provided, which is exactly what people like so much about Chain.jl (and which is, in my estimation, what made #24990 so difficult to push through).

For occasions where you simply wanted to call a function, you’d type its name (possibly using underscores for partial application as PR#24990 proposes), and for those other odd cases where you wanted a bit more, you’d say it. So I chose an unclaimed syntax, --(), with bounding parentheses within which it would be defined. For example: x--(f, it+it^2, g(_, 2, 3)) would mean let it=x; it=f(it); it=it+it^2; it=g(it, 2, 3) end. For greater generality, I figured you might want to declare functions this way too, so I proposed a “headless” --(f, g, h) to mean it->(it=f(it); it=g(it); it=h(it)).
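Written out by hand in today's Julia, that expansion and its headless counterpart look roughly like this. Here f, g and the input value 2 are placeholders.

```julia
# Placeholder functions for illustration
f(v) = v + 1
g(a, b, c) = a * b + c

x = 2

# x--(f, it+it^2, g(_, 2, 3)), expanded by hand
result = let it = x
    it = f(it)          # bare function name: called on `it`
    it = it + it^2      # expression of `it`: evaluated as written
    it = g(it, 2, 3)    # g(_, 2, 3) substituted in place; no partial applicator built
end

# Headless form of the same steps: a one-argument function
headless = it -> (it = f(it); it = it + it^2; it = g(it, 2, 3))

println(result)        # 2 -> 3 -> 12 -> 12 * 2 + 3 = 27
println(headless(2))   # 27
```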

(Note: the direct substitution of g(_, 2, 3) into g(it, 2, 3), instead of g(_, 2, 3)(it), arose from @dlakelan’s continued prodding, which made me realize that partial application carried performance drawbacks, namely compilation time; it’d be preferable to do the substitution in-place if you know you’re simply going to consume the partial applicator anyway.)

Third proposal:

Some chatting with @christophE made me realize that not only is {} unclaimed syntax, but x.{} is unclaimed too. This made the hamster wheel in my head go crazy, because this a) requires no parser changes, so can be implemented today, and b) has exactly the operator precedence I want. So instead of x--(f,g,h) as in the second proposal, you’d type x.{f,g,h}, and instead of “headless” --(f,g,h), you’d write {f,g,h}. It’s a drop-in replacement for the second proposal.

But there’s a twist: {} is very powerful syntax; because it parses like [], you can construct 2-dimensional sets of expressions. I didn’t want to let such powerful syntax go to waste, so I asked the question: Can I meaningfully extend the concept of chaining to two dimensions? What would such a thing look like? Is it useful?
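You can check with Meta.parse that brace syntax already parses, and that evaluating it is currently an error. This assumes the Expr layout current Julia produces: :braces for comma-separated contents, and :bracescat (with :row children for the two-dimensional form) for semicolon-separated contents, mirroring [...].

```julia
one_row = Meta.parse("{f, g, h}")    # comma-separated contents
column  = Meta.parse("{f; g; h}")    # semicolon-separated contents
grid    = Meta.parse("{f g; h k}")   # two rows of two expressions

println(one_row.head)                # braces
println(column.head)                 # bracescat
println(map(r -> r.head, grid.args)) # one :row entry per row

# Nothing gives brace syntax a meaning today: evaluating it is an error
claimed = try
    eval(Meta.parse("{1, 2}"))
    true
catch
    false
end
```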

So in the third proposal, I dropped the discussion of partial application (to simplify the discussion), and I introduced some semantics for how expressions could spread across two dimensions. I also showed how you could implement a fast Fourier transform using these semantics.

And that brings us to today. Whew, that was actually kind of a lot :sweat_smile:

In short, the easiest way to imagine this proposal is taking the features of Chain.jl that people like, excluding parts that hurt its generality, including new things that extend its generality, and packaging it in a concise unclaimed syntax.

Core Behaviors:

Each expression is assumed to be either a function to be called, or an expression of it. (This is the same as Chain.jl, except using it instead of _.)

  1. x.{f; g} becomes a statement let it=x; it=f(it); it=g(it); it end. Notice the absence of a lambda, so there’s no compile-time penalty for using it.
  2. {f; g} becomes a function like it->begin it=f(it); it=g(it); it end.
  3. x.{f(it, y, z)} is let it=x; it=f(it, y, z); it end.
  4. {it+it^2} is a function like it->begin it=it+it^2; it end.
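Because {...} already parses, the four core behaviors can be approximated today with a macro. This is a sketch under simplifying assumptions:

- @itchain and @itfn are hypothetical stand-ins for x.{...} and {...}; the x.{} form itself is not handled.
- A nested {...} does not get its own it.
- Assignments, multi-chains and them are left out.

```julia
# True if the expression mentions the identifier `it` anywhere.
# Simplification: a nested {...} is not treated as introducing a new `it`.
uses_it(ex) = ex === :it || (ex isa Expr && any(uses_it, ex.args))

# Turn the steps of a {...} expression into `it = ...` assignments.
function chain_steps(braces)
    (braces isa Expr && braces.head in (:braces, :bracescat)) ||
        throw(ArgumentError("expected a {...} chain"))
    return map(braces.args) do step
        # an expression of `it` is used as-is; anything else is called on `it`
        rhs = uses_it(step) ? step : :($step(it))
        :(it = $rhs)
    end
end

# @itchain(x, {f; g})  ≈  let it = x; it = f(it); it = g(it); it end
macro itchain(x, braces)
    steps = chain_steps(braces)
    return esc(quote
        let it = $x
            $(steps...)
            it
        end
    end)
end

# @itfn({f; g})  ≈  it -> begin it = f(it); it = g(it); it end
macro itfn(braces)
    steps = chain_steps(braces)
    return esc(:(it -> begin
        $(steps...)
        it
    end))
end

double(v) = 2v

a = @itchain(3, {double; it + it^2; string})   # 3 -> 6 -> 42 -> "42"
b = @itchain(10, {>(5)})                       # (>(5))(10), i.e. 10 > 5
square_plus = @itfn({it + it^2})               # like it -> it + it^2
```

No lambda is created by @itchain: it expands to a plain let block, which is the property the proposal emphasizes for x.{...}.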

Notable decision points:

  1. I use it the same way that Chain.jl uses _, to mean the result of the previous expression. This is because I don’t want to claim _, so that it can remain free for use in partial application as PR#24990 proposes, and because the singular non-gendered object pronoun “it” carries the same exact meaning we’re after here.
  2. Simple chains, e.g. x.{first}.a to mean first(x).a, are possible because of high . operator precedence. I contend that this is an unalloyed good.
  3. Unlike Chain.jl which defaults to threading it into first argument position when it sees a function call, or DataPipes.jl which defaults to threading into last, I make no such assumption (this simplifies behavior to improve generality). Autocomplete will make this a non-issue anyway.
  4. Curly braces delimit the bounds of the chaining behavior. This enables single-argument “quick lambdas.”

Simple Extended Behaviors:

  1. Expressions are assumed either to be expressions of it, or to evaluate into functions to call on it. In cases where that’s obviously not true (e.g., :tuple or :generator expressions), no attempt is made to call them; they are simply assigned to it as-is.
  2. If there’s an assignment, then it is not assigned; this allows local variables to be declared. For example, x.{len = length(it); sum(it)/len} takes the mean of x by becoming let it=x; local len = length(it); it=sum(it)/len; it end.
  3. f(arg) do {g; h} end is an experimental alternate syntax for f({g; h}, arg). (I would prefer f(arg) do {g; h}, but the parser doesn’t allow that.)
  4. loop is an experimental locally-defined keyword which I haven’t talked about. Inside callable chains, e.g. {it ≤ 1 ? it : loop(it-1)+loop(it-2)}, loop is the function’s self-reference for recursion. This allows performant recursive chains (i.e., their self-reference is not boxed) to be assigned to non-const identifiers.
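Expanded by hand, the local-variable example from item 2 and a recursive chain in the spirit of item 4 look roughly like the following. The recursive part is only an approximation. It uses an ordinary named local function so the body can call itself, and makes no claim about whether that self-reference ends up boxed.

```julia
x = [1.0, 2.0, 3.0, 6.0]

# x.{len = length(it); sum(it)/len}, expanded by hand
m = let it = x
    local len = length(it)   # assignment step: declares a local, `it` unchanged
    it = sum(it) / len       # expression of `it`: becomes the new `it`
    it
end

# Approximation of {it ≤ 1 ? it : loop(it-1) + loop(it-2)}:
# a named local function gives the chain a way to call itself
fib = let
    function loop(it)
        it ≤ 1 ? it : loop(it - 1) + loop(it - 2)
    end
end

println(m)         # 12.0 / 4 = 3.0
println(fib(10))   # 55
```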

Advanced Extended Behaviors (Multi-Chains):

  1. For parallel chains of execution, Multichains are implemented. Multichains can be used to specify parallel execution threads/distributed processes, or for graphically arranging algorithms (e.g. my toy FFT demo).
  2. A value can be distributed across new chains by splatting it (with ...). If new chains start without any previous splat, then the right-most value is copied.
  3. To collect the values of the parallel chains, use a local keyword them: this will collect the parallel chains’ it values into a single tuple. Otherwise, when the number of columns reduces, any uncollected values will be dropped.
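Loosely, the multichain idea can be pictured with plain tuples. Each column is its own chain, run on its own value, and the resulting tuple plays the role described for them. multichain is a hypothetical helper: it runs the columns one after another rather than on threads, and it does not attempt the proposal's 2-D {} layout rules.

```julia
# Hypothetical helper: run one chain per column, each on its own value.
# Each column is a tuple of functions applied left to right.
function multichain(values::Tuple, columns::Tuple)
    length(values) == length(columns) ||
        throw(ArgumentError("need one value per column"))
    return map(values, columns) do v, fs
        foldl((acc, f) -> f(acc), fs; init = v)
    end
end

x = 9
spread = (x, x)   # copy one value into two parallel chains
them = multichain(spread, ((sqrt,), (v -> v + 1, v -> 2v)))
combined = sum(them)   # reduce the collected results

println(them)       # (3.0, 20)
println(combined)   # 23.0
```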

The only keywords defined within the context of {} are it, them, and loop.

Most of the present debate seems to be either a) saber rattling that we should in fact claim _ as Chain.jl does (and murder PR#24990), b) that the multi-chain behavior is too general and confusing, c) that curly braces are somehow not Julian, or d) that achieving the consensus to obtain a chaining syntax is a fool’s errand. I can definitely get on board with a more verbose syntax for {} when multiline block expressions are to be made, but to me it seems silly to rally around banishing such a powerful brace syntax. And I’ve never had the wisdom to avoid a good fool’s errand :laughing:

As for murdering PR#24990… if the crowd chants loudly enough, then maybe the right move is to wash my hands like Pontius Pilate and order the execution. I’d like to believe not, but I am only one.


There are Base.Fix1 and Base.Fix2. However, when piping, the functions involved usually have more than two parameters. IMHO, the proposed curly-brace syntax {f, g} would allow people to legally write

x |> {f(a, b, _,c), g(_, d, kwd=e)},

without having to define a struct Fix3 such that f(x,y,_) :: Fix3{typeof(f)} in advance. Also, this stays purely at the level of surface syntax.
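For comparison, this is how Base.Fix1 and Base.Fix2 combine with |> today, and where they stop helping. The functions and values are arbitrary examples.

```julia
# Base.Fix1(f, a)(y) == f(a, y)   and   Base.Fix2(f, b)(y) == f(y, b)
squares = [1, 2, 3] |> Base.Fix1(map, abs2)   # map(abs2, [1, 2, 3])
nine    = 10 |> Base.Fix2(-, 1)               # 10 - 1

# Fix1/Fix2 fix one argument of a two-argument call, so fixing two of
# clamp's three arguments still needs an anonymous function
clamped = 15 |> (v -> clamp(v, 0, 10))

println((squares, nine, clamped))   # ([1, 4, 9], 9, 10)
```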

There is no unique way to parse it, though. One is

Another might be just

(call g (parameters (kw kwd e)) (call f a b x c) d)
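The second reading matches what Julia's parser itself produces for the plain call g(f(a, b, x, c), d; kwd=e), with x standing for the piped value. Meta.show_sexpr prints it in a prefix form comparable to the S-expression above.

```julia
ex = :(g(f(a, b, x, c), d; kwd = e))

Meta.show_sexpr(ex)   # prefix-form dump of the parsed call
println()

# The same structure, checked piece by piece
head_ok = ex.head == :call && ex.args[1] == :g
params  = ex.args[2]   # keyword arguments come right after the callee
inner   = ex.args[3]   # the nested call f(a, b, x, c)
```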

I suggest we keep using independent packages with macros to implement candidate chaining/piping/… syntax options until a community consensus on the best approach is reached.

I particularly dislike the use of “{” or “}”, the introduction of complex syntax into Julia, and locking these extra bracketing characters into the language proper.

I am concerned that adding neat and cool language syntax that creates dense, possibly difficult-to-follow code sections could make the Julia language less successful or usable in the end.

There have been many interesting ideas presented and discussed. Let’s try them all out against each other for a year to get things stable and robust, and then revisit what makes the most sense for Julia going forward.


These discussions have suggested that there might be a place in the language for a generic Base.FixAt{POSITIONS}(fun,values) function to generalize Fix1 and Fix2. For example, Base.FixAt{(1,4)}(+,(11,14))(12,13,args...) == +(11,12,13,14,args...). This would at least make it slightly more ergonomic to chain via the existing |> in some situations.

Such a function should probably also be equipped to accept keyword arguments (fixed or nonfixed, probably with nonfixed overriding fixed), although there could be some debate on syntax there. It might also be useful to allow from-end-positional arguments to be fixed, but I think those would not be widely used and could be messy to work out if combined with front-indexed fixes (as collisions would be possible – although those could simply result in errors).
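Here is a rough sketch of such a type, assuming the interface described above. It comes with several caveats:

- FixAt is hypothetical; it is not in Base.
- Keyword arguments are only passed through, not fixed.
- From-end positions are not supported.
- The argument list is assembled at run time in a Vector{Any}, for readability rather than speed.

```julia
# Hypothetical FixAt (not part of Base): FixAt{POS}(f, vals) fixes the
# arguments at the ascending positions in POS to the values in `vals`;
# the remaining positional slots are filled, in order, from the call.
struct FixAt{POS,F,V<:Tuple}
    f::F
    vals::V
end
FixAt{POS}(f::F, vals::V) where {POS,F,V<:Tuple} = FixAt{POS,F,V}(f, vals)

function (p::FixAt{POS})(args...; kw...) where {POS}
    n = length(POS) + length(args)
    out = Vector{Any}(undef, n)
    fixed, free = 1, 1
    for i in 1:n
        if fixed <= length(POS) && POS[fixed] == i
            out[i] = p.vals[fixed]   # slot i was fixed at construction
            fixed += 1
        else
            out[i] = args[free]      # slot i comes from the call
            free += 1
        end
    end
    # keyword arguments are passed through unchanged (none are fixed here)
    return p.f(out...; kw...)
end

add4 = FixAt{(1, 4)}(+, (11, 14))
println(add4(12, 13, 15))   # +(11, 12, 13, 14, 15) = 65
```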

I kinda dislike the type of code this leads (me) to, when writing actual pipelines. Let me illustrate with a simple example. (Feel free to provide corrections or better style!)

process_list = list ->
  list.{
    map(convert(Float32, {it}), it),
    filter({it > 0}, it)
  }

(I’m assuming that each gullwing nesting leads to a new unique it. Correct?)

A counter-suggestion (other symbols might be preferable):

  1. Front/back passing (let-binding shorthand); both of these mean f(a,b,c):
    a _> f(b,c)
    c >_ f(a,b)
  2. Shorthand lambdas:
    it > 0   means   x -> (x > 0)

Same example: (>_ and _> can be read as “goes into” or “smart pass” or similar.)

process_list = it >_
  map(convert(Float32, it)) >_
  filter(it > 0)

Here’s an important part: Both versions have an error (same one). Did you already notice it? If not, can you find it? Can you fix it? Solutions below. Try to solve it first, in your preferred version! :wink: (Assume that all definitions work as expected; this is a usage error.)











# 'it' is shorthand for x -> (x …) in the gull-less suggestion, so
convert(Float32, it) == convert(Float32, x -> (x)) ≠ x->convert(Float32, x)

# In short, a naked 'it' inside a function does not change the signature
# in the gull-less suggestion.

# Fixed versions:
process_list = it >_
  map(it >_ convert(Float32)) >_
  filter(it > 0)

process_list = list ->
  list.{
    map({convert(Float32, it)}, it),
    filter({it > 0}, it)
  }

Again, let me know if I’ve misunderstood the gullwings.

Almost. My third proposal is to make this legal:

x |> {f(a, b, it, c); g(it, d; kwd=e)}

or, in order to avoid compiling an unnecessary lambda and potentially suffering performance loss from its variable-capture behavior (and to have better [tighter] operator precedence),

x.{f(a, b, it, c); g(it, d; kwd=e)}

The key thing to note is that I’m explicitly avoiding claiming the _ character for the chaining syntax, because I think it serves very well for denoting partial application (which is a distinct concept from the it keyword, and would be very useful outside the bounds of {...}).

If this proposal is accepted and PR#24990 is accepted, then what you have written will be valid.



Considering the specifics of this problem (which I laid out in my first proposal), this doesn’t seem to be the best decision-making process here. I would propose instead accepting a chaining syntax[es] on a probationary basis for some period, i.e., with no assurance that the syntax[es] will continue to be part of the language after that period, and choosing in the end whether to keep it.

I do think we want to take more time and gain more experience to make sure it’s robust and stable; I just don’t think that developing consensus without substantial firsthand experience with it as a language feature is meaningful; the signal-to-noise ratio there would be pretty low.

I will also note that the dominant chaining packages are not actually very conducive to the genericity desired of a language feature anyway. For example, Chain.jl, which automatically threads into the first argument position when _ isn’t specified, is convenient for DataFrames but not particularly desirable for functional styles (e.g., when using curried functions like filter(f::Function), or when using transducers as part of a chain). Meanwhile DataPipes.jl, which automatically threads into the last argument position, isn’t helpful for functions written in an object-oriented style like those for DataFrames, nor for curried binary operators like >(5). These behaviors therefore take away from the genericity and composability you want of a proper language feature.

Example of Chain.jl poor behavior with Transducers
julia> using Chain, Transducers

julia> @chain collect(1:100) begin
           Map(x->2x)
           Filter(>(100))
           sum
       end
ERROR: MethodError: no method matching Map(::Vector{Int64}, ::var"#3#4")

julia> @macroexpand @chain collect(1:100) begin
           Map(x->2x)
           Filter(>(100))
           sum
       end
quote
    local var"##356" = collect(1:100)
    #= REPL[65]:2 =#
    local var"##357" = Map(var"##356", (x->begin
                        #= REPL[65]:2 =#
                        2x
                    end))
    #= REPL[65]:3 =#
    local var"##358" = Filter(var"##357", (>)(100))
    #= REPL[65]:4 =#
    local var"##359" = sum(var"##358")
    var"##359"
end
Example of DataPipes.jl poor behavior with curried operator
julia> using DataPipes

julia> @p begin
           10
           >(5)
       end
┌ Warning: Pipeline step top-level function is an operator. An argument with the previous step results is still appended.
│   func = ">"
│   args =
│    1-element Vector{Int64}:
│     5
└ @ DataPipes C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:257
false

julia> @macroexpand @p begin
           10
           >(5)
       end
┌ Warning: Pipeline step top-level function is an operator. An argument with the previous step results is still appended.
│   func = ">"
│   args =
│    1-element Vector{Int64}:
│     5
└ @ DataPipes C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:257
quote
    #= C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:47 =#
    #= REPL[58]:2 =#
    var"##res#345" = 10
    #= REPL[58]:3 =#
    var"##res#346" = 5 > var"##res#345"
    #= C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:48 =#
    var"##res#346"
end
How these should work
julia> using MethodChains, Transducers

julia> MethodChains.init_repl()

julia> (10).{>(5)}
true

julia> collect(1:100).{Map(x->2x);Filter(>(100)),sum}
7550

julia> @macroexpand collect(1:100).{Map(x->2x);Filter(>(100)),sum}
:(let it = collect(1:100)
      it
      it = (Map((x->begin
                      #= REPL[158]:1 =#
                      2x
                  end)))(it)
      it = (Filter((>)(100)))(it)
      it = sum(it)
      it
  end)

So far, I haven’t found any good reasons to share this sentiment. Is it merely a protest over aesthetics? Also, there are many things in Julia much more complex than what I have proposed, which seems inevitable considering Julia’s target audiences.

We have only three sets of bracing characters available on our keyboards: parentheses (), square brackets [], and curly braces {} (four if you count angle brackets <>, but we put those to very good uses already). Banishing one of them from a desire for Python-zen seems non-Julian.

Julia chose to use Algol-like begin...end for block expressions, which was a wonderful decision because it made it more natural to use the same style for other blocks (e.g. let, if, etc.) and freed up {} for other things.

When I consider the uses for {}, they seem to be primarily for denoting unordered lists (for which we have Set() and don’t find useful enough to justify dedicated syntax), or for set-builder notation, or for denoting switching expressions (for which we have if...elseif), or for blocks of expressions (for which we have begin...end and (...; ...)). Compared with the hypothetical uses pondered here, using it for function chains seems the most interesting and useful.

Whereas (x,y,z) is great for assembling a collection of objects and f(x,y,z) for calling a function on it, {f,g,h} is nice for assembling a collection of functions and x.{f;g;h} for passing an object through it. It’s hard to ignore the beauty in the symmetry, even with the . dot; it’s akin to the symmetry in Julia’s decision that functions be objects and objects be functions.



@mikmoore I wasn’t smart enough to understand FixArgs.jl, so in my second proposal I developed my own partial applicator type (which is accessible here), which I showed to have favorable performance compared to Base.Fix1 and Base.Fix2.

I made mine to allow nonfixed kwargs to override fixed, and allow from-end-positional arguments to be fixed; it works like Fix{positions, num_of_args}(fun, fixedargs...; fixedkwargs...), where num_of_args dictates the number of arguments in the final call (and a -1 value indicates varargs). You can run it like this:

julia> using ChainingDemo

julia> Fix{(1, 3), 3}(f, 1, 3; a=1, c=3)
f(1, _, 3, ; a=1, c=3)

julia> @underscores f(1, _, 3; a=1, c=3)
f(1, _, 3, ; a=1, c=3)

julia> FixFirst(f, "hi!")
f(hi!, _...)

julia> FixFirst(f, "hi!") |> typeof
FixFirst{typeof(f), String, NamedTuple{(), Tuple{}}} (alias for Fix{typeof(f), (1,), -1, Tuple{String}, NamedTuple{(), Tuple{}}})

julia> Fix{(1,-3,-1), -1}(f, :start, :nextnextlast, :last)
f(start, _..., nextnextlast, _, last)

julia> @underscores f(:start, _..., :nextnextlast, _, :last)
f(start, _..., nextnextlast, _, last)

julia> @underscores filter(_%3==0, 0:10)
4-element Vector{Int64}:
 0
 3
 6
 9

julia> @underscores (xs = (x=>x^2 for x ∈ 1:4); map(_[2]/2, xs))
4-element Vector{Float64}:
 0.5
 2.0
 4.5
 8.0

Note: from-end indices are allowed only with varargs.

Also note: if you try it now, pretty-printing doesn’t work in the REPL because I made Fix subtype Function; in the REPL, objects that subtype Function have their Base.show overridden, so you need to call show manually.

I agree that a generalized Fix type would be very useful, and I think PR#24990 would be very nice syntax sugar for this. I think these would be incredibly helpful in many contexts, most notably when used with filter and map.

However, I don’t think that this combined with |> should be the preferred way to make chains, because using a partial applicator in a chain constructs a partial functor that will be used just once and discarded. This is wasteful for compile time and memory. And as I’ve also opined, |> is awkward and has the wrong precedence for most uses.

Notice that I made _ work as part of a partial application syntax like PR#24990; I do still like it and want it.



@MattEri


Your example is incorrect, and should instead be:

process_list = {
    map({convert(Float32, it)}, it)
    filter({it > 0}, it)
}

and if PR#24990 were accepted, and using the curried form of filter available in Julia 1.9 (and supposing a partially-applied form of map were to become available too, e.g. map(f::Function) = FixFirst(map, f)), it could soon read like this:

process_list = { map(convert(Float32, _), _), filter(_>0) }

This is the same concept as my first proposal, except using claimed syntax which would require parser changes and break many things. My first proposal chose /> and \> (which are currently invalid and thus unclaimed syntax) to avoid these problems.

We cannot claim it outside of {} because it’s a valid identifier frequently used for iterators, and therefore claiming it would be a hugely breaking change. The nice thing about underscore _ is that using it as an rvalue is unclaimed syntax (yet it parses as an identifier), which makes it super interesting: it can be claimed outside special braces without causing a breaking change.
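The point about _ can be checked directly in current Julia. _ parses as an ordinary identifier and can be assigned to, but reading its value is rejected when the code is lowered. Giving the rvalue form a meaning would therefore not change any code that works today.

```julia
# `_` parses like any identifier...
parsed = Meta.parse("f(_, 2)")

# ...and can be assigned to (the value is simply discarded)...
for _ in 1:3 end
let
    _ = 42
end

# ...but using it as a value is rejected when the code is lowered
readable = try
    eval(:(identity(_)))
    true
catch
    false
end
```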

However, we must draw a distinction between the concept of a “quick lambda” and the concept of “partial application.” The debate of PR#24990 has persisted for years because of a desire to use _ to build “quick lambdas” which can do more than just partially-apply a single function. The problem with this idea is that, at the parser level, it’s generally impossible to tell where the bounds of such a “quick lambda” should be (the parser is unfortunately not a mind reader, even if Julia makes it seem so).

This makes the desire to form “quick lambdas” purely on the basis of using a special identifier untenable. That said, for most of the things that a “quick lambda” is wanted for, i.e. a single argument that passes through a couple simple functions (and never reaching a reducing function which combines it with itself in any way, nor encountering any branching logic), the partial applicator functor can work if combined with a function composition fallback as described here.

For example, this works with the demo code of my second proposal, mentioned above:

julia> using ChainingDemo

julia> @underscores @show g = √(2_+3) > 5;
g = √(Fix{(1,), 2}(*, 2) + 3) > 5 = >(_, 5) ∘ sqrt(_) ∘ +(_, 3) ∘ *(2, _)

julia> g(10)
false

julia> @underscores filter(√(2_+3) > 5, 5:15)
4-element Vector{Int64}:
 12
 13
 14
 15
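Here is a toy version of such a composition fallback, reproducing the filter example above. It is limited in several ways:

- Partial and _x are hypothetical names; a bare _ cannot be used as a value in today's Julia.
- Only the four operations in the example are covered.
- A real design would need a generic mechanism rather than hand-written methods.

```julia
# Hypothetical wrapper around a one-argument function which, when passed to
# other functions, composes with them instead of being consumed.
struct Partial{F} <: Function
    f::F
end
(p::Partial)(x) = p.f(x)

# Hand-written fallbacks, just enough for √(2_ + 3) > 5
Base.:*(a::Number, p::Partial) = Partial(x -> a * p.f(x))
Base.:+(p::Partial, b::Number) = Partial(x -> p.f(x) + b)
Base.sqrt(p::Partial)          = Partial(x -> sqrt(p.f(x)))
Base.:>(p::Partial, b::Number) = Partial(x -> p.f(x) > b)

const _x = Partial(identity)   # stands in for a bare `_`

g = sqrt(2 * _x + 3) > 5       # behaves like x -> √(2x + 3) > 5

println(g(10))                 # √23 ≈ 4.80, so false
println(filter(g, 5:15))       # [12, 13, 14, 15]
```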

However, for more general “quick lambdas” where the object takes more than one path and it interacts with itself, or if there will be branching logic, you need some unclaimed syntax to set the bounds of the expression within which the identifier has the desired meaning. Conveniently, my third proposal defines such bounds with {}, allowing you to construct arbitrary quick lambdas using it as an identifier. For example, {it+it^2} constructs a function that works like it->it+it^2.

Also notable, when you consider the use of “it” in English, it is for exactly this purpose of defining how an object should relate to itself (as well as for argument threading).



Summary

I think that in a perfect world we would have:

  1. A Fix functor type for generalized partial application (e.g. my demo code or FixArgs.jl)
  2. Use of _ underscores in a partial application syntax, à la PR#24990 (or my demo code)
  3. A function composition fallback for partial functions when passed as arguments to other functions. As mentioned in the PR, that might be tricky; I don’t currently have a way to judge how tricky.
  4. Implementation of my third proposal, to use {} for a chaining syntax. Out of my three proposals, I maintain that this one is the best.

At least in this case, I suspect the path to making the optimal decision is different from the optimal path to making a decision. Accepting these ideas into the language on a probationary basis seems appropriate (after some more vetting, certainly).
