Summary of piping/chaining proposal?

uniment · December 19, 2022, 1:30pm

Almost. My third proposal is to make this legal:

x |> {f(a, b, it, c); g(it, d; kwd=e)}

or, to avoid compiling an unnecessary lambda and potentially suffering a performance loss from its variable-capture behavior (and to get better [tighter] operator precedence),

x.{f(a, b, it, c); g(it, d; kwd=e)}

The key thing to note is that I’m explicitly avoiding claiming the _ character for the chaining syntax, because I think it serves very well for denoting partial application (which is a distinct concept from the it keyword, and would be very useful outside the bounds of {...}).

If this proposal is accepted and PR#24990 is accepted, then what you have written will be valid.

Considering the specifics of this problem (which I laid out in my first proposal), this doesn’t seem to be the best decision-making process here. I would propose instead accepting a chaining syntax[es] on a probationary basis for some period, i.e., with no assurance that the syntax[es] will continue to be part of the language after that period, and choosing in the end whether to keep it.

I do think we want to spend more time and gain more experience to make sure it’s robust and stable; I just don’t think that trying to develop consensus without substantial firsthand experience of it as a language feature is meaningful; the signal-to-noise ratio would be pretty low.

I will also note that the dominant chaining packages are not particularly conducive to the genericism desired of a language feature anyway. For example, Chain.jl automatically threads into the first argument position when _ isn’t specified, which is convenient for DataFrames but undesirable for functional styles (for example, when using curried functions like filter(f::Function), or when using transducers as part of a chain). Meanwhile DataPipes.jl automatically threads into the last argument position, which isn’t helpful for functions written in an object-oriented style like those for DataFrames, nor for curried binary operators like >(5). These implicit behaviors therefore take away from the genericism and composability you want of a proper language feature.
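To make the first-argument versus last-argument tension concrete without depending on either package, here is a minimal sketch. The helpers thread_first and thread_last are hypothetical names I made up for illustration; they only mimic the two implicit conventions and are not how Chain.jl or DataPipes.jl are implemented.

```julia
# Hypothetical helpers (not from Chain.jl or DataPipes.jl) that mimic the two
# implicit threading conventions.
thread_first(x, f, args...) = f(x, args...)   # piped value goes first
thread_last(x, f, args...)  = f(args..., x)   # piped value goes last

# A comparison operator wants the piped value first:
a = thread_first(10, >, 5)   # evaluates 10 > 5
b = thread_last(10, >, 5)    # evaluates 5 > 10

# A function-first API such as `filter` wants the piped value last:
c = thread_last([1, 2, 3, 4], filter, isodd)   # filter(isodd, [1, 2, 3, 4])
# thread_first([1, 2, 3, 4], filter, isodd) would instead call
# filter([1, 2, 3, 4], isodd), which is not how `filter` is meant to be called.
```

Neither convention is right for both calls, which is the point: any fixed implicit position favors one API style over the other.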

Example of Chain.jl poor behavior with Transducers

julia> using Chain, Transducers

julia> @chain collect(1:100) begin
           Map(x->2x)
           Filter(>(100))
           sum
       end
ERROR: MethodError: no method matching Map(::Vector{Int64}, ::var"#3#4")

julia> @macroexpand @chain collect(1:100) begin
           Map(x->2x)
           Filter(>(100))
           sum
       end
quote
    local var"##356" = collect(1:100)
    #= REPL[65]:2 =#
    local var"##357" = Map(var"##356", (x->begin
                        #= REPL[65]:2 =#
                        2x
                    end))
    #= REPL[65]:3 =#
    local var"##358" = Filter(var"##357", (>)(100))
    #= REPL[65]:4 =#
    local var"##359" = sum(var"##358")
    var"##359"
end

Example of DataPipes.jl poor behavior with curried operator

julia> using DataPipes

julia> @p begin
           10
           >(5)
       end
┌ Warning: Pipeline step top-level function is an operator. An argument with the previous step results is still appended.
│   func = ">"
│   args =
│    1-element Vector{Int64}:
│     5
└ @ DataPipes C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:257
false

julia> @macroexpand @p begin
           10
           >(5)
       end
┌ Warning: Pipeline step top-level function is an operator. An argument with the previous step results is still appended.
│   func = ">"
│   args =
│    1-element Vector{Int64}:
│     5
└ @ DataPipes C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:257
quote
    #= C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:47 =#
    #= REPL[58]:2 =#
    var"##res#345" = 10
    #= REPL[58]:3 =#
    var"##res#346" = 5 > var"##res#345"
    #= C:\Users\unime\.julia\packages\DataPipes\z06K1\src\pipe.jl:48 =#
    var"##res#346"
end

How these should work

julia> using MethodChains, Transducers

julia> MethodChains.init_repl()

julia> (10).{>(5)}
true

julia> collect(1:100).{Map(x->2x);Filter(>(100)),sum}
7550

julia> @macroexpand collect(1:100).{Map(x->2x);Filter(>(100)),sum}
:(let it = collect(1:100)
      it
      it = (Map((x->begin
                      #= REPL[158]:1 =#
                      2x
                  end)))(it)
      it = (Filter((>)(100)))(it)
      it = sum(it)
      it
  end)
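The expansion above suggests a simple lowering rule: bind it once, then rebind it after every step, calling the step on it when the step doesn’t mention it. Below is a rough sketch of that rule as an ordinary macro. The name @it_chain and the helper uses_it are hypothetical, this is not the MethodChains implementation, and it ignores many cases (nested scopes that shadow it, for instance). It uses only Base functions so it runs without extra packages.

```julia
# Hypothetical helper: does the expression mention the symbol `it` anywhere?
uses_it(ex) = ex === :it
uses_it(ex::Expr) = any(uses_it, ex.args)

# Hypothetical macro sketching the lowering: each step rebinds `it`.
macro it_chain(init, block)
    # Drop the line-number bookkeeping that `begin ... end` inserts.
    steps = filter(s -> !(s isa LineNumberNode), block.args)
    body = map(steps) do s
        # A step without `it` is treated as a function and called on `it`.
        rhs = uses_it(s) ? s : Expr(:call, s, :it)
        :(it = $rhs)
    end
    esc(quote
        let it = $init
            $(body...)
            it
        end
    end)
end

r = @it_chain collect(1:100) begin
    map(x -> 2x, it)
    filter(>(100), it)
    sum
end
```

Here r comes out to 7550, the same value as the Transducers chain above, since both double 1:100, keep the values above 100, and sum them.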

So far, I haven’t found any good reasons to resonate with this sentiment. Is it merely a protest over aesthetics? Also, there are many things in Julia *much* more complex than what I have proposed, which seems inevitable considering Julia’s target audiences.

We have only three sets of bracing characters available on our keyboards: parentheses (), square brackets [], and curly braces {} (four if you count angle brackets <>, but we put those to very good uses already). Banishing one of them out of a desire for Python-style zen seems non-Julian.

Julia chose to use Algol-like begin...end for block expressions, which was a wonderful decision because it made it more natural to use the same style for other blocks (e.g. let, if, etc.) and freed up {} for other things.

When I consider the uses for {}, they seem to be primarily for denoting unordered lists (for which we have Set(), and which we don’t find useful enough to justify dedicated syntax), or for set-builder notation, or for denoting switching expressions (for which we have if...elseif), or for blocks of expressions (for which we have begin...end and (...; ...)). Compared with the hypothetical uses pondered here, using it for function chains seems the most interesting and useful.

Whereas (x,y,z) is great for assembling a collection of objects and f(x,y,z) for calling a function on it, {f,g,h} is nice for assembling a collection of functions and x.{f;g;h} for passing an object through it. It’s hard to ignore the beauty in the symmetry, even with the . dot; it’s akin to the symmetry in Julia’s decision that functions be objects and objects be functions.

@mikmoore I wasn’t smart enough to understand FixArgs.jl, so in my second proposal I developed my own partial applicator type (which is accessible here), which I showed to have favorable performance compared to Base.Fix1 and Base.Fix2.

I made mine so that non-fixed kwargs can override fixed ones, and so that positional arguments can be fixed counting from the end; it works like Fix{positions, num_of_args}(fun, fixedargs...; fixedkwargs...), where num_of_args dictates the number of arguments in the final call (and a value of -1 indicates varargs). You can run it like this:

julia> using ChainingDemo

julia> Fix{(1, 3), 3}(f, 1, 3; a=1, c=3)
f(1, _, 3, ; a=1, c=3)

julia> @underscores f(1, _, 3; a=1, c=3)
f(1, _, 3, ; a=1, c=3)

julia> FixFirst(f, "hi!")
f(hi!, _...)

julia> FixFirst(f, "hi!") |> typeof
FixFirst{typeof(f), String, NamedTuple{(), Tuple{}}} (alias for Fix{typeof(f), (1,), -1, Tuple{String}, NamedTuple{(), Tuple{}}})

julia> Fix{(1,-3,-1), -1}(f, :start, :nextnextlast, :last)
f(start, _..., nextnextlast, _, last)

julia> @underscores f(:start, _..., :nextnextlast, _, :last)
f(start, _..., nextnextlast, _, last)

julia> @underscores filter(_%3==0, 0:10)
4-element Vector{Int64}:
 0
 3
 6
 9

julia> @underscores (xs = (x=>x^2 for x ∈ 1:4); map(_[2]/2, xs))
4-element Vector{Float64}:
 0.5
 2.0
 4.5
 8.0

Note: from-end indices are allowed only with varargs.

Also note: if you try it now, pretty-printing doesn’t work in the REPL because I made Fix subtype Function; in the REPL, objects that subtype Function have their Base.show overridden, so you need to call show manually.
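For readers who want the gist of positional fixing without installing the demo, here is a heavily simplified stand-in. The type name FixAt is hypothetical and this is not the ChainingDemo code: it supports only from-start positions, has no num_of_args parameter, no varargs handling, and no pretty-printing. It does show the two behaviors described above, fixed positions being interleaved with free arguments and call-site keywords overriding fixed ones.

```julia
# Hypothetical, simplified stand-in for a generalized partial applicator.
# P is a tuple of the argument positions that are fixed up front.
struct FixAt{P,F,A<:Tuple,K<:NamedTuple}
    f::F
    args::A
    kw::K
end

function FixAt{P}(f, args...; kw...) where {P}
    nt = (; kw...)   # store the fixed keywords as a NamedTuple
    return FixAt{P,typeof(f),typeof(args),typeof(nt)}(f, args, nt)
end

function (fx::FixAt{P})(free...; kw...) where {P}
    n = length(P) + length(free)
    merged = ntuple(n) do i
        j = findfirst(==(i), P)
        # Fixed slot: take the stored value. Free slot: take the next free
        # argument, skipping over the fixed slots that came before it.
        j === nothing ? free[i - count(<(i), P)] : fx.args[j]
    end
    # Keywords splatted later win, so call-site keywords override fixed ones.
    return fx.f(merged...; fx.kw..., kw...)
end

f(a, b, c; kwd=0) = (a, b, c, kwd)
g = FixAt{(1, 3)}(f, :first, :third; kwd=7)
r1 = g(:second)          # (:first, :second, :third, 7)
r2 = g(:second; kwd=9)   # (:first, :second, :third, 9)
```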

I agree that a generalized Fix type would be very useful, and I think PR#24990 would be very nice syntax sugar for this. I think these would be incredibly helpful in many contexts, most notably when used with filter and map.

However, I don’t think that this combined with |> should be the preferred way to make chains, because using a partial applicator in a chain constructs a partial functor that will be used just once and discarded. This is wasteful for compile time and memory. And as I’ve also opined, |> is awkward and has the wrong precedence for most uses.

Notice that I made _ work as part of a partial application syntax like PR#24990; I do still like it and want it.

@MattEri

referenced code

MattEri:

I kinda dislike the type of code this leads (me) to, when writing actual pipelines. Let me illustrate with a simple example. (Feel free to provide corrections or better style!)
process_list = list ->
  list.{
    map(convert(Float32, {it}), it),
    filter({it > 0}, it)
  }

Your example is incorrect, and should instead be:

process_list = {
    map({convert(Float32, it)}, it)
    filter({it > 0}, it)
}

and if PR#24990 were accepted, and using the curried form of filter available in Julia 1.9 ~~(and supposing a partially-applied form of map were to become available too, e.g. map(f::Function) = FixFirst(map, f))~~, it could soon read like this:

process_list = { map(convert(Float32, _), _), filter(_>0) }
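For comparison, something close to this can already be written with Base alone, at the cost of spelling out the partial applications by hand. This is only a sketch of the equivalent behavior, assuming Julia 1.9 or later for the one-argument filter; the name to_f32 is mine.

```julia
# Requires Julia 1.9 or later for the one-argument (curried) `filter`.
to_f32 = Base.Fix1(convert, Float32)   # behaves like x -> convert(Float32, x)

# Compose right to left: first map the conversion, then keep positive values.
process_list = filter(>(0)) ∘ Base.Fix1(map, to_f32)

out = process_list([-1, 2, 3])
```

The result is a Vector{Float32} holding 2 and 3. The proposed syntax would say the same thing with far less ceremony.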

This is the same concept as my first proposal, except using claimed syntax which would require parser changes and break many things. My first proposal chose /> and \> (which are currently invalid and thus unclaimed syntax) to avoid these problems.

We cannot claim it outside of {} because it’s a valid identifier frequently used for iterators, and therefore claiming it would be a hugely breaking change. The nice thing about underscore _ is that using it as an rvalue is unclaimed syntax (yet it parses as an identifier), which makes it super interesting: it can be claimed outside special braces without causing a breaking change.
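The difference between the two identifiers can be checked directly. This is a small sketch using Meta.parse and Meta.lower; I am assuming here that lowering reports the rejected use of _ as an Expr with head :error, which is what I see on current releases, and I deliberately don’t rely on the exact wording of the message.

```julia
# `_` parses like any other identifier, so an expression containing it is
# perfectly valid at the parser level.
ex = Meta.parse("f(1, _, 3)")

# `it` is an ordinary name that existing code is free to assign and read.
it = 1:3
total = sum(it)

# Reading `_` as a value is rejected when the expression is lowered, which is
# why giving it a meaning there would not break working code.
lowered = Meta.lower(@__MODULE__, :(_ + 1))
is_error = lowered isa Expr && lowered.head === :error
```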

However, we must draw a distinction between the concept of a “quick lambda” and the concept of “partial application.” The debate of PR#24990 has persisted for years because of a desire to use _ to build “quick lambdas” which can do more than just partially-apply a single function. The problem with this idea is that, at the parser level, it’s generally impossible to tell where the bounds of such a “quick lambda” should be (the parser is unfortunately not a mind reader, even if Julia makes it seem so).
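A tiny example of the ambiguity: suppose someone writes string(length(_)). Both readings below are valid Julia when spelled out by hand, and they mean different things, so a rule keyed only on the identifier has to pick one arbitrarily. The variable names are just labels for the two readings.

```julia
# Reading 1: the lambda spans the whole expression.
reading_outer = x -> string(length(x))

# Reading 2: the lambda spans only the innermost call, and `string` is then
# applied to the resulting function object.
reading_inner = string(x -> length(x))

outer_result = reading_outer("abc")   # the string "3"
```

reading_outer is a function, while reading_inner is already a String (the printed name of an anonymous function), which is almost certainly not what the writer wanted.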

This makes the desire to form “quick lambdas” purely on the basis of a special identifier untenable. That said, most of the things a “quick lambda” is wanted for involve a single argument that passes through a couple of simple functions (never reaching a reducing function that combines it with itself in any way, and never encountering any branching logic); for those, the partial applicator functor can work if combined with a function composition fallback as described here.

~~For example, this works with the demo code of my second proposal, mentioned above:~~

julia> using ChainingDemo

julia> @underscores @show g = √(2_+3) > 5;
g = √(Fix{(1,), 2}(*, 2) + 3) > 5 = >(_, 5) ∘ sqrt(_) ∘ +(_, 3) ∘ *(2, _)

julia> g(10)
false

julia> @underscores filter(√(2_+3) > 5, 5:15)
4-element Vector{Int64}:
 12
 13
 14
 15
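The composition fallback idea can be sketched with plain method overloading on a wrapper type. Everything below is hypothetical and is not how ChainingDemo does it: the Partial type, the hole placeholder standing in for _, and the handful of overloads are just enough to reproduce the example above.

```julia
# Hypothetical wrapper marking "a function still waiting for its argument".
struct Partial{F}
    f::F
end
(p::Partial)(x) = p.f(x)

# Stand-in for `_`: the identity function wrapped as a Partial.
const hole = Partial(identity)

# Fallbacks: applying an operation to a Partial composes instead of calling.
Base.:*(a::Number, p::Partial) = Partial(Base.Fix1(*, a) ∘ p.f)
Base.:+(p::Partial, b::Number) = Partial(Base.Fix2(+, b) ∘ p.f)
Base.sqrt(p::Partial)          = Partial(sqrt ∘ p.f)
Base.:>(p::Partial, b::Number) = Partial(Base.Fix2(>, b) ∘ p.f)

g = sqrt(2 * hole + 3) > 5     # a callable, not a Bool

g10 = g(10)                    # sqrt(23) > 5
kept = filter(g, 5:15)         # values x with sqrt(2x + 3) > 5
```

As in the session above, g(10) is false and the filter keeps 12 through 15. The cost of this approach is that every function that should compose needs a fallback method, which is part of why it might be tricky to do generally.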

However, for more general “quick lambdas” where the object takes more than one path and it interacts with itself, or if there will be branching logic, you need some unclaimed syntax to set the bounds of the expression within which the identifier has the desired meaning. Conveniently, my third proposal defines such bounds with {}, allowing you to construct arbitrary quick lambdas using it as an identifier. For example, {it+it^2} constructs a function that works like it->it+it^2.
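The role the braces play, marking where the lambda starts and ends, can be imitated today with a macro whose argument is the boundary. This is only a sketch: @itfn is a hypothetical name, and the proposal itself would use {} rather than a macro call.

```julia
# Hypothetical macro: the macro argument delimits the quick lambda, and `it`
# names its single parameter.
macro itfn(ex)
    esc(:(it -> $ex))
end

# The argument interacts with itself...
square_plus = @itfn(it + it^2)

# ...or goes through branching logic, neither of which plain partial
# application can express.
absval = @itfn(it < 0 ? -it : it)

r1 = square_plus(3)   # 3 + 3^2
r2 = absval(-4)
```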

It is also notable that “it” in English serves exactly this purpose: describing how an object should relate to itself (as well as argument threading).

Summary

I think that in a perfect world we would have:

A Fix functor type for generalized partial application (e.g. my demo code or FixArgs.jl)
Use of _ underscores in a partial application syntax, à la PR#24990 (or my demo code)
~~A function composition fallback for partial functions when passed as arguments to other functions. As mentioned in the PR, that might be tricky; I don’t currently have a way to judge how tricky.~~
Implementation of my third proposal, to use {} for a chaining syntax. Out of my three proposals, I maintain that this one is the best.

At least in this case, I suspect the path to making the optimal decision is different from the optimal path to making a decision. Accepting these ideas into the language on a probationary basis seems appropriate (after some more vetting, certainly).
