Summary of piping/chaining proposal?

It’s been quite a journey (for me, anyway)! I’ll summarize from my perspective, as the instigator and closest observer.

My intent from the start has been to find a chaining syntax which would be worthy of adoption into Julia, in large part to get better method discovery and autocomplete, but also because sometimes it’s simply more natural to express things this way (e.g., “the baby’s length” instead of “the length of the baby”). To me, the |> pipe operator fails at this primarily for four reasons: 1.) inability to specify more than one argument, 2.) low operator precedence, forcing the chain to be inconvenient as anything but a final operation, 3.) requirement to construct lambdas, hurting compile time, and 4.) terrible to type. What I have arrived at through this Odyssey is likely one of the most general chaining syntaxes in human history :sweat_smile:.
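To make the first two complaints concrete, here is a minimal sketch in current Julia. The helper `scale` and the variable names are made up for illustration; only `|>`, `sum`, and `Meta.parse` are real Base functionality.

```julia
# Hypothetical helper, used only for illustration.
scale(x, k) = x * k

xs = [1, 2, 3]

# 1.) Passing a second argument forces an anonymous function into the pipe.
piped = xs |> sum |> (s -> scale(s, 10))

# 2.) `|>` binds more loosely than `+`, so the right-hand side groups first:
#     "x |> f + 1" is read as x |> (f + 1), not (x |> f) + 1.
parsed = Meta.parse("x |> f + 1")
```

Inspecting `parsed.args` shows `|>` as the outermost call, which is why a pipe is awkward anywhere but at the end of an expression.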

History: how the proposal evolved into its current form

First proposal:

I was hoping to kill two birds with one stone: to use partial application for chaining. (Also, I thought this would be easy :sweat_smile:)

I was a proponent of /> and \> (as syntax sugar for construction of FixFirst and FixLast partial applicator types), but that was until @CameronBieganek helped me realize that it didn’t quite work—not for partial application in the way I had imagined anyway. So, after learning more about PR#24990, I jumped ship for it as a more general partial application syntax, to the point of creating a generalized Fix partial applicator type for it (and doing benchmarks that showed favorable performance in comparison to Base.Fix1 and Base.Fix2). Sure, you’d live with some extra underscores, but the generality and transparency make up for it imo (and autocomplete would eventually make it a non-issue).
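For readers who haven't used the existing partial applicators, here is a small sketch of `Base.Fix1` and `Base.Fix2`. The underscore form from PR#24990 is a proposal and not current Julia, so an ordinary lambda stands in for it below; the variable names are mine.

```julia
# Base.Fix1(f, x) behaves like y -> f(x, y);
# Base.Fix2(f, x) behaves like y -> f(y, x).
sub_from_ten = Base.Fix1(-, 10)   # y -> 10 - y
squared      = Base.Fix2(^, 2)    # y -> y^2

# Stand-in for the proposed underscore form `_ ^ 2`
# (an assumption about how it would read, not runnable syntax today).
squared_lambda = y -> y ^ 2
```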

@c42f offered a JuliaSyntax demo showing how /> and \> could operate as partial applicators in a mirror form to how I had imagined (namely, to fix all-but-one argument), but by this point I had fallen out of love with them; I wanted chaining syntax which would work well with PR#24990 due to its greater generality. (Use of PR#24990 for chaining is essentially fixing all-but-one too, but without the constraint to the first or last argument.)

Second proposal:

I pondered the issue, trying to understand what it was that people liked so much about Chain.jl, and I realized that its meaning for underscores, to be the result of the previous operation, is the exact definition of the English pronoun “it.” People love the concept of “it” because it allows us to do little tweaks here and there, allowing us to compose tasks which weren’t built to be composed. So I asked myself: Can I think of an unclaimed syntax which could work with PR#24990, and incorporate this meaning of “it” for more generalized function composition (the way our natural language affords us)?

So in the second proposal, I introduced the local keyword (unsurprisingly) it. I didn’t want its name to clash with _ underscore partial application, because they’re meaningfully different. But I really liked the extra flexibility it provided, which is exactly what people like so much about Chain.jl (and which is, in my estimation, what made #24990 so difficult to push through).

For occasions where you simply wanted to call a function, you’d type its name—and possibly use underscores for partial application as PR#24990 proposes—and for those other odd cases where you wanted a bit more, you’d say it. So I chose an unclaimed syntax --() and bounding parentheses in which it would be defined. For example: x--(f, it+it^2, g(_, 2, 3)) would mean let it=x; it=f(it); it=it+it^2; it=g(it, 2, 3) end. For greater generality, I figured you might want to declare functions this way too, so I proposed a “headless” --(f, g, h) to mean it->(it=f(it); it=g(it); it=h(it)).

(Note: the direct substitution of g(_, 2, 3) into g(it, 2, 3), instead of g(_, 2, 3)(it), arose from @dlakelan’s continued prodding, which made me realize that partial application carried performance drawbacks, namely compilation time; it’d be preferable to do the substitution in-place if you know you’re simply going to consume the partial applicator anyway.)
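Written out by hand in current Julia, the second-proposal example would look roughly like the sketch below. The bodies of `f` and `g` are hypothetical stand-ins; the `--()` syntax itself is not runnable, so only its intended expansion is shown.

```julia
# Hypothetical stand-ins for the f and g of the example.
f(a) = a + 1
g(a, b, c) = a * b + c

x = 2

# x--(f, it+it^2, g(_, 2, 3)), with g(_, 2, 3) substituted in place as g(it, 2, 3)
inlined = let it = x
    it = f(it)
    it = it + it^2
    it = g(it, 2, 3)
    it
end

# The same chain, but building a partial applicator first and then calling it.
# The result is identical; the extra closure is what the in-place substitution avoids.
via_closure = let it = x
    it = f(it)
    it = it + it^2
    it = (a -> g(a, 2, 3))(it)
    it
end

# "Headless" --(f, g(_, 2, 3)) written as an ordinary function of it.
headless = it -> (it = f(it); it = g(it, 2, 3); it)
```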

Third proposal:

Some chatting with @christophE made me realize that not only is {} unclaimed syntax, but x.{} is unclaimed too. This made the hamster wheel in my head go crazy, because this a) requires no parser changes, so can be implemented today, and b) has exactly the operator precedence I want. So instead of x--(f,g,h) as in the second proposal, you’d type x.{f,g,h}, and instead of “headless” --(f,g,h), you’d write {f,g,h}. It’s a drop-in replacement for the second proposal.

But there’s a twist: {} is very powerful syntax; because it parses like [], you can construct 2-dimensional sets of expressions. I didn’t want to let such powerful syntax go to waste, so I asked the question: Can I meaningfully extend the concept of chaining to two dimensions? What would such a thing look like? Is it useful?

So in the third proposal, I dropped the discussion of partial application (to simplify the discussion), and I introduced some semantics for how expressions could spread across two dimensions. I also showed how you could implement a fast Fourier transform using these semantics.

And that brings us to today. Whew, that was actually kind of a lot :sweat_smile:

In short, the easiest way to imagine this proposal is taking the features of Chain.jl that people like, excluding parts that hurt its generality, including new things that extend its generality, and packaging it in a concise unclaimed syntax.

Core Behaviors:

Each expression is assumed to be either a function to be called, or an expression of it. (This is the same as Chain.jl, except using it instead of _.)

  1. x.{f; g} becomes a statement let it=x; it=f(it); it=g(it); it end. Notice the absence of a lambda, so there’s no compile-time penalty for using it.
  2. {f; g} becomes a function like it->begin it=f(it); it=g(it); it end.
  3. x.{f(it, y, z)} is let it=x; it=f(it, y, z); it end.
  4. {it+it^2} is a function like it->begin it=it+it^2; it end.
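Here is a sketch of those four expansions written by hand in current Julia, with `sort`, `first`, and `clamp` chosen arbitrarily in place of f and g. The brace forms appear only in comments, since they are the proposed syntax rather than something Julia evaluates today.

```julia
x = [3, 1, 2]

# 1. x.{sort; first}
r1 = let it = x
    it = sort(it)
    it = first(it)
    it
end

# 2. {sort; first}
c2 = it -> begin
    it = sort(it)
    it = first(it)
    it
end

# 3. 5.{clamp(it, 1, 3)}
r3 = let it = 5
    it = clamp(it, 1, 3)
    it
end

# 4. {it+it^2}
c4 = it -> begin
    it = it + it^2
    it
end
```

Forms 1 and 3 are plain let blocks, so no anonymous function is created; forms 2 and 4 are the ones that produce a callable.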

Notable decision points:

  1. I use it the same way that Chain.jl uses _, to mean the result of the previous expression. This is because I don’t want to claim _, so that it can remain free for use in partial application as PR#24990 proposes, and because the singular non-gendered object pronoun “it” carries the same exact meaning we’re after here.
  2. Simple chains, e.g. x.{first}.a to mean first(x).a, are possible because of high . operator precedence. I contend that this is an unalloyed good.
  3. Unlike Chain.jl which defaults to threading it into first argument position when it sees a function call, or DataPipes.jl which defaults to threading into last, I make no such assumption (this simplifies behavior to improve generality). Autocomplete will make this a non-issue anyway.
  4. Curly braces delimit the bounds of the chaining behavior. This enables single-argument “quick lambdas.”

Simple Extended Behaviors:

  1. Expressions are assumed either to be expressions of it, or to evaluate into functions to call on it. In cases where that’s obviously not true (e.g., :tuple or :generator expressions), no attempt is made to call them; they are simply assigned to it as-is.
  2. If there’s an assignment, then it is not assigned; this allows local variables to be declared. For example, x.{len = length(it); sum(it)/len} takes the mean of x by becoming let it=x; local len = length(it); it=sum(it)/len; it end.
  3. f(arg) do {g; h} end is an experimental alternate syntax for f({g; h}, arg). (I would prefer f(arg) do {g; h}, but the parser doesn’t allow that.)
  4. loop is an experimental locally-defined keyword which I haven’t talked about. Inside callable chains, e.g. {it ≤ 1 ? it : loop(it-1)+loop(it-2)}, loop is the function’s self-reference for recursion. This allows performant recursive chains (i.e., their self-reference is not boxed) to be assigned to non-const identifiers.
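A hand-written sketch of what items 2 through 4 would amount to in current Julia follows. The names `mean_x`, `mapped`, and `fib` are mine, and the `let` wrapper around `loop` is only an approximation of the self-reference idea, not the proposal's actual lowering.

```julia
x = [1, 2, 3, 4]

# Item 2: x.{len = length(it); sum(it)/len}
# The assignment declares a local and leaves it untouched.
mean_x = let it = x
    local len = length(it)
    it = sum(it) / len
    it
end

# Item 3: map(x) do {2it; it + 1} end, i.e. map({2it; it + 1}, x)
mapped = map(it -> begin it = 2it; it = it + 1; it end, x)

# Item 4: {it ≤ 1 ? it : loop(it-1)+loop(it-2)}
# Approximated with a named local function that refers to itself.
fib = let
    loop(it) = it ≤ 1 ? it : loop(it - 1) + loop(it - 2)
    loop
end
```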

Advanced Extended Behaviors (Multi-Chains):

  1. For parallel chains of execution, Multichains are implemented. Multichains can be used to specify parallel execution threads/distributed processes, or for graphically arranging algorithms (e.g. my toy FFT demo).
  2. A value can be distributed across new chains by splatting it with the ... operator. If new chains start without any previous splat, then the right-most value is copied.
  3. To collect the values of the parallel chains, use a local keyword them: this will collect the parallel chains’ it values into a single tuple. Otherwise, when the number of columns reduces, any uncollected values will be dropped.

All keywords defined within the context of {} are it, them, and loop.

Most of the present debate seems to be either a) saber rattling that we should in fact claim _ as Chain.jl does (and murder PR#24990), b) that the multi-chain behavior is too general and confusing, c) that curly braces are somehow not Julian, or d) that achieving the consensus to obtain a chaining syntax is a fool’s errand. I can definitely get on board with a more verbose syntax for {} when multiline block expressions are to be made, but to me it seems silly to rally around banishing such a powerful brace syntax. And I’ve never had the wisdom to avoid a good fool’s errand :laughing:

As for murdering PR#24990… if the crowd chants loudly enough, then maybe the right move is to wash my hands like Pontius Pilate and order the execution. I’d like to believe not, but I am only one.
