Fixing the Piping/Chaining/Partial Application Issue (Rev 2)

bertschi · November 19, 2022, 2:27am

"uniment:

For multi-line chains, which will be fairly sparse in a given codebase, using base Julia for a call chain is ugly but not too bad. But for short, inline chains, the extra characters add enough effort and visual noise as to be prohibitive. For example:
my_func(x, my_arr--meth(_,1), z)
# versus
my_func(x, my_arr |> x->meth(x,1), y)

Why not

my_arr |> x->meth(x,1) |>  ⬚->my_func(x, ⬚, y)

if you are already chaining anyways?

"uniment:

Or, let’s say you want to access a property after the chain:
x--foo--bar(_,y).a[1]
# versus
(x |> foo |> x->bar(x,y)).a[1]
With the current syntax, you have to go back to the beginning of the pipe chain to parenthesize it, which means you won’t really want to use pipes at all. It’s very discouraging.

Not necessarily, you can just add another step in the chain:

x |> foo |> x->bar(x,y) |> x->x.a[1]

This is also unambiguous, in contrast to bar(_,y).a[1] which depends on operator precedence and could also mean the same as x |> foo |> x->bar(x,y).a[1]. Not saying that your proposal is unclear here, but I would probably need to look it up when using it.

The underscore is not bad, but as you know I would prefer something more mnemonic like ⬚ or so

bertschi · November 19, 2022, 2:40am

That lambda functions are currently not fully optimized is a indeed pity. Such an optimization would provide much more benefits, i.e., regarding functional programming idioms in general, and certainly not just defend the current chaining operator |>.
Efficient lambdas would be nice in general, i.e., for passing around lightweight closures, efficiently composing functions (a la Transformers.jl) etc. In any case, the Julia compiler already has most of the features required to support that already as your typed Fixedstruct shows. As the saying goes

Objects are poor man’s closures. Closures are poor man’s objects.

i.e., using a closure or a struct holding some fixed values is semantically the same, yet with a different synatx and currently different performance implications.

uniment · November 19, 2022, 3:46am

Although the operations on my_arr might be most naturally thought of as a call chain, the call to my_func might not be. I’d rather not be locked into a long chain, if that’s not how it was constructed in my head.

Each chain sends an object through a sequence of transformations. Once an object has taken the correct form, it can then interact with a second object. Just because the first object was created by a sequence of operations, doesn’t mean it’s always useful or meaningful to think of the second object as also being part of the same chain.

This example might make it more clear (envision it with fluent programming):

play(dog1--feed(_, kibbles)--wash, dog2--groom(_, clip_nails=true))

Each Dog object undergoes a chain of operations before interacting with the other.

It could as easily be written as

dog2--groom(_, clip_nails=true)--play(dog1--feed(_, kibbles)--wash, _)

but depending on the context, this might be less natural. Either way, a chain must become a function argument.

The operator precedence of everything is ambiguous until you use it enough I had a tough time of figuring out that the binding of . is the same as []. Or maybe I didn’t, I can’t remember.

Actually, constructing a lambda requires the same time and less memory at compile-time than my Fix object! The drawback is that you have to compile it every time you define a new lambda, even if it’s doing the exact same thing as another lambda, whereas the Fix object can be re-used (without having to name it). The Fix object also comes with a type, which can be used for dispatch and inference.

Even still, though, I’ve become hesitant to compile the Fix object everywhere that chaining would occur; there are too many functions, and too many variations on their argument types and which arguments you could wish to fix, that it feels wasteful to compile all specializations of Fix partial applicators that call chaining would invoke. For a method with n arguments, it has O(2^n) variations on which arguments are fixed. Then, combine that with the variations on argument types. We could remove type-specialization, but then the partial applicator would be type-unstable.

It seems better just to have specialized chaining syntax which makes it clear, that within the context of this chain, the partial applicator doesn’t need to be constructed or compiled and it’s ok to syntax transform it away.

In other words, to use the syntax of partial application, but not the construction of a partial applicator function. It’s transparent to the user anyway; nobody needs to know.

bertschi · November 19, 2022, 11:08am

uniment:

Although the operations on my_arr might be most naturally thought of as a call chain, the call to my_func might not be. I’d rather not be locked into a long chain, if that’s not how it was constructed in my head.

Each chain sends an object through a sequence of transformations. Once an object has taken the correct form, it can then interact with a second object. Just because the first object was created by a sequence of operations, doesn’t mean it’s always useful or meaningful to think of the second object as also being part of the same chain.

This example might make it more clear (envision it with fluent programming):
play(dog1--feed(_, kibbles)--wash, dog2--groom(_, clip_nails=true))
Each Dog object undergoes a chain of operations before interacting with the other.

It could as easily be written as
dog2--groom(_, clip_nails=true)--play(dog1--feed(_, kibbles)--wash, _)
but depending on the context, this might be less natural. Either way, a chain must become a function argument.

Ah, I see … a story reads differently depending on who is telling it:

my_hero--destroy(_, my_asteroid)--fight(_, nasty_monster)--celebrate

my_asteroid--fly(_, space; naively=true)--destroy(my_hero, _)

nasty_monster--disturbed(_; by=my_asteroid)--fight(my_hero, _)

uniment:

Actually, constructing a lambda requires the same time and less memory at compile-time than my Fix object! The drawback is that you have to compile it every time you define a new lambda, even if it’s doing the exact same thing as another lambda, whereas the Fix object can be re-used (without having to name it). The Fix object also comes with a type, which can be used for dispatch and inference.

Even still, though, I’ve become hesitant to compile the Fix object everywhere that chaining would occur; there are too many functions, and too many variations on their argument types and which arguments you could wish to fix, that it feels wasteful to compile all specializations of Fix partial applicators that call chaining would invoke. For a method with n arguments, it has O(2^n) variations on which arguments are fixed. Then, combine that with the variations on argument types. We could remove type-specialization, but then the partial applicator would be type-unstable.

It seems better just to have specialized chaining syntax which makes it clear, that within the context of this chain, the partial applicator doesn’t need to be constructed or compiled and it’s ok to syntax transform it away.

In other words, to use the syntax of partial application, but not the construction of a partial applicator function. It’s transparent to the user anyway; nobody needs to know.

Trade-offs everywhere, i.e., what options do we have here and how do the interact?

Introduce new syntax rules in the parser – might slow it down a bit, but has otherwise no effect on the compiler.
Use a macro which executes at compile time and therefore also increases it. On the other, it prevents the creation of lambdas/Fix objects later on.
Create lambdas/Fix objects and compile each call separately.

Option 1 is probably the cheapest given the current compiler, but requires a change to the parser that cannot be done with user code. I’m not completely sure about the trade-off between 2 and 3 though:

The compiler already specializes on most function calls i.e., handles it differently depending on argument combinations with the same combinatorial explosion you mentioned above, so is the construction of a lambda really so much different from calling a function in the first place?
The compiler already allows user-defined optimization in terms of generated functions, i.e., the Fix objects could be removed again before compiling the actual function call:
```
@generated function Base.|>(x, f::Base.Fix1)
    :(f.f(f.x, x))
end
```
Would that be as fast as using a macro in the first place, i.e., option 2?

uniment · November 19, 2022, 11:11am

I’ve decided that I agree with you, and I’ve updated the spec and examples for block expressions accordingly. ~~In block mode the commas are now optional, similar to how Vectors work.~~

I’m also playing around more with functor composition, and it’s looking super clean.

I've updated the demo code to reflect these changes here, as well as supporting some more (though still not all) broadcasting cases.

see github https://github.com/uniment/ChainingDemo.jl

Here are updated demo examples.

See github https://github.com/uniment/ChainingDemo.jl

Also, here are some new demos that work now:


demo" (1,2,3)--abs2.(_) "
demo" (_+2+3) "
demo" (_-2-3) "
demo" 1--(_+2+3) "
demo" 1--(_-2-3) "
demo" tan(sin(cos(_)))--acos(asin(atan(_))) "
demo" tan(sin(cos(_)))--acos(asin(atan(_))) "(1)
demo" cos(_)--sin--tan--atan--asin--acos "
demo" cos(_)--sin--tan--atan--asin--acos "(1)
demo" --(cos; sin; tan; atan; asin; acos) "
demo" --(cos; sin; tan; atan; asin; acos) "(1)
demo" --cos--sin--tan--atan--asin--acos "
demo" --cos--sin--tan--atan--asin--acos "(1)
# cos--sin--tan--atan--asin--acos doesn't work,
# because sin(::Function) is not defined.
# Should it be though? 🤔 When a method for f(x::T) isn't found,
# if T is a callable type, should f∘x be returned?
demo" (a=1,)--(_.a+1; it+it^2) "
demo" ((1,2),(2,3),(3,4)).--(_[1]^2 + 1) "
demo" 1--((1,2,3)[_+1]) "
demo" (0:10)--filter(isodd, _)--map(_/2+1, _) "
demo" [(a=i,) for i=0:10]--filter(_.a%3==0, _) "
demo"""
"a=1 b=2 c=3"--begin
    split
    it.--(split(_,"="); (it[1]--Symbol => parse(Int,it[2])))
    NamedTuple
end
"""
using DataFrames
df = DataFrame(id=1:100, group=rand(1:5, 100), age=rand(1:30, 100));
demo"""
df--begin
    dropmissing
    filter(:id => (_%5==0), _)
    groupby(_, :group)
    combine(_, :age => sum)
end
"""
demo" (0:0.25:1).--atan(cos(π*_)) "
demo" [1,2,3]--((l=length(it); it); sum(it)/l) "
demo" 1--(it+1, it-2) "
demo" 1--[it+1, it-2] "
demo" 1--Dict(:a=>it+1, :b=>it-2) "
demo" 1--Set((it+1, it-2)) "
demo" (1,2,3)--[i^2 for i ∈ it] "
demo" (1,2,3)--[(i*_)^2 for i ∈ it] "

uniment · November 19, 2022, 11:36am

My test says no:


julia> @generated function Base.:|>(x, f::Base.Fix1)
           :(f.f(f.x, x))
       end

julia> f = (x,y) -> x+y; f(1,2); @time f(1, 2)
  0.000004 seconds
3

julia> f = (x,y) -> x+y; f(1,2); @time 1 |> Base.Fix1(f, 2)
  0.008177 seconds (4.50 k allocations: 271.899 KiB, 98.76% compilation time)
3

julia> f = (x,y) -> x+y; f(1,2); @time demo" 1--f(_, 2) "
  0.000004 seconds
3

bertschi · November 19, 2022, 12:26pm

I’m really not sure about these benchmarks …

julia> @generated function Base.:|>(x, f::Base.Fix1)
                  :(f.f(f.x, x))
              end

julia> f = (x,y) -> x+y; f(1,2);

julia> @time 1 |> Base.Fix1(f, 2)
  0.026216 seconds (4.66 k allocations: 281.242 KiB, 99.03% compilation time)
3

julia> @time 1 |> Base.Fix1(f, 2)
  0.000017 seconds (1 allocation: 16 bytes)
3

julia> using Chain

julia> @time @chain 1 f(2, _)
  0.000010 seconds
3

julia> @time @chain 1 f(2, _)
  0.000008 seconds
3

The Julia compiler is specializing any function, i.e., including macros, generated functions as well as regular functions. I’m not sure how to reliably detect compile times here, i.e., guess the first call of @time 1 |> Base.Fix1(f, 2) is compiling the method of |> and then reusing that specialization on the second call. Also for the @chain macro no compile time is reported even though the macro expansion must formally be done at compile time. Is it really possible to cleanly separate and benchmark compile time vs runtime in Julia?

In any case, the generated function version does allocate though and is therefore not equivalent to a macro, i.e., the constructed Fix object cannot be removed completely by the generated function. The macro obviously avoids creating such an object at all.

jules · November 19, 2022, 1:39pm

That was another reason why I wrote Chain, because you can avoid compilation of Fix instances or anonymous functions or whatever you need to make |> work…

uniment · November 19, 2022, 6:48pm

Indeed, measuring compile time is a difficult thing to do. I detailed my misadventures with it in this post.

The runtime of this function is basically instantaneous, which you can confirm using @btime (a proper measurement will show single-digit nanoseconds)—so even though compile time and runtime are inseparable, the runtime is negligible in this measurement.

I force Base.Fix1 to recompile by redefining the anonymous function every time. You can see that it’s a new function each time:

julia> f = (x,y) -> x+y; (f, typeof(f))
(var"#1#2"(), var"#1#2")

julia> f = (x,y) -> x+y; (f, typeof(f))
(var"#3#4"(), var"#3#4")

julia> f = (x,y) -> x+y; (f, typeof(f))
(var"#5#6"(), var"#5#6")

This makes anonymous functions an excellent stimulus for measuring these compile times, because a new anonymous function will have a different type despite behaving exactly the same as before.

Chain.jl has been a huge inspiration for how this proposal should work. I pretty much copypasted it in imagining the behavior of -- call chain syntax, except that I decided to localize variable scope within the chain using let statements instead of making unique variables at the parent scope with gensym. This enables me to use it as a local keyword, but it makes it difficult to create side-effects except by calling functions which have side-effects.

jar1 · November 19, 2022, 7:20pm

Optional commas is unnecessary and confusing imo. Commas or no commas, not both.

uniment · November 19, 2022, 8:09pm

I somewhat disagree with this, but you made me realize that the behavior I had specified of allowing but not forcing commas in :blocks causes inconsistent behavior when you wish to have a single expression that returns a tuple. So I have no choice but to oblige

What I want is syntax which allows a natural continuation from the inline style to the block style: so that if, in the middle of writing an inline chain, you realize it’ll be better as a multi-line block chain (there’s a joke in here I’m sure), you can do so without deleting a bunch of commas. But, if it started as a multi-line block anyway, you aren’t forced to make sure there’s a comma on every line.

It seems that the best way to do this is simply to use :block. A parenthesized expression with a semicolon like (a; b; c) is parsed as a :block anyway, and if you change it to a begin ... end block the semicolons are optional. I’m updating the spec and code accordingly.

I really liked commas but I think you’re right, from a usability point of view the commas weren’t helpful.

By happy coincidence though, the use of semicolons in a chain like x--(f; g; h) is now more similar to Transducer.jl’s choice to use ⨟ (\bbsemi) as a reverse-composition operator.

This also has the added benefit of making it easier to construct a tuple, e.g.

demo" 1--(it+1, it-2) " # == (2, -1)

uniment · November 20, 2022, 3:44am

Continually updating the demo code here has gotten tedious, so I finally decided to put it in a package on GitHub lol.

uniment · November 21, 2022, 7:04pm

Update: Improved ChainLink pretty-printing, to give a better idea what it does:

julia> demo" --(f; it[1]+it[2]; g(_,1)) "
--(f; #=expr_of_it=#; g(#==#))

Although a ChainLink object stores and runs a lambda (this minimizes compile time, compared with composition), I set a tuple in its type parameterization to give a basic description of the sequence of methods it calls.

julia> demo" --(f; it[1]+it[2]; g(_,1)) " |> typeof
ChainLink{var"#135#136", (:f, Symbol("#=expr_of_it=#"), Symbol("g(#==#)"))}

Of course, expressions of it can take more arbitrary behavior than simple method calls. Best practice would dictate calling named methods whenever possible (i.e., keeping expressions of it as simple and sparse as possible, trying to use them only for minor adjustments to prepare an object for the next method in the chain), so that the ChainLink’s descriptor would be most informative.

Using the baby example from comment #3:

julia> demo" --(pick_up; @assert(it.head > it.legs); put(_.butt, arm); rock) "
--(pick_up; #=expr_of_it=#; put(#==#); rock)

Creating a ChainLink out of the Chain.jl readme example:

julia> demo"""
       --begin
         dropmissing
         filter(:id => >(6), _)
         groupby(_, :group)
         combine(_, :age => sum)
       end"""
--(dropmissing; filter(#==#); groupby(#==#); combine(#==#))

EDIT:

I made it so now the pretty-print shows exactly how the ChainLink is constructed. (Is this a good idea or no? ):

julia> demo" --(f; it[1]+it[2]; g(_,1)) "
--(f; it[1] + it[2]; g(_, 1))

julia> demo" --(f; it[1]+it[2]; g(_,1)) " |> typeof
ChainLink{var"#175#176", (:f, Symbol("it[1] + it[2]"), Symbol("g(_, 1)"))}

julia> demo" --(pick_up; (@assert it.head > it.legs; it); put(_.butt, arm); rock(_, 2)) "
--(pick_up; begin
    #= none:1 =# @assert it.head > it.legs
    #= none:1 =#
    it
end; put(Fix2(getproperty, :butt), arm); rock(_, 2))

julia> demo"""--begin
         dropmissing
         filter(:id => >(6), _)
         groupby(_, :group)
         combine(_, :age => sum)
       end"""
--(dropmissing; filter(:id => (>)(6), _); groupby(_, :group); combine(_, :age => sum))

uniment · November 23, 2022, 1:37am

As an aside, it’s fun to watch the function composition technique cause the called function get completely decomposed.

julia> f(x)=√(2x^2+1) > 5
f (generic function with 1 method)

julia> demo"f(_/2)"
>(_, 5) ∘ sqrt(_) ∘ +(_, 1) ∘ *(2, _) ∘ ^(_, 2) ∘ /(_, 2)

The constraint being, f cannot have any non-function operator (and for my demo, all functions called must be part of a function “whitelist”), and it cannot have x appear more than once.

justinlaughlin · November 23, 2022, 9:36pm

Just wanted to say thanks for all your effort on this @uniment! I’ve been following these threads closely (well as much as I can read through in my free time lol) and hope there is a solution in base Julia soon. I currently use a combination of Lazy @>,@>>, chain (for dataframes), data pipes (although I find the double underscore hard to read), and sometimes even just base Julia (I like the broadcast pipe syntax .|>). I wish I could just stick to one but I always find something ends up kludgy in one system and end up using a different one.

Idk if it’s been mentioned but I like the idea of multiple options even if there is some redundancy rather than one general solution. Eg /> for piping to the last argument and _ operator for general partial function application. I really liked the /> idea (but I think I’d rather have |> still be used for front sub rather than > because it’s not clear to me which is which). This solves the majority of cases (I think the main issue is how map, filter, etc use the itersble as the last argument).

I am still trying to learn the inner workings of Julia, but I’ve tried using fix1 together with map to work around this issue which works but it must be defined before any piping macros (I think because the macros work at compile time?).

If _ is used strictly for currying do you think it’d be reasonable to reserve an unused character for quick lambdas?

aplavin · November 24, 2022, 8:35am

You may find the new DataPipes release useful: I added the broadcast pipe support, after thinking about its expected semantics for some time.

justinlaughlin · November 24, 2022, 1:47pm

I just saw that I’m excited to try it, thanks!

MilesCranmer · November 25, 2022, 5:16am

Hey @uniment, I’ve been following this proposal and like it a lot!

My one piece of feedback would be that the block syntax is too high in (syntax) sugar. It feels weird to give newlines so much power, and seems bound to lead to obfuscated code where it would hard to tell if there is a single multi-line function, or multiple functions being applied in sequence. I would prefer something more explicit like the following be used instead

x = (
    [1, 2, 3]
    --filter(isodd, _)
    --map(_^2, _)
    --sum(it)
    --sqrt(_) 
)
    == 3.1622776601683795

which still feels like the same syntax as the inline examples, whereas -- begin feels like an entirely different thing. i.e., if that works for the -- operator, then perhaps a new user might infer they could do the same thing with any other operator, like

a * begin
    x
    y
    z
end
    == a * x * y * z

(which of course does not work), so it seems unintuitive to only give this special block ability to the -- operator.

Cheers,
Miles

jar1 · November 25, 2022, 5:47am

I like the Chain style newlines, myself.

uniment · November 26, 2022, 3:12am

Same same, as long as it’s good

I’m not sure that there’s a generally better way to define quick lambdas, than the syntax we already have for lambdas. It’s really good syntax. My pet peeve is that we should be using ↦ (\mapsto) for it, as that’s the character we use for it in mathematics.

I also don’t know what characters are unused. Julia has been quite generous in allowing Unicode characters to be valid for variable names and operators, which is awesome, but it comes with the drawback that claiming any of them back for the language would be breaking. For example, ′ (\prime) and the various multi-prime characters, I often think would be better as suffix operators than as part of variable names. Or, of course, \mapsto for lambdas. But that ship sailed long ago.

I do know that † (\dagger) is unused, because it doesn’t parse, but I don’t think we want to use that here.

Topic		Replies	Views
Fixing the Piping/Chaining Issue (Rev 3) Internals & Design multithreading , syntax , piping , chaining , threading	89	7864	April 5, 2024
Fixing the Piping/Chaining Issue Internals & Design proposal , piping , chaining , partial-evaluation , threading	212	7219	January 16, 2023
Summary of piping/chaining proposal? Internals & Design	23	3243	August 13, 2024
Functional application, reconciling currying with multiple dispatch Internals & Design proposal	6	3229	September 19, 2019
Partial Application brackets without underscores Internals & Design proposal , currying , partial-evaluation	5	1624	November 21, 2022

Fixing the Piping/Chaining/Partial Application Issue (Rev 2)

Related topics