Fixing the Piping/Chaining/Partial Application Issue (Rev 2)

I’m probably the worst person to ask for advice on workflow tooling :sweat_smile: but sure! drop me a DM

1 Like

Why not

my_arr |> x->meth(x,1) |>  ⬚->my_func(x, ⬚, y)

if you are already chaining anyways?

Not necessarily, you can just add another step in the chain:

x |> foo |> x->bar(x,y) |> x->x.a[1]

This is also unambiguous, in contrast to bar(_,y).a[1] which depends on operator precedence and could also mean the same as x |> foo |> x->bar(x,y).a[1]. Not saying that your proposal is unclear here, but I would probably need to look it up when using it.

The underscore is not bad, but as you know I would prefer something more mnemonic like ⬚ or so :wink:

1 Like

That lambda functions are currently not fully optimized is indeed a pity. Such an optimization would provide much broader benefits, i.e., for functional programming idioms in general, and would do far more than just defend the current chaining operator |>.
Efficient lambdas would be nice in general, i.e., for passing around lightweight closures, efficiently composing functions (a la Transformers.jl), etc. In any case, the Julia compiler already has most of the features required to support that, as your typed Fix struct shows. As the saying goes

Objects are a poor man’s closures. Closures are a poor man’s objects.

i.e., using a closure or a struct holding some fixed values is semantically the same, yet with a different syntax and, currently, different performance implications.
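For illustration, a minimal sketch of that equivalence (make_adder and AddFunctor are hypothetical names, not from this thread):

make_adder(x) = y -> x + y          # closure capturing the fixed value x

struct AddFunctor{T}                # struct holding the same fixed value
    x::T
end
(a::AddFunctor)(y) = a.x + y        # make the struct callable (a "functor")

make_adder(1)(2) == AddFunctor(1)(2) == 3   # semantically identical

Julia in fact lowers closures to callable structs like this internally, which is why the performance gap is an implementation detail rather than a semantic necessity.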

1 Like

Although the operations on my_arr might be most naturally thought of as a call chain, the call to my_func might not be. I’d rather not be locked into a long chain, if that’s not how it was constructed in my head.

Each chain sends an object through a sequence of transformations. Once an object has taken the correct form, it can then interact with a second object. Just because the first object was created by a sequence of operations, doesn’t mean it’s always useful or meaningful to think of the second object as also being part of the same chain.

This example might make it more clear (envision it with fluent programming):

play(dog1--feed(_, kibbles)--wash, dog2--groom(_, clip_nails=true))

Each Dog object undergoes a chain of operations before interacting with the other.

It could as easily be written as

dog2--groom(_, clip_nails=true)--play(dog1--feed(_, kibbles)--wash, _)

but depending on the context, this might be less natural. Either way, a chain must become a function argument.

The operator precedence of everything is ambiguous until you use it enough :wink: I had a tough time figuring out that the binding of . is the same as []. Or maybe I didn’t, I can’t remember.

Actually, constructing a lambda takes the same time as, and less memory than, my Fix object at compile time! The drawback is that you have to compile it every time you define a new lambda, even if it does exactly the same thing as another lambda, whereas the Fix object can be re-used (without having to name it). The Fix object also comes with a type, which can be used for dispatch and inference.
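As a rough sketch of why that type is useful (MyFix1 is a hypothetical stand-in, not the actual Fix implementation discussed here):

struct MyFix1{F,T} <: Function   # concrete type captures the function and the fixed arg
    f::F
    x::T
end
(g::MyFix1)(args...) = g.f(g.x, args...)

inc = MyFix1(+, 1)
inc(41)            # 42
typeof(inc)        # MyFix1{typeof(+), Int64}

describe(::MyFix1{typeof(+)}) = "a partially applied +"   # dispatch on the fixed function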

Even so, I’ve become hesitant to compile the Fix object everywhere that chaining would occur; there are so many functions, and so many variations on their argument types and on which arguments you might wish to fix, that it feels wasteful to compile all the specializations of Fix partial applicators that call chaining would invoke. For a method with n arguments, there are O(2^n) variations on which arguments are fixed; then combine that with the variations in argument types. We could remove type specialization, but then the partial applicator would be type-unstable.
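To put a number on that, a quick back-of-the-envelope check:

n = 3    # a method with three arguments
2^n      # 8 fixed/unfixed patterns: each argument is either fixed or left open
# and each of those patterns multiplies with every argument-type combination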

It seems better just to have specialized chaining syntax which makes it clear that, within the context of this chain, the partial applicator doesn’t need to be constructed or compiled and it’s ok to syntax-transform it away.

In other words, to use the syntax of partial application, but not the construction of a partial applicator function. It’s transparent to the user anyway; nobody needs to know.
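A minimal sketch of that idea, assuming a hypothetical @apply macro and helper substitute_underscore (neither is part of the proposal): the piped value is spliced directly into the underscore slot at expansion time, so the partial applicator exists only as syntax.

# Recursively replace `_` in an expression with the piped value.
substitute_underscore(ex, val) =
    ex === :_   ? val :
    ex isa Expr ? Expr(ex.head, (substitute_underscore(a, val) for a in ex.args)...) :
    ex

macro apply(val, ex)
    esc(substitute_underscore(ex, val))
end

@apply 1 f(_, 2)   # expands to f(1, 2), assuming some two-argument f is defined;
                   # no lambda or Fix object is ever built

(A real implementation would first bind the piped value to a temporary so it is evaluated only once.)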

Ah, I see … a story reads differently depending on who is telling it:

my_hero--destroy(_, my_asteroid)--fight(_, nasty_monster)--celebrate

my_asteroid--fly(_, space; naively=true)--destroy(my_hero, _)

nasty_monster--disturbed(_; by=my_asteroid)--fight(my_hero, _)

Trade-offs everywhere, i.e., what options do we have here and how do they interact?

  1. Introduce new syntax rules in the parser – might slow it down a bit, but otherwise has no effect on the compiler.
  2. Use a macro, which executes at compile time and therefore also increases it. On the other hand, it prevents the creation of lambdas/Fix objects later on.
  3. Create lambdas/Fix objects and compile each call separately.

Option 1 is probably the cheapest given the current compiler, but requires a change to the parser that cannot be done with user code. I’m not completely sure about the trade-off between 2 and 3 though:

  • The compiler already specializes on most function calls, i.e., handles each one differently depending on the argument combination (with the same combinatorial explosion you mentioned above), so is the construction of a lambda really so different from calling a function in the first place?
  • The compiler already allows user-defined optimization in terms of generated functions, i.e., the Fix objects could be removed again before compiling the actual function call:
    @generated function Base.:|>(x, f::Base.Fix1)
        :(f.f(f.x, x))  # emit the direct call f.f(f.x, x) instead of the generic f(x)
    end
    
    Would that be as fast as using a macro in the first place, i.e., option 2?
1 Like

I’ve decided that I agree with you, and I’ve updated the spec and examples for block expressions accordingly. In block mode the commas are now optional, similar to how Vectors work.

I’m also playing around more with functor composition, and it’s looking super clean.

I've updated the demo code to reflect these changes, as well as supporting some more (though still not all) broadcasting cases: see https://github.com/uniment/ChainingDemo.jl

Also, here are some new demos that work now:


demo" (1,2,3)--abs2.(_) "
demo" (_+2+3) "
demo" (_-2-3) "
demo" 1--(_+2+3) "
demo" 1--(_-2-3) "
demo" tan(sin(cos(_)))--acos(asin(atan(_))) "
demo" tan(sin(cos(_)))--acos(asin(atan(_))) "(1)
demo" cos(_)--sin--tan--atan--asin--acos "
demo" cos(_)--sin--tan--atan--asin--acos "(1)
demo" --(cos; sin; tan; atan; asin; acos) "
demo" --(cos; sin; tan; atan; asin; acos) "(1)
demo" --cos--sin--tan--atan--asin--acos "
demo" --cos--sin--tan--atan--asin--acos "(1)
# cos--sin--tan--atan--asin--acos doesn't work,
# because sin(::Function) is not defined.
# Should it be though? 🤔 When a method for f(x::T) isn't found,
# if T is a callable type, should f∘x be returned?
demo" (a=1,)--(_.a+1; it+it^2) "
demo" ((1,2),(2,3),(3,4)).--(_[1]^2 + 1) "
demo" 1--((1,2,3)[_+1]) "
demo" (0:10)--filter(isodd, _)--map(_/2+1, _) "
demo" [(a=i,) for i=0:10]--filter(_.a%3==0, _) "
demo"""
"a=1 b=2 c=3"--begin
    split
    it.--(split(_,"="); (it[1]--Symbol => parse(Int,it[2])))
    NamedTuple
end
"""
using DataFrames
df = DataFrame(id=1:100, group=rand(1:5, 100), age=rand(1:30, 100));
demo"""
df--begin
    dropmissing
    filter(:id => (_%5==0), _)
    groupby(_, :group)
    combine(_, :age => sum)
end
"""
demo" (0:0.25:1).--atan(cos(π*_)) "
demo" [1,2,3]--((l=length(it); it); sum(it)/l) "
demo" 1--(it+1, it-2) "
demo" 1--[it+1, it-2] "
demo" 1--Dict(:a=>it+1, :b=>it-2) "
demo" 1--Set((it+1, it-2)) "
demo" (1,2,3)--[i^2 for i ∈ it] "
demo" (1,2,3)--[(i*_)^2 for i ∈ it] "

My test says no:


julia> @generated function Base.:|>(x, f::Base.Fix1)
           :(f.f(f.x, x))
       end

julia> f = (x,y) -> x+y; f(1,2); @time f(1, 2)
  0.000004 seconds
3

julia> f = (x,y) -> x+y; f(1,2); @time 1 |> Base.Fix1(f, 2)
  0.008177 seconds (4.50 k allocations: 271.899 KiB, 98.76% compilation time)
3

julia> f = (x,y) -> x+y; f(1,2); @time demo" 1--f(_, 2) "
  0.000004 seconds
3

1 Like

I’m really not sure about these benchmarks …

julia> @generated function Base.:|>(x, f::Base.Fix1)
                  :(f.f(f.x, x))
              end

julia> f = (x,y) -> x+y; f(1,2);

julia> @time 1 |> Base.Fix1(f, 2)
  0.026216 seconds (4.66 k allocations: 281.242 KiB, 99.03% compilation time)
3

julia> @time 1 |> Base.Fix1(f, 2)
  0.000017 seconds (1 allocation: 16 bytes)
3

julia> using Chain

julia> @time @chain 1 f(2, _)
  0.000010 seconds
3

julia> @time @chain 1 f(2, _)
  0.000008 seconds
3

The Julia compiler specializes every function, i.e., macros and generated functions as well as regular functions. I’m not sure how to reliably detect compile times here; I guess the first call of @time 1 |> Base.Fix1(f, 2) compiles the method of |> and the second call reuses that specialization. Also, for the @chain macro no compile time is reported, even though the macro expansion must formally happen at compile time. Is it really possible to cleanly separate and benchmark compile time vs. runtime in Julia?

In any case, the generated-function version does still allocate and is therefore not equivalent to a macro, i.e., the constructed Fix object cannot be removed completely by the generated function. The macro avoids creating such an object at all.
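For intuition, this is roughly what the macro route produces; the expansion below paraphrases what @macroexpand prints for Chain.jl (actual gensym names differ):

using Chain
@macroexpand @chain 1 f(2, _)
# prints something of the form
#   begin
#       local var"##1" = 1
#       local var"##2" = f(2, var"##1")
#       var"##2"
#   end
# i.e., plain assignments and a plain call; no Fix1 object and no lambda appear.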

1 Like

That was another reason why I wrote Chain: you can avoid compilation of Fix instances or anonymous functions or whatever else you’d need to make |> work…

3 Likes

Indeed, measuring compile time is a difficult thing to do. I detailed my misadventures with it in this post.

The runtime of this function is basically instantaneous, which you can confirm using @btime (a proper measurement will show single-digit nanoseconds)—so even though compile time and runtime are inseparable, the runtime is negligible in this measurement.
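For reference, a measurement along those lines might look like this (BenchmarkTools.jl; the timing in the comment is indicative, not a recorded result):

using BenchmarkTools
f = (x, y) -> x + y
@btime $f(1, 2)   # $-interpolation avoids global-variable overhead
# reports on the order of a nanosecond or two, i.e., pure runtime with no compile cost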

I force Base.Fix1 to recompile by redefining the anonymous function every time. You can see that it’s a new function each time:

julia> f = (x,y) -> x+y; (f, typeof(f))
(var"#1#2"(), var"#1#2")

julia> f = (x,y) -> x+y; (f, typeof(f))
(var"#3#4"(), var"#3#4")

julia> f = (x,y) -> x+y; (f, typeof(f))
(var"#5#6"(), var"#5#6")

This makes anonymous functions an excellent stimulus for measuring these compile times, because a new anonymous function will have a different type despite behaving exactly the same as before.

Chain.jl has been a huge inspiration for how this proposal should work. :ok_hand: I pretty much copypasted it in imagining the behavior of -- call-chain syntax, except that I decided to localize variable scope within the chain using let statements, instead of making unique variables at the parent scope with gensym. This enables me to use it as a local keyword, but it makes it difficult to create side effects except by calling functions which have side effects.
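A sketch of the two scoping strategies for a chain like x--(f; g); both expansions are illustrative, not the literal output of either implementation:

# (a) Chain.jl-style: unique gensym'd variables in the enclosing scope
begin
    local var"##1" = f(x)
    local var"##2" = g(var"##1")
    var"##2"
end

# (b) let-scoped: `it` is rebound inside nested lets and never leaks out,
#     which is what makes `it` usable as a local keyword
let it = x
    let it = f(it)
        g(it)
    end
end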

Optional commas are unnecessary and confusing imo. Commas or no commas, not both.

1 Like

I somewhat disagree with this, but you made me realize that the behavior I had specified, of allowing but not forcing commas in :block expressions, causes inconsistent behavior when you wish to have a single expression that returns a tuple. So I have no choice but to oblige :wink:

What I want is syntax which allows a natural continuation from the inline style to the block style: so that if, in the middle of writing an inline chain, you realize it’ll be better as a multi-line block chain (there’s a joke in here, I’m sure), you can make the switch without deleting a bunch of commas. But if it started as a multi-line block anyway, you aren’t forced to make sure there’s a comma on every line.

It seems that the best way to do this is simply to use :block. A parenthesized expression with a semicolon like (a; b; c) is parsed as a :block anyway, and if you change it to a begin ... end block the semicolons are optional. I’m updating the spec and code accordingly.
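That parsing behavior is easy to confirm at the REPL:

julia> Meta.parse("(a; b; c)").head
:block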

I really liked commas :sweat_smile: but I think you’re right, from a usability point of view the commas weren’t helpful.

By happy coincidence though, the use of semicolons in a chain like x--(f; g; h) is now more similar to Transducers.jl’s choice to use ⨟ (\bbsemi) as a reverse-composition operator.

This also has the added benefit of making it easier to construct a tuple, e.g.

demo" 1--(it+1, it-2) " # == (2, -1)
1 Like

Continually updating the demo code here has gotten tedious, so I finally decided to put it in a package on GitHub lol.

7 Likes

Update: Improved ChainLink pretty-printing, to give a better idea what it does:

julia> demo" --(f; it[1]+it[2]; g(_,1)) "
--(f; #=expr_of_it=#; g(#==#))

Although a ChainLink object stores and runs a lambda (this minimizes compile time, compared with composition), I set a tuple in its type parameterization to give a basic description of the sequence of methods it calls.

julia> demo" --(f; it[1]+it[2]; g(_,1)) " |> typeof
ChainLink{var"#135#136", (:f, Symbol("#=expr_of_it=#"), Symbol("g(#==#)"))}

Of course, expressions of it can express more arbitrary behavior than simple method calls. Best practice would dictate calling named methods whenever possible (i.e., keeping expressions of it as simple and sparse as possible, using them only for minor adjustments that prepare an object for the next method in the chain), so that the ChainLink’s descriptor stays informative.

Using the baby example from comment #3:

julia> demo" --(pick_up; @assert(it.head > it.legs); put(_.butt, arm); rock) "
--(pick_up; #=expr_of_it=#; put(#==#); rock)

Creating a ChainLink out of the Chain.jl readme example:

julia> demo"""
       --begin
         dropmissing
         filter(:id => >(6), _)
         groupby(_, :group)
         combine(_, :age => sum)
       end"""
--(dropmissing; filter(#==#); groupby(#==#); combine(#==#))

EDIT:

I made it so now the pretty-print shows exactly how the ChainLink is constructed. (Is this a good idea or no? :thinking:):

julia> demo" --(f; it[1]+it[2]; g(_,1)) "
--(f; it[1] + it[2]; g(_, 1))
julia> demo" --(f; it[1]+it[2]; g(_,1)) " |> typeof
ChainLink{var"#175#176", (:f, Symbol("it[1] + it[2]"), Symbol("g(_, 1)"))}
julia> demo" --(pick_up; (@assert it.head > it.legs; it); put(_.butt, arm); rock(_, 2)) "
--(pick_up; begin
    #= none:1 =# @assert it.head > it.legs
    #= none:1 =#
    it
end; put(Fix2(getproperty, :butt), arm); rock(_, 2))
julia> demo"""--begin
         dropmissing
         filter(:id => >(6), _)
         groupby(_, :group)
         combine(_, :age => sum)
       end"""
--(dropmissing; filter(:id => (>)(6), _); groupby(_, :group); combine(_, :age => sum))

As an aside, it’s fun to watch the function composition technique cause the called function to get completely decomposed.

julia> f(x)=√(2x^2+1) > 5
f (generic function with 1 method)

julia> demo"f(_/2)"
>(_, 5) ∘ sqrt(_) ∘ +(_, 1) ∘ *(2, _) ∘ ^(_, 2) ∘ /(_, 2)

The constraints being: f cannot contain any non-function operator (and, for my demo, all functions called must be part of a function “whitelist”), and x cannot appear more than once.

Just wanted to say thanks for all your effort on this @uniment! I’ve been following these threads closely (well, as much as I can read through in my free time lol) and hope there is a solution in base Julia soon. I currently use a combination of Lazy’s @> and @>>, Chain (for DataFrames), DataPipes (although I find the double underscore hard to read), and sometimes even just base Julia (I like the broadcast pipe syntax .|>). I wish I could just stick to one, but something always ends up kludgy in one system and I end up using a different one.

Idk if it’s been mentioned, but I like the idea of multiple options, even with some redundancy, rather than one general solution. E.g., /> for piping to the last argument and the _ operator for general partial function application. I really liked the /> idea (though I’d rather have |> still be used for front substitution rather than \>, because it’s not clear to me which is which). This solves the majority of cases (I think the main issue is how map, filter, etc. take the iterable as the last argument).

I am still trying to learn the inner workings of Julia, but I’ve tried using Fix1 together with map to work around this issue, which works, but it must be defined before any piping macros (I think because the macros work at compile time?).
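For reference, the Fix1-plus-map workaround mentioned above can be as simple as this (add is just an example function):

add = (x, y) -> x + y
map(Base.Fix1(add, 10), [1, 2, 3])   # Fix1 fixes the first argument
# -> [11, 12, 13]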

If _ is used strictly for currying, do you think it’d be reasonable to reserve an unused character for quick lambdas?

1 Like

You may find the new DataPipes release useful: I added broadcast pipe support after thinking about its expected semantics for some time.

1 Like

I just saw that. I’m excited to try it, thanks! :grin:

Hey @uniment, I’ve been following this proposal and like it a lot!

My one piece of feedback would be that the block syntax is too high in (syntax) sugar. It feels weird to give newlines so much power, and it seems bound to lead to obfuscated code where it would be hard to tell whether there is a single multi-line function or multiple functions being applied in sequence. I would prefer something more explicit like the following be used instead:

x = (
    [1, 2, 3]
    --filter(isodd, _)
    --map(_^2, _)
    --sum(it)
    --sqrt(_) 
)
    == 3.1622776601683795

which still feels like the same syntax as the inline examples, whereas -- begin feels like an entirely different thing. That is, if that works for the -- operator, then perhaps a new user might infer they could do the same thing with any other operator, like

a * begin
    x
    y
    z
end
    == a * x * y * z

(which of course does not work), so it seems unintuitive to only give this special block ability to the -- operator.

Cheers,
Miles

I like the Chain style newlines, myself.

1 Like