Fixing the Piping/Chaining/Partial Application Issue (Rev 2)

Starting this thread anew to avoid confusion, because I changed my mind on which approach I support partway through a long thread (sorry!). It’s also useful to distill some of the last thread’s learnings.

Thank you to all who participated—especially those who rightfully criticized my ignorance! Many issues are matters of genetic algorithm or simulated annealing, and ossified entrenched problems aren’t solved without some heat.

Please grab a cup of coffee, pull up a chair, and bring a fresh pair of eyes to consider this (refreshed!) proposal. Or for the impatient, skip to the demo at the end and start playing with it.

Objective


To solve Julia’s piping/chaining/currying/partial application problem with an elegant, functional approach that is worthy of, and idiomatic to, Base Julia (i.e., not constrained to macro calls).

Background


See the previous proposal for background information.

To summarize why I changed my mind:

Proposal


I am making essentially two proposals: 1.) partial function application syntax, and 2.) call chaining syntax.

PAS

I now support PR #24990 with some minor modifications. #24990 is Scala-style “tight-currying” underscore syntax. I will refer to this as “Partial Application Syntax,” although my informal name for it is “basically *the* perfect syntax for partial application.” Quoting @stevengj from the PR, with my modifications in italics:

  1. The currying is “tight”, i.e. it only converts the immediately surrounding function call into a partial application functor, as suggested by @JeffBezanson (and as in Scala). So, e.g. f(g(_,y)) behaves like f(x -> g(x,y)). (Note that something like find(!(_ in c), y) will work fine, because the ! operator works on functions; you can also use _ ∉ c.) Any other rule seems hard to make comprehensible and consistent.

  2. Similar to Scala, multiple underscores are converted into multiple arguments in the order they appear, e.g. f(_,y,_) behaves like (x,z) -> f(x,y,z).

  3. A slurping underscore is converted into varargs, for example f(a,_...,z) behaves like
    (args...)->f(a,args...,z). (note that only one slurp is allowed per expression)

  4. Use . to denote broadcasting, e.g. f.(x,_,z) is equivalent to broadcast(f,x,_,z), and _.^2 is equivalent to broadcast(^, _, 2).

  5. For the special case of denoting an expression that will become a zero-argument function due to fixing all arguments, use &_. For example, g=f(x,y,&_); g()
    Note: I have no preference for the specific character sequence; I choose it because it parses.

  6. The object returned by the underscore syntax is a parametrically-typed Fix partial application functor which allows for type inference, dispatch, and pretty-printing. For example, f(x,_,z) is equivalent to Fix{(1,3),3}(f, x, z). This resolves #36181. See “Demo” for sample code.
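
Since `_` placeholders are not valid call arguments in current Julia, the calling behavior of the rules above can be previewed with ordinary lambdas. This is only a sketch: `f`, `g`, and the sample values are made-up stand-ins, and the lambdas mirror how the results are called, not the proposed Fix type.

```julia
# Hypothetical stand-ins; any functions with these arities would do.
f(args...) = args          # returns its arguments as a tuple
g(x, y) = x - y

y = 10

# Rule 1: f(g(_, y)) behaves like f(x -> g(x, y)); only the innermost call is wrapped.
inner = x -> g(x, y)
rule1 = f(inner)           # a 1-tuple holding the partially applied g

# Rule 2: f(_, y, _) behaves like (x, z) -> f(x, y, z).
rule2 = (x, z) -> f(x, y, z)

# Rule 3: f(:a, _..., :z) behaves like (args...) -> f(:a, args..., :z).
rule3 = (args...) -> f(:a, args..., :z)

# Rule 4: _ .^ 2 behaves like x -> broadcast(^, x, 2).
rule4 = x -> broadcast(^, x, 2)

# Rule 5: f(1, y, &_) behaves like a zero-argument function.
rule5 = () -> f(1, y)
```

Calling `rule2(1, 3)` returns `(1, 10, 3)`, and `rule3(1, 2)` returns `(:a, 1, 2, :z)`.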

I also add a proposal specific to chaining:

CCS

A function chaining syntax, which I will call “Call Chain Syntax,” which gives preferred treatment to PAS (the syntax, not the functor object) by suppressing applicator construction using syntax transforms for better compile time and by splatting automatically, and which also allows for “chain glue expressions” using a local keyword it (this will become more clear later).

  1. The infix character sequence -- , with the same “operator precedence” as property getting . dot (17 as of writing), which takes on its R.H.S. either a callable object, or an expression of it, or a :block of callable objects inclusive-or expressions of it, and calls them sequentially. For example:
    x--f is equiv to let it=x; it=f(it) end,
    x--f(_,y) is equiv to let it=x; it=f(it,y) end,
    y--f.(x,_) is equiv to let it=y; it=f.(x,it) end,
    x--f(_^2,_).a[2] is equiv to (let it=x; it=f(_^2,it) end).a[2],
    x--(f; it^2+it; it/5) is equiv to
    let it=x; it=f(it); it=it^2+it; it=it/5 end, and
    x--begin f; g; h end is equiv to x--f--g--h,
    which is equiv to x--(f; g; h), which is equiv to
    let it=x; it=f(it); it=g(it); it=h(it) end.
    Note: I choose -- because it looks neat, is easy to type, and is currently invalid syntax so it won’t be piracy to claim it.
    Note: Expressions are considered to be expressions of it if they have it anywhere, *unless* it exists only locally within nested call chains.

  2. (This should satisfy Chain.jl users:) Within the scope of a R.H.S. expression, a keyword it is defined locally, representing the result of executing the previous element in the chain. Expressions of it capture everything that has tighter binding than the block expression delimiter. For example:
    x--(f; it.a+it.b; g) is equiv to
    let it=x; it=f(it); it=it.a+it.b; it=g(it) end.
    Note: I prefer a keyword like it to avoid confusion with the partial function placeholder _, as they have distinct meanings. Using it leverages the fact that the English language already uses the pronoun “it” for the exact same purpose of method chaining.

  3. If the next expression is a PAS expression with multiple _ underscores, then the previous expression will be splatted in.
    x--(f; g(_,y,_...)) is equiv to
    let it=x; it=f(it); it=g(it[1],y,it[2:end]...) end.

  4. To broadcast on a callable object or an expression of it, or a :block of such, use .--. For example,
    x.--(f; it.a[1]) evaluates like
    let it=x; it=f.(it); it=(it->it.a[1]).(it) end.

  5. A callable type Base.ChainLink, representing a sequence of operations.

  6. If -- or .-- is “headless”, i.e., prefix, having no object on its L.H.S., then it behaves as a constructor for a Base.ChainLink with the aforementioned behaviors (notably w.r.t. splatting and the it keyword). For example, if we let
    foo = --(f; it.a+it.b; g), this gives a Base.ChainLink object that can be called like
    it -> begin it=f(it); it=it.a+it.b; it=g(it) end. Then,
    x--foo is equiv to foo(x), which evaluates like the example from item 2.
    Similarly, a broadcasting chainlink can be defined by:
    baz = .--(f; it.a+it.b; g), which calls like
    it -> begin it=f.(it); it=(it->it.a+it.b).(it); it=g.(it) end.
    Because Base.ChainLink is a callable object, it can also be inserted into other chainlinks. For example:
    bar = --(h; foo); y--bar.

  7. For Chain.jl users: an @aside macro is not required. For example:
    x--(f; (println(it); it); g) yields g(f(x)) and prints f(x).

  8. (Stretch Goal) Special handling for the do statement, when taking -- as an “argument name.” Example:
    map(x) do -- f; it + it^2; g end would be equiv to
    map(--(f; it + it^2; g); x)
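
The let-block expansions in the list above already run in current Julia, so they can be tried directly. A small sketch with made-up data (the functions and values are illustrative only; the `--` forms in the comments are the proposed syntax and do not parse today):

```julia
x = [3, 1, 2]

# x--(sort; it[1] + it[end]; sqrt)
a = let it = x
    it = sort(it)            # callable link: called on `it`
    it = it[1] + it[end]     # expression of `it`: evaluated as written
    it = sqrt(it)
end
# a == 2.0

# (1, 2, 3)--tuple(_, :y, _...)   (multiple underscores splat the previous value)
b = let it = (1, 2, 3)
    it = tuple(it[1], :y, it[2:end]...)
end
# b == (1, :y, 2, 3)

# ["a=1", "b=2"].--(split(_, "="); it[2])   (every link is broadcast)
c = let it = ["a=1", "b=2"]
    it = split.(it, "=")
    it = (it -> it[2]).(it)
end
# c == ["1", "2"]
```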

[Some] Details


Fix Functor

Latest edition of code for a generalized Fix partial application functor is in “Demo.”

This Fix functor type generalizes the behavior of Base.Fix1 and Base.Fix2 to fix any number of arguments in any position, for functions with fixed or variable argument lengths, and keyword arguments. With this functor, the behavior of Base.Fix1 and Base.Fix2 as currently defined for 2-argument functions is captured by the parametric types:

const Fix1{F,X} = Fix{F,(1,),2,Tuple{X},NamedTuple{(),Tuple{}}}
const Fix2{F,Y} = Fix{F,(2,),2,Tuple{Y},NamedTuple{(),Tuple{}}}

It’s also easy to make functors accepting variable argument lengths:

const FixFirst{F,X} = Fix{F,(1,),0,Tuple{X}}
const FixLast{F,X} = Fix{F,(-1,),0,Tuple{X}}

as well as any other argument position and count.
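
To make the idea concrete, here is a deliberately simplified sketch of such a functor. `SketchFix` is a hypothetical name and a stand-in of my own for illustration; it ignores keyword arguments and varargs, and its type parameters are not the ones used by the code in “Demo.”

```julia
# `SketchFix` is a hypothetical, simplified stand-in for the proposed functor:
# `Pos` holds the fixed argument positions and `N` the total argument count.
struct SketchFix{Pos,N,F,T<:Tuple}
    f::F
    vals::T
end

# Convenience constructor: SketchFix{(1,3),3}(f, x, z) fixes arguments 1 and 3 of 3.
function SketchFix{Pos,N}(f, vals...) where {Pos,N}
    length(Pos) == length(vals) || throw(ArgumentError("need one value per fixed position"))
    return SketchFix{Pos,N,typeof(f),typeof(vals)}(f, vals)
end

# Calling the functor fills the remaining positions, in order, with the new arguments.
function (p::SketchFix{Pos,N})(free...) where {Pos,N}
    length(free) == N - length(Pos) || throw(ArgumentError("wrong number of arguments"))
    args = ntuple(N) do i
        k = findfirst(==(i), Pos)
        # Free position i takes the next unused argument: skip the fixed slots before it.
        k === nothing ? free[i - count(<(i), Pos)] : p.vals[k]
    end
    return p.f(args...)
end

middle = SketchFix{(1, 3), 3}(string, "a", "c")
middle("b")    # "abc"
```

Because the fixed positions live in the type, dispatch and printing can distinguish `SketchFix{(1,),2}` from `SketchFix{(2,),2}`, which is the property the proposal relies on.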

To call such an object without syntax sugar:

[1,2,3] |> Fix1(map, Fix2(^,2))
[1,2,3] |> FixLast(join, ", ")
[1,2,3] |> Fix{(1,2),3}(mapreduce, Fix2(^, 2), +)

Using underscore and chaining syntax:

[1,2,3] -- map(_^2, _)
[1,2,3] -- join(_..., ", ")
[1,2,3] -- mapreduce(_^2, +, _)
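
For comparison, roughly the same three calls can be written in current Julia with the existing `Base.Fix1` and `Base.Fix2` plus one lambda. This is a sketch of today's closest equivalents, not of the proposed syntax:

```julia
using Base: Fix1, Fix2

squares = [1, 2, 3] |> Fix1(map, Fix2(^, 2))               # [1, 4, 9]
joined  = [1, 2, 3] |> Fix2(join, ", ")                    # "1, 2, 3"
total   = [1, 2, 3] |> v -> mapreduce(Fix2(^, 2), +, v)    # 14
```

The third line needs a lambda because the piped value lands in the third argument position, which neither `Fix1` nor `Fix2` can express.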

Modified Underscore syntax

In alignment with @jeff.bezanson and @stevengj, I think the “tight currying” approach inspired by Scala, where _ is used strictly for partial function evaluation (and not for constructing arbitrary lambdas), is the only acceptable approach for implementing underscore syntax as a language feature—that is to say, it’s sufficiently generic, straightforward to reason about, and useful to justify incorporation into the language proper.

Furthermore, considering how frequently partial application is employed, it’s sensible to add syntax sugar for it (as we already have for lambdas).

The modifications I propose to the original syntax of #24990 are simple: _... for slurping varargs, &_ for the special case of creating a zero-argument functor, and the expected broadcasting behavior using .. With these additions, underscore syntax becomes fully general, able to create partial (and full) applicators fixing any number of arguments in any position for any function that the language permits.

Call Chain Syntax: --

The functor doesn’t need to be compiled when it would be constructed, called once, and immediately discarded, i.e., when it is used as a link in a call chain. The proposed call chain syntax -- would invoke a syntax transformation such that the constructor and functor of the Fix partial function applicator needn’t be compiled, enhancing the utility of underscore syntax.

In addition, the chaining syntax allows for tuples of arguments to be “splatted” into whichever argument positions are desired, as directed by underscore syntax.

Call chain syntax has been specified to operate with the same precedence as . property getting. This is so that, for example, a chain like this:

"1"--parse(Int,_) == 1

is evaluated as

parse(Int,"1") == 1

instead of

(parse(Int,_) == 1)("1")

Importantly, it’s so that the result of a chain can have its properties and indices accessed without having to parenthesize the entire call chain.
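
For contrast, the existing pipe operator `|>` binds more loosely than arithmetic and property access, which is why piped results often need parentheses today. A quick way to check how current Julia groups such expressions (`x` and `f` are placeholder names; nothing is evaluated, only parsed):

```julia
# `|>` binds more loosely than `+`, so the addition is grouped first.
@assert Meta.parse("x |> f + 1") == :(x |> (f + 1))

# Property access binds tightest, so `.a` attaches to `f`, not to the piped result.
@assert Meta.parse("x |> f.a") == :(x |> (f.a))

# Reading a property of the piped result therefore needs parentheses today.
nt = (a = [10, 20],)
second_item = (nt |> identity).a[2]    # 20
```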

I call a sequence of operations that lacks the object it will operate on a Base.ChainLink; it is constructed with “headless” -- call chain syntax. Of course, a new chainlink can be created by composing other chainlinks. Ultimately, a full chain requires the source object too.
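
A rough sketch of what such a callable could look like in current Julia. `SketchChainLink` and `chainlink` are hypothetical names for illustration only; the real proposal would build the link from syntax rather than from a list of functions.

```julia
# `SketchChainLink` is a hypothetical stand-in for the proposed `Base.ChainLink`.
struct SketchChainLink{T<:Tuple}
    steps::T
end

# Hypothetical helper that collects the steps into a link.
chainlink(steps...) = SketchChainLink(steps)

# Calling a link threads the input through each step, left to right.
(link::SketchChainLink)(x) = foldl((it, step) -> step(it), link.steps; init = x)

parse_squares = chainlink(
    s -> split(s, r",\s*"),
    parts -> parse.(Int, parts) .^ 2,
)

# Links nest because they are themselves callable.
to_csv = chainlink(parse_squares, v -> join(v, ", "))

to_csv("1, 2, 3, 4")    # "1, 4, 9, 16"
```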

Although there’s heavy overlap between the functionality of the pipe operator |> and call chain syntax --, I suspect the latter will find greater use, as the former is notoriously inconvenient to type (among other reasons).

Examples


Starting a Spark Session


SparkSession.builder--appName(_, "Main")--master(_, "local")--getOrCreate

Chaining


# notice that `(_)` and `(it)` cause no harm

[1, 2, 3] -- begin 
    filter(isodd, _)
    map(_^2, _)
    sum(it)
    sqrt(_) 
end ==
    3.1622776601683795

More


"1 2, 3; hehe4"--eachmatch(r"(\d+)", _).--(first; parse(Int, _))--join(_, ", ") ==
    "1, 2, 3, 4"

Replacing Base.Fix1 and Base.Fix2


Base.values(x::MyStruct{K1,<:Any,K2,<:Any}) where {K1,K2} = 
    (values(x.a)..., map(getfield(x.b, _), filter(_∉K1, K2))...)

Function Composition


chain = --(split(_, r",\s*"); .--(parse(Int, it)^2); join(_, ", "))
"1, 2, 3, 4"--chain ==
    "1, 4, 9, 16" 

Fluent Interface


Kitten()--(setname(_,"Salem"), setcolor(_,:black), save)

Transducer Chainlink


process_bags = --begin
    mapcatting(unbundle_pallet)
    filtering(is_nonfood)
    mapping(label_heavy)
end
process_bags--into(airplane, _, pallets)

Splatting Multiple Arguments


(_^2, [1,2,3])--mapreduce(_, +, _) ==
    14

Rearranging before splatting


(:a,:b)--(reverse; f(_,_))

Fully-Applied “Partially Applied” Function


run = println("Hello", "world!", &_)

run()

Example from Chain.jl Readme

df--begin
    dropmissing
    filter(:id => >(6), _)
    groupby(_, :group)
    combine(_, :age => sum)
end

Example from DataPipes.jl Readme

"a=1 b=2 c=3"--begin
    split
    .--(split(_, "="); Symbol(it[1]) => parse(Int, it[2]))
    NamedTuple
end

New Pushback & Responses


In your examples you still have quite a lot of _ underscores. How is this better than xyz.jl chaining package?

Yes there are a handful more characters here and there, but it adds legibility and obviousness. It also makes use of existing idioms, which I think is a good thing.

Recall an important constraint: to settle on generic syntax for the language proper, not for a domain-specific package.

When things are a proper part of the language, then people can start to justify making tooling and tab-autocompletes for them. Hence, I don’t anticipate the extra _'s will be a nuisance in the near future.

This is sugar. Sugar belongs in macros.

Why? Pretty much everything in a language beyond parentheses is sugar; even infix operators are sugar.

Sugar improves productivity, and it encourages and promotes preferred idioms. Partial application is a good idiom to prefer in a functional language. And method chaining is a preferred idiom, well, everywhere (including in spoken languages and in mathematics).

Partial application and function call chaining are two separate concepts, and should not be in the same proposal together.

This is probably correct. I don’t know exactly when or why the two concepts got conflated, but they did—at least for me. Maybe they are just concepts that are easily conflatable?

Why do you mention autocomplete? Your proposal doesn’t solve any of the problems of autocomplete.

I disagree. There are technical challenges, but I think the key issue is a matter of time and motivation. And motivation is helped by sugar that encourages the use of syntax that an autocomplete can work smoothly with.

The IDE doesn’t know what type of variable you’re operating with, so it’s impossible to find what methods will operate on it.

This would be a reasonable statement, if we decreed that nobody could run Julia in interactive mode.

But that isn’t the case. Julia’s meant to be an interactive-mode language that runs fast: a scripting language that compiles. We already have property . autocomplete, but we don’t have method autocomplete. That should change.

And someday, somebody will implement an autocomplete that works outside interactive mode too. But we should crawl before we walk.

Can you give an example of how you imagine autocomplete might work?

Of course! Here’s a walked-through step-by-step example.

Can you provide sample code of how this autocomplete might work?

Yup! I made some sample code here.

And here is how to use it.

Autocomplete is rendered useless by Julia’s method genericism; showing a method list sorted by type specialization is a bad idea, because so many methods are generic to Any type.

I disagree, but I think this too is a solvable problem.

Some more thoughts on how to do it.

It’s impossible to reach consensus on such a contentious topic; we will never implement it into the language.

This is a strangely defeatist attitude for a language whose inspiration is to “have it all.” :sweat_smile:

In my estimation, we need to settle on a syntax that’s generic enough, simple enough, and powerful enough, and remind ourselves that a good language feature will do something simple and natural, and do it very well, so that it can compose well with other language features. I’m hoping this proposal satisfies those objectives.

Demo


Of course, this wouldn’t be complete without a demo!

This is a hacky demo, but it should do the trick.

NOTE: I cannot update to my latest code, because I hit the character limit. See comment #32 for latest code and demos.

Note: The Fix functors are pretty solid, but macro demo_str is hacky like hacky sack.

  • It currently doesn’t work with broadcasting chainlinks (i.e., headless .--)
  • Broadcasting on functions sometimes works and sometimes doesn’t
  • The operator precedence of -- in this demo is similar to multiplication (12), but it’s intended that it should have precedence similar to the . dot operator (17).

Code:


*** DELETED; SEE COMMENT 32 ***

Fun examples to try


*** DELETED; SEE COMMENT 32 ***

Play with it, compare with the dozen or so chaining packages out there, and let me know your thoughts! :thought_balloon:


Fix partial applicator simple runtime benchmark (optimized for 2-args):


julia> using FixArgs: Fix1 as Fix1a # for comparison w/ FixArgs.jl

julia> Fix1b = (f,x)->y->f(x,y); # for comparison w/ lambda

julia> f=(x,y)->x+y; f(1,2); g=Fix1(f, 1); @btime $g(2);
  1.800 ns (0 allocations: 0 bytes)

julia> f=(x,y)->x+y; f(1,2); g=Base.Fix1(f, 1); @btime $g(2);
  1.800 ns (0 allocations: 0 bytes)

julia> f=(x,y)->x+y; f(1,2); g=Fix1a(f, 1); @btime $g(2);
  1.800 ns (0 allocations: 0 bytes)

julia> f=(x,y)->x+y; f(1,2); g=Fix1b(f, 1); @btime $g(2);
  1.800 ns (0 allocations: 0 bytes)

julia> f=(x,y)->x+y; f(1,2); g=y->f(1, y); @btime $g(2); # type-unstable
  19.157 ns (0 allocations: 0 bytes)
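
One plausible reading of the `# type-unstable` case: the plain lambda looks up the non-constant global `f` on every call, while the Fix-style functors store `f` in a typed field. A sketch of how to check this at top level, using `Base.return_types` (an internal reflection helper) rather than timings:

```julia
using Base: Fix1

f = (x, y) -> x + y        # non-constant global, as in the benchmark

g_fix = Fix1(f, 1)         # stores `f` and `1` in typed fields
g_lam = y -> f(1, y)       # refers to the untyped global `f`

fix_types = Base.return_types(g_fix, (Int,))   # inferred concretely
lam_types = Base.return_types(g_lam, (Int,))   # not inferable when `f` is an untyped global
```

Declaring `f` as `const` (or passing it in as an argument) should remove the difference.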

Fix partial applicator simple compile time benchmark (optimized for 2-args):


julia> f=(x,y)->x+y; f(1,2); @time g=Fix1(f, 1); @time g(2);
  0.002998 seconds (1.98 k allocations: 135.662 KiB, 92.55% compilation time)
  0.002270 seconds (3.18 k allocations: 199.320 KiB, 97.75% compilation time)

julia> f=(x,y)->x+y; f(1,2); @time g=Base.Fix1(f, 1); @time g(2);
  0.002760 seconds (1.32 k allocations: 79.189 KiB, 97.32% compilation time)
  0.002505 seconds (2.39 k allocations: 146.041 KiB, 98.93% compilation time)

julia> f=(x,y)->x+y; f(1,2); @time g=Fix1a(f, 1); @time g(2);
  0.006897 seconds (9.29 k allocations: 557.034 KiB, 96.58% compilation time)
  0.004007 seconds (16.25 k allocations: 928.998 KiB, 99.33% compilation time)

julia> f=(x,y)->x+y; f(1,2); @time g=Fix1b(f, 1); @time g(2);
  0.003930 seconds (657 allocations: 38.885 KiB, 99.45% compilation time)
  0.003387 seconds (801 allocations: 46.613 KiB, 99.15% compilation time)

julia> f=(x,y)->x+y; f(1,2); @time g=y->f(1, y); @time g(2);
  0.000034 seconds (25 allocations: 1.609 KiB)
  0.002801 seconds (459 allocations: 29.776 KiB, 99.20% compilation time)

Comparing this proposal to the syntax from the better-known chaining and piping packages:


Example from Chain.jl Readme:

Proposed CCS+PAS:

df--begin
  dropmissing
  filter(:id => >(6), _)
  groupby(_, :group)
  combine(_, :age => sum)
end

Base Julia:

df |>
  dropmissing |>
  x -> filter(:id => >(6), x) |>
  x -> groupby(x, :group) |>
  x -> combine(x, :age => sum)

Chain.jl:

@chain df begin
  dropmissing
  filter(:id => >(6), _)
  groupby(:group)
  combine(:age => sum)
end

DataPipes.jl:

@p begin
  df
  dropmissing
  filter(:id => >(6), x)
  groupby(_, :group)
  combine(_, :age => sum)
end

Pipe.jl:

@pipe df |>
  dropmissing |>
  filter(:id => >(6), _)|>
  groupby(_, :group) |>
  combine(_, :age => sum)

Lazy.jl:

@> df begin
  dropmissing
  x -> filter(:id => >(6), x)
  groupby(:group)
  combine(:age => sum)
end

Underscores.jl:

@_ df |>
  dropmissing |>
  filter(:id => >(6), __) |>
  groupby(__, :group) |>
  combine(__, :age => sum)

Hose.jl:

@hose df |>
  dropmissing |>
  filter(:id => >(6), _) |>
  groupby(_, :group) |>
  combine(_, :age => sum)

Example from DataPipes.jl Readme

Proposed CCS+PAS:

"a=1 b=2 c=3"--begin
  split
  map(_) do --
    split(_, "=")
    Symbol(it[1]) => parse(Int, it[2])
  end
  NamedTuple
end

Base Julia:

"a=1 b=2 c=3" |>
  split |>
  it->map(it) do it
    it=split(it, "=")
    Symbol(it[1]) => parse(Int, it[2])
  end |>
  NamedTuple

Chain.jl:

@chain "a=1 b=2 c=3" begin
  split
  map(_) do it
    it=split(it, "=")
    Symbol(it[1]) => parse(Int, it[2])
  end
  NamedTuple
end

DataPipes.jl:

@p let
  "a=1 b=2 c=3"
  split
  map() do __
    split(__, '=')
    Symbol(__[1]) => parse(Int, __[2])
  end
  NamedTuple
end

Pipe.jl:

@pipe "a=1 b=2 c=3" |>
  split |>
  map(_) do it
    it=split(it, "=")
    Symbol(it[1]) => parse(Int, it[2])
  end |>
  NamedTuple

Lazy.jl:

@>> "a=1 b=2 c=3" begin
  split
  map(it->begin
    it=split(it, "=")
    Symbol(it[1]) => parse(Int, it[2])
  end)
  NamedTuple
end

Underscores.jl:

@_ "a=1 b=2 c=3" |>
  split |>
  map(it->begin
    it=split(it, "=")
    Symbol(it[1]) => parse(Int, it[2])
  end, __) |>
  NamedTuple

Hose.jl:

@hose "a=1 b=2 c=3" |>
  split |>
  map(_) do it
    it=split(it, "=")
    Symbol(it[1]) => parse(Int, it[2])
  end |>
  NamedTuple

Examples from Pipe.jl Readme

Proposed CCS+PAS:

a--b(_...)
a--b(it(1,2))
a--b(it[3])
(2,4)--get_angle(_,_)

Base Julia:

a |> x->b(x...)
a |> x->b(x(1,2))
a |> x->b(x[3])
(2,4) |> x->get_angle(x[1],x[2])

Chain.jl:

@chain a b(_...)
@chain a b(_(1, 2))
@chain a b(_[3])
@chain (2,4) get_angle(_[1],_[2])

DataPipes.jl:

@p a b(__...)
@p a b(__(1,2))
@p a b(__[3])
@p (2,4) get_angle(__[1], __[2])

Pipe.jl:

@pipe a |> b(_...)
@pipe a |> b(_(1, 2))
@pipe a |> b(_[3])
@pipe (2,4) |> get_angle(_[1],_[2])

Lazy.jl:

# N/A
# N/A
# N/A
# N/A

Underscores.jl:

@_ a |> b(__...)
@_ a |> b(__(1,2))
@_ a |> b(__[3])
@_ (2,4) |> get_angle(__[1],__[2])

Hose.jl:

@hose a |> b(_...)
@hose a |> b(_(1,2))
@hose a |> b(_[3])
@hose (2,4) |> get_angle(_[1],_[2])

My Examples

Proposed CCS+PAS:

[1,2,3]--map(_^2, _)
[1,2,3]--join(_, ", ")
"1"--parse(Int, _) == 1
(:a,:b)--reverse--f(_...)

Base Julia:

[1,2,3] |> x->map(x->x^2, x)
[1,2,3] |> x->join(x, ", ")
("1" |> x->parse(Int, x)) == 1
(:a,:b) |> reverse |> x->f(x...)

Chain.jl:

@chain [1,2,3] map(x->x^2, _)
@chain [1,2,3] join(", ")
@chain("1", parse(Int, _)) == 1
@chain (:a,:b) reverse f(_...)

DataPipes.jl:

@p [1,2,3] map(_^2)
@p [1,2,3] join(__, ", ")
@p("1", parse(Int)) == 1
@p (:a,:b) reverse f(__...)

Pipe.jl:

@pipe [1,2,3] |> map(x->x^2, _)
@pipe [1,2,3] |> join(_, ", ")
@pipe("1" |> parse(Int, _)) == 1
@pipe (:a,:b) |> reverse |> f(_...)

Lazy.jl:

@>> [1,2,3] map(x->x^2)
@> [1,2,3] join(", ")
@>>("1", parse(Int)) == 1
# N/A

Underscores.jl:

@_ [1,2,3] |> map(_^2, __)
@_ [1,2,3] |> join(__, ", ")
@_("1" |> parse(Int, __)) == 1
@_ (:a,:b) |> reverse |> f(__...)

Hose.jl:

@hose [1,2,3] |> map(x->x^2, _)
@hose [1,2,3] |> join(_, ", ")
@hose("1" |> parse(Int, _)) == 1
@hose (:a,:b) |> reverse |> f(_...)

Of interest:

Many of these packages implement single-underscore _ and double-underscore __, each with a meaning, if not equal to, approximating:

  1. Placeholder to specify argument position for partial function evaluation (“tight currying”)
  2. Return value of last element in execution chain (“loose binding”)

It’s fairly confusing to identify which is which, because they seem to have similar meanings in these contexts, they look almost identical, and different packages swap their symbols or give them slightly different meanings.

In this proposal, to avoid confusing the two, I call #2 it reflecting the analogous role of the pronoun in the English language for method chaining, and I leave #1 as the most basic Scala-style argument placeholder _ for partial function evaluation.

My proposal intends to keep the behavior rules as simple and consistent as possible, to create a syntax that composes well, instead of fragile and complicated rules.


Some color on the word “it”

Method chaining is a common idiom in natural language too. In some instances, a sequence of methods is directly composable. For example (in pseudo-English):

Cat: Put on lap. Inspect fur. Find flea. Pull off. Put in soapy water.

In these instances, the pronoun “it” can be implied. In other instances, when methods are not directly composable and minor glue logic is necessary to ready an object for the next step in the chain, we must make “it” explicit:

Baby: Pick up. Lift its head above its legs. Put butt on your arm. Rock to sleep.

Notice that without the glue employing “it,” the composition might not work very well.

The call chain syntax I propose here intends to handle these cases, and uses the same keyword “it” for the exact same reasons.

Of course, there are more sophisticated scenarios where each object must be specified at each step:

Pot, oats, milk: Put the pot on the stove. Put oats in the pot. Pour milk in the pot. Turn on the stove.

For these more general (and more verbose) scenarios, we already have lambdas and named functions.

Chain.jl makes the _, unnecessary in @chain d groupby(_, c) but it is still necessary under --. Why is -- better?


See:

It’s possible to specify -- to operate more like @chain, and to default to inserting the argument in the front of the argument list. However, this creates inconsistent behavior, increasing mental load and making the operator’s behavior more fragile.

For a language feature, you want behavior to be simple and consistent, so that it will compose well. Sometimes this will be annoying because it’s a little bit more verbose, but sometimes it can really save your bacon.

For example: What if one day, you decide that you want a part of your chain to be a return value from another function call? Using --, you would do something like this:

x--(f, my_function_generator(arg1, arg2), g)

With @chain’s behavior to automatically insert it into first argument position, this becomes treacherous; you basically have to abandon all of the investment you put into learning its idioms, simply because it did not anticipate your new use case.

So it’s better to have more simple, generic behavior, which partial application syntax is. Tab-autocomplete will make the experience even smoother, once we have it.
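
The “chain step returned by another function call” case can be tried today with `|>`, which also treats every step as a plain callable. A small sketch; `scale_by` is a made-up function generator standing in for `my_function_generator`:

```julia
# Hypothetical generator: returns a one-argument function to use as a chain step.
scale_by(k, offset) = x -> k * x + offset

# The generated function drops into the chain like any other callable.
result = 4 |> abs |> scale_by(2, 1) |> sqrt    # sqrt(2 * 4 + 1) == 3.0
```

No argument position is implied for `scale_by(2, 1)`, so there is nothing to override: the call is evaluated as written and its return value is the step.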


A Fun (and productive) Experiment

A repeated complaint about “tight-currying” is that it binds too tightly for a lot of the common expressions we might wish for. For example, how do we express 1 + x - 2?

julia> demo" 1 + _ - 2 "
ERROR: MethodError: no method matching -(::Fix1{typeof(+), Int64}, ::Int64)

Oh no! An error! That’s because “tight-currying” was greedy, and now we’re trying to subtract an integer from a function instead of just having a lambda that represents the entire expression.

But remember the point of simple syntax with consistent rules, is that it’s composable with other syntax and operators. So maybe we can solve the problem with… wait for it… composition! (ba-dum-tsss)

julia> Base.:-(x::Fix, y) = Fix2(-, y) ∘ x

julia> demo" 1 + _ - 2 "
-(_, 2) ∘ +(1, _)

julia> demo" 1 + _ - 2 "(3)
2

Yay!

Here’s another example: sin(cos(x)):

julia> Base.sin(x::Fix) = sin ∘ x

julia> demo" sin(cos(_)) "
sin ∘ cos(_)

julia> demo" sin(cos(_)) "(1)
0.5143952585235492

Composing tight currying with function composition. A thing of beauty.
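
The compositions that these overloads produce can be written out by hand in current Julia, without defining any new methods. A sketch using the existing `Base.Fix1`, `Base.Fix2`, and `∘`:

```julia
using Base: Fix1, Fix2

# 1 + _ - 2  read as  (_ - 2) ∘ (1 + _)
shift = Fix2(-, 2) ∘ Fix1(+, 1)
shift(3)                 # 2

# sin(cos(_))  read as  sin ∘ cos
sc = sin ∘ cos
sc(1) == sin(cos(1))     # true
```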

Now, because any object can be a function, maybe this behavior shouldn’t be constrained to just Fix objects…


If nothing else though, the type Fix makes it explicit that it’s a partial function rather than any arbitrary function or object, and therefore this behavior could be intended, so it seems safe enough to explore.

With @chain’s behavior to automatically insert it into first argument position, this becomes treacherous; you basically have to abandon all of the investment you put into learning its idioms, simply because it did not anticipate your new use case.

It’s just

@chain x f my_function_generator(arg1, arg2)(_) g

Hah! I didn’t think of that one :stuck_out_tongue_winking_eye: You might consider removing the underscore there too, for legibility.

To answer this question more directly:

I could have specified that -- would behave as @chain does. However, for a generic language feature, the inconsistency in behavior has a bad feel to it, and when autocomplete is eventually available it will feel short-sighted. I did not feel this way toward my previous proposal, because although it showed a clear preference to the first argument too, it was very simple and always behaved that way. I have a strong preference for consistency, transparency, and obviousness in behavior of language features.

I think the real benefits of --, however, lie in its other properties. Namely,

  1. By (aspiring to be) a language feature, people could unify their efforts to create tooling (e.g., tab autocomplete). This is difficult when you have a patchwork of different macros.
  2. By being an infix operator, you don’t have to skip back when you remember you want to call a function on the object; you just do it. This reduces mental load.
  3. By having high precedence (similar to . dot property access), you don’t have to parenthesize the entire expression when you wish to get a property or an index of the result. The benefits of this are difficult to appreciate from the examples, because they never show what happens next after the chain :sweat_smile: But in short, it makes it much more convenient when the chain is an inline expression. The intent is for you to prefer -- over regular function call syntax, which you will when autocomplete eventually comes online.
  4. -- chains and chainlinks can be nested.
  5. Chainlinks (created from “headless” --) can be saved and reused, and chains can be broadcasted with .--.
  6. The use of it means that _ can be reserved for partial function application, which I believe is a more suitable role.
  7. The insistence on a :tuple of expressions means that links in the chain are separated by commas, which is more consistent with them being functions or expressions but not full statements. With @chain, it’s a bit less clear what the answer to the question is, of “What is the object I am now dealing with?” nevermind lol
  8. Just a restatement of #1. It should be a language feature, not a macro.

Hope that covers it :wink:

I guess I can buy the story about not wanting to prefer the first argument. That said, it’s gonna be the first or the last in 95 percent of cases. Clojure has -> and ->> for left and right chain. Julia could have @\> and @/>.

My initial instinct is that the commas are just annoying and don’t add much value. We’re in a special syntax context after all, might as well use it.


I made Chain.jl with a specific focus on DataFrame manipulation, that’s why it defaults to first argument insertion. It’s not necessarily the theoretically best choice, it just seemed the most practical to me at the time.


As they say, there’s nothing more permanent than a temporary solution :stuck_out_tongue_closed_eyes:

Meanwhile, DataPipes.jl has made the analogous decision for FlexiGroups manipulation to insert into last position instead. :sweat_smile:

I agree, and Lazy.jl implements @> and @>> which mimic these Clojure macros.

The problem arises: what do you do when it changes mid-way through a chain? For example:

"HEAD a1 a2 a3"--(replace(_, r"HEAD\s*"=>""), match(r".*(\d+).*(\d+).*(\d+).*", _)).--(first, parse(Int,_), _^2)--join(_, ", ")
To see what this does, run the demo code:

demo""" "HEAD a1 a2 a3"--(replace(_, r"HEAD\s*"=>""), match(r".*(\d+).*(\d+).*(\d+).*", _)).--(first, parse(Int,_), _^2)--join(_, ", ") """
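
Without the demo macro, the chain can be unrolled by hand in current Julia. This sketch assumes the broadcast links run over the match's captures (here via `m.captures`), which is my reading of the chain rather than something the demo code confirms:

```julia
s = "HEAD a1 a2 a3"

# --(replace(_, r"HEAD\s*"=>""), match(r"...", _))
stripped = replace(s, r"HEAD\s*" => "")                   # "a1 a2 a3"
m = match(r".*(\d+).*(\d+).*(\d+).*", stripped)

# .--(first, parse(Int,_), _^2): each link is broadcast over the captures
chars = first.(m.captures)       # first character of each capture
nums = parse.(Int, chars)
squared = nums .^ 2

# --join(_, ", ")
result = join(squared, ", ")     # "1, 4, 9"
```

Note how the piped value moves from the first argument (`replace`) to the last (`match`), back to the last (`parse`), and then to the first (`join`).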

Notice that match and replace are both string search operators, and parse and join both “convert” strings, yet receive the string in different argument positions :sweat_smile: This chain will be very inconvenient to attempt using Clojure’s thread-first and thread-last macros.

To solve this, two weeks ago I proposed two infix operators fix-first and fix-last, which allow you to flip mid-chain. That solves this problem and is natural enough, because they always replace the specified argument.

However, underscore PAS is capable of fixing any argument in any position. This comes with the benefit of not having to pre-determine which argument you will fix! The downside is, you’re not pre-determining which argument you will fix. The magical behavior of “choose any argument position! but if you don’t specify, I’m just gonna sneak it into this position for you” is a bit too sneaky for a generic language feature.

That said, the nice thing about generic language features is that people can justify the effort to build tooling for them. That means an autocomplete will be able to fill in the underscore for you (using type specialization and/or heuristics and/or statistical inference to determine which position you’re most likely to chain into).

DataPipes predates FlexiGroups by a lot (:
DataPipes puts the “previous” argument last because it’s typical for generic data manipulation functions both in Base Julia and across the ecosystem. The first argument(s) are often “lambda” functions, and the piped argument comes after them.
FlexiGroups just uses the same (official) convention, and is one of many packages doing so. It’s not like it is the first group(key, X) function out there (:
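As a small illustration of that data-last convention using only Base functions (a sketch, not DataPipes syntax): filter and map take the function first and the collection last, so fixing the first argument leaves a one-argument function that is ready to pipe into.

```julia
data = [1, 2, 3, 4]

total = data |>
    Base.Fix1(filter, isodd) |>   # xs -> filter(isodd, xs)
    Base.Fix1(map, abs2) |>       # xs -> map(abs2, xs)
    sum                           # 1 + 9 == 10
```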

More on topic, your examples don’t really work for me. For example, the second “Base Julia” example errors with “it not defined”.
Also, those examples don’t look that much cleaner with this proposal than in Base Julia. I think the bar for introducing new syntax into the language should be higher…

Thanks for the history lesson :pray:

Oops! Fixed.

It’s not just about looking cleaner, sir.

My instinct is that even in special syntax contexts, we want to keep things feeling as “normal” as possible to minimize the shocking “oh, I’ve just jumped into a new language!” sensation. The commas are also aligned with their use in natural language, of delimiting clauses, and are easy to type anyway.

But now that I dwell on it, I’m growing more partial to the idea of still requiring them for inline expressions (i.e., :tuple), but not requiring them for block expressions (i.e., :block). We already have analogous behavior for Vectors.

I will spend some time digesting this.

Must admit that I did not follow all details of the new proposal. Yet, it is great to see the options from different packages collected in one place.
To me, the Base Julia versions do not seem much worse; Julia already has a lot of concise syntax. Furthermore, I just realized that it allows hanging lambdas à la Haskell:

df |>
  dropmissing |> x ->
  filter(:id => >(6), x) |> x ->
  groupby(x, :group) |> x ->
  combine(x, :age => sum)

"a=1 b=2 c=3" |>
  split |> it->
  map(it) do it
    split(it, "=") |> it ->
    Symbol(it[1]) => parse(Int, it[2])
  end |>
  NamedTuple

That way, the inserted argument is moved a bit out of sight. Together with an editor shortcut for inserting |> it -> or |> ⬚ -> (and maybe jumping to the next line), this might already be a workable solution?

At least for simple chains, some higher order functions might also be quite nice:

∝(i, f) = (args...) -> it -> f(args[1:i-1]..., it, args[i:end]...)  # inserts it at the i-th position
⊚(f, g) = g ∘ f  # reverse composition

[1,2,3] |> (2∝filter)(isodd) ⊚ (2∝map)(sqrt)
df |>
  dropmissing ⊚
  (2∝filter)(:id => >(6)) ⊚
  (1∝groupby)(:group) ⊚
  (1∝combine)(:age => sum)

Kind of tacit programming with numbered arguments …

@uniment , I have been following your proposals with great interest over the past couple weeks. From a practical perspective in the area of my work, would you be willing to comment on my workflow with my tools? I really enjoyed seeing your comparisons to other tools but am struggling a bit to see how to apply your syntax in my workflow. Thanks!

Wow, that’s neat. I did not know that.

I agree! I love it.

I disagree. But this depends on how frequently (and where) the feature is going to be used.

For multi-line chains, which will be fairly sparse in a given codebase, using base Julia for a call chain is ugly but not too bad. But for short, inline chains, the extra characters add enough effort and visual noise as to be prohibitive. For example:

my_func(x, my_arr--meth(_,1), z)
# versus
my_func(x, my_arr |> x->meth(x,1), z)

Or, let’s say you want to access a property after the chain:

x--foo--bar(_,y).a[1]
# versus
(x |> foo |> x->bar(x,y)).a[1]

With the current syntax, you have to go back to the beginning of the pipe chain to parenthesize it, which means you won’t really want to use pipes at all. It’s very discouraging.
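A runnable sketch of the parenthesization issue, using made-up functions foo and wrap (both hypothetical). Property access binds tighter than |>, so without the outer parentheses the `.a` would apply to the last function in the chain rather than to the chain's result:

```julia
foo(v) = 2v
wrap(v) = (a = [v, v + 1],)   # returns a NamedTuple with a field `a`

x = 1

# The whole chain has to be wrapped before accessing `.a[1]`.
# Writing `x |> foo |> wrap.a[1]` would instead look up `wrap.a`.
result = (x |> foo |> wrap).a[1]   # 2
```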

Moreover, every time you define a lambda, it’s compiled from scratch (this includes defining a struct to hold the fixed arguments). This takes typically about ten milliseconds. It’s pretty wasteful to have a lot of these in your codebase, for functions that will only ever be used once.
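A minimal sketch of the struct point (it does not measure the ten-millisecond figure). The helper make_adder is hypothetical; the example only shows that a closure stores its fixed argument as a field of a generated type, and that two textually identical lambdas get distinct types, each of which has to be compiled separately:

```julia
make_adder(y) = x -> x + y     # hypothetical helper returning a closure

add10 = make_adder(10)
captured = fieldnames(typeof(add10))   # the closure type stores `y` as a field

f = x -> x + 1
g = x -> x + 1
distinct = typeof(f) !== typeof(g)     # same source text, different generated types
```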

A compiler optimization could be added for it, but that would be a pretty sad way to defend a language feature so bad that it was on the verge of deprecation and only survived because there wasn’t a better alternative (yes of course, |>).

So overall, for an idiom as common as call chaining, using lambdas here is so wasteful both visually and computationally that you won’t want to do it.

This is an interesting take on partial function application :wink: Using the Fix definition in the demo code,

function ∝(i, f)
    (a...;k...) -> Fix{((1:i-1)..., (i+1:length(a)+1)...), length(a)+1}(f, a[1:i-1]..., a[i:end]...; k...)
end

Underscore PAS is just too perfect for defining partial functions though…

I’m probably the worst person to ask for advice on workflow tooling :sweat_smile: but sure! drop me a DM
