Starting this thread anew to avoid confusion, because I changed my mind on which approach I support partway through a long thread (sorry!). It’s also useful to distill some of the last thread’s learnings.
Thank you to all who participated—especially those who rightfully criticized my ignorance! Many issues are matters of genetic algorithm or simulated annealing, and ossified entrenched problems aren’t solved without some heat.
Please grab a cup of coffee, pull up a chair, and bring a fresh pair of eyes to consider this (refreshed!) proposal. Or for the impatient, skip to the demo at the end and start playing with it.
Objective
To solve Julia’s piping/chaining/currying/partial application problem with an elegant, functional approach that is worthy of Base Julia and idiomatic to it (i.e., not constrained to macro calls).
Background
See the previous proposal for background information.
To summarize why I changed my mind:
Proposal
I am making essentially two proposals: 1.) partial function application syntax, and 2.) call chaining syntax.
PAS
I now support PR #24990 with some minor modifications. #24990 is Scala-style “tight-currying” underscore syntax. I will refer to this as “Partial Application Syntax,” although my informal name for it is “basically *the* perfect syntax for partial application.” Quoting @stevengj from the PR, with my modifications in italics:
1. The currying is “tight”, i.e., it only converts the immediately surrounding function call into a *partial application functor*, as suggested by @JeffBezanson (and as in Scala). So, e.g., `f(g(_,y))` *behaves like* `f(x -> g(x,y))`. (Note that something like `find(!(_ in c), y)` will work fine, because the `!` operator works on functions; you can also use `_ ∉ c`.) Any other rule seems hard to make comprehensible and consistent.
2. Similar to Scala, multiple underscores are converted into multiple arguments in the order they appear, e.g., `f(_,y,_)` *behaves like* `(x,z) -> f(x,y,z)`.
3. A slurping underscore is converted into varargs; for example, `f(a,_...,z)` behaves like `(args...)->f(a,args...,z)`. (Note that only one slurp is allowed per expression.)
4. Use `.` to denote broadcasting, e.g., `f.(x,_,z)` is equivalent to `broadcast(f,x,_,z)`, and `_.^2` is equivalent to `broadcast(^, _, 2)`.
5. For the special case of denoting an expression that will become a zero-argument function due to fixing all arguments, use `&_`. For example, `g=f(x,y,&_); g()`. Note: I have no preference for the specific character sequence; I choose it because it parses.
6. The object returned by the underscore syntax is a parametrically-typed `Fix` partial application functor, which allows for type inference, dispatch, and pretty-printing. For example, `f(x,_,z)` is equivalent to `Fix{(1,3),3}(f, x, z)`. This resolves #36181. See “Demo” for sample code.
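To make the rules above concrete, here is a rough sketch of what each underscore form would correspond to in Julia as it exists today. The functions `f` and `g` are hypothetical placeholders, and the lambdas only approximate the behavior; the proposal returns a `Fix` functor rather than an anonymous function.

```julia
f(x, y, z) = (x, y, z)   # hypothetical three-argument function
g(x, y) = x - y          # hypothetical two-argument function

# `f(_, 2, _)`: two underscores become two arguments, in order of appearance.
p1 = (x, z) -> f(x, 2, z)

# `f(1, _..., 3)`: a slurping underscore becomes varargs.
p2 = (args...) -> f(1, args..., 3)

# `g(_, 2)` and `g(10, _)` match what Base.Fix2 and Base.Fix1 already do.
p3 = Base.Fix2(g, 2)
p4 = Base.Fix1(g, 10)

# Tight currying: in `map(_^2, xs)` only the `_^2` call becomes a function,
# so `map` still receives a function and a collection.
squares = map(Base.Fix2(^, 2), [1, 2, 3])

# `f(1, 2, 3, &_)`: all arguments fixed, giving a zero-argument function.
p5 = () -> f(1, 2, 3)
```

Each of these can be called like an ordinary function, e.g. `p1(1, 3)` or `p5()`.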
I also add a proposal specific to chaining:
CCS
A function chaining syntax, which I will call “Call Chain Syntax.” It gives preferred treatment to PAS (the syntax, not the functor object) by suppressing applicator construction using syntax transforms for better compile time and by splatting automatically, and it also allows for “chain glue expressions” using a local keyword `it` (this will become clearer later).
7. The infix character sequence `--`, with the same “operator precedence” as the property-getting `.` dot (17 as of writing), which takes on its R.H.S. either a callable object, or an expression of `it`, or a `:block` of callable objects inclusive-or expressions of `it`, and calls them sequentially. For example:
   - `x--f` is equiv to `let it=x; it=f(it) end`,
   - `x--f(_,y)` is equiv to `let it=x; it=f(it,y) end`,
   - `y--f.(x,_)` is equiv to `let it=y; it=f.(x,it) end`,
   - `x--f(_^2,_).a[2]` is equiv to `(let it=x; it=f(_^2,it) end).a[2]`,
   - `x--(f; it^2+it; it/5)` is equiv to `let it=x; it=f(it); it=it^2+it; it=it/5 end`, and
   - `x--begin f; g; h end` is equiv to `x--f--g--h`, which is equiv to `x--(f; g; h)`, which is equiv to `let it=x; it=f(it); it=g(it); it=h(it) end`.

   Note: I choose `--` because it looks neat, is easy to type, and is currently invalid syntax so it won’t be piracy to claim it.

   Note: Expressions are considered to be expressions of `it` if they have `it` anywhere, *unless* it exists only locally within nested call chains.
8. (This should satisfy Chain.jl users:) Within the scope of a R.H.S. expression, a keyword `it` is defined locally, representing the result of executing the previous element in the chain. Expressions of `it` capture everything that has tighter binding than the block expression delimiter. For example, `x--(f; it.a+it.b; g)` is equiv to `let it=x; it=f(it); it=it.a+it.b; it=g(it) end`.

   Note: I prefer a keyword like `it` to avoid confusion with the partial function placeholder `_`, as they have distinct meanings. Using `it` leverages the fact that the English language already uses the pronoun “it” for the exact same purpose of method chaining.
9. If the next expression is a PAS expression with multiple `_` underscores, then the previous expression will be splatted in. For example, `x--(f; g(_,y,_...))` is equiv to `let it=x; it=f(it); it=g(it[1],y,it[2:end]...) end`.
10. To broadcast on a callable object or an expression of `it`, or a `:block` of such, use `.--`. For example, `x.--(f; it.a[1])` evaluates like `let it=x; it=f.(it); it=(it->it.a[1]).(it) end`.
11. A callable type `Base.ChainLink`, representing a sequence of operations.
12. If `--` or `.--` is “headless”, i.e., prefix, having no object on its L.H.S., then it behaves as a constructor for a `Base.ChainLink` with the aforementioned behaviors (notably w.r.t. splatting and the `it` keyword). For example, if we let `foo = --(f; it.a+it.b; g)`, this gives a `Base.ChainLink` object that can be called like `it -> begin it=f(it); it=it.a+it.b; it=g(it) end`. Then, `x--foo` is equiv to `foo(x)`, which evaluates like the example from #8.

    Similarly, a broadcasting chainlink can be defined by `baz = .--(f; it.a+it.b; g)`, which calls like `it -> begin it=f.(it); it=(it->it.a+it.b).(it); it=g.(it) end`.

    Because `Base.ChainLink` is a callable object, it can also be inserted into other chainlinks. For example: `bar = --(h; foo); y--bar`.
13. For Chain.jl users: an `@aside` macro is not required. For example, `x--(f; (println(it); it); g)` yields `g(f(x))` and prints `f(x)`.
14. (Stretch Goal) Special handling for the `do` statement, when taking `--` as an “argument name.” For example, `map(x) do -- f; it + it^2; g end` would be equiv to `map(--(f; it + it^2; g), x)`.
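The `let` expansions described in the list above can be run in current Julia as-is, which may help when checking the intended semantics. The sketch below assumes hypothetical functions `f`, `g`, `h`, and `k`; the `--` syntax itself is not valid Julia today, so only the expanded forms appear.

```julia
f(x) = (a = x, b = 2x)          # hypothetical: returns a NamedTuple with fields a and b
g(x) = x + 1                    # hypothetical
h(x) = (x, x + 1, x + 2)        # hypothetical: returns a tuple to be splatted
k(a, y, rest...) = (a, y, rest) # hypothetical

# Expansion given for `x--(f; it.a+it.b; g)` with x = 5:
r_chain = let it = 5
    it = f(it)
    it = it.a + it.b
    it = g(it)
end

# A headless chain `--(f; it.a+it.b; g)` is described as calling like this function:
foo = it -> begin it = f(it); it = it.a + it.b; it = g(it) end

# Expansion given for the broadcasting form `xs.--(f; it.a+it.b)`:
r_bcast = let it = [1, 2]
    it = f.(it)
    it = (it -> it.a + it.b).(it)
end

# Expansion given for splatting, `x--(h; k(_, :y, _...))` with x = 1:
r_splat = let it = 1
    it = h(it)
    it = k(it[1], :y, it[2:end]...)
end
```

A `let` block evaluates to its last expression, which is why each expansion ends with the final reassignment of `it`.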
[Some] Details
Fix Functor
The latest edition of the code for a generalized `Fix` partial application functor is in “Demo.”
This `Fix` functor type generalizes the behavior of `Base.Fix1` and `Base.Fix2` to fix any number of arguments in any position, for functions with fixed or variable argument lengths, and keyword arguments. With this functor, the behavior of `Base.Fix1` and `Base.Fix2` as currently defined for 2-argument functions is captured by the parametric types:
const Fix1{F,X} = Fix{F,(1,),2,Tuple{X},NamedTuple{(),Tuple{}}}
const Fix2{F,Y} = Fix{F,(2,),2,Tuple{Y},NamedTuple{(),Tuple{}}}
It’s also easy to make functors accepting variable argument lengths:
const FixFirst{F,X} = Fix{F,(1,),0,Tuple{X}}
const FixLast{F,X} = Fix{F,(-1,),0,Tuple{X}}
as well as any other argument position and count.
To call such an object without syntax sugar:
[1,2,3] |> Fix1(map, Fix2(^,2))
[1,2,3] |> FixLast(join, ", ")
[1,2,3] |> Fix{(1,2),3}(mapreduce, Fix2(^, 2), +)
Using underscore and chaining syntax:
[1,2,3] -- map(_^2, _)
[1,2,3] -- join(_..., ", ")
[1,2,3] -- mapreduce(_^2, +, _)
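For readers who want to experiment with the idea of fixing arbitrary argument positions, here is a deliberately small sketch. `MiniFix` is a hypothetical name invented for this illustration; it is not the proposal's `Fix` type, it handles only a known total argument count, it ignores keyword arguments and varargs, and it is not type-stable.

```julia
# Hypothetical sketch only: `MiniFix` is an illustrative name, not the proposal's `Fix`.
# Pos: tuple of fixed argument positions; N: total number of positional arguments.
struct MiniFix{Pos,N,F,X<:Tuple}
    f::F
    fixed::X
end

# Convenience constructor: MiniFix{(1,3),3}(f, x, z) fixes arguments 1 and 3 of 3.
function MiniFix{Pos,N}(f::F, fixed...) where {Pos,N,F}
    length(fixed) == length(Pos) ||
        throw(ArgumentError("one value per fixed position is required"))
    return MiniFix{Pos,N,F,typeof(fixed)}(f, fixed)
end

# Calling the functor fills the remaining positions, in order, with `args`.
function (p::MiniFix{Pos,N})(args...) where {Pos,N}
    free = setdiff(1:N, Pos)                 # positions that were not fixed
    length(args) == length(free) ||
        throw(ArgumentError("expected $(length(free)) arguments"))
    full = Vector{Any}(undef, N)             # simple but not type-stable
    for (k, i) in enumerate(Pos)
        full[i] = p.fixed[k]
    end
    for (k, i) in enumerate(free)
        full[i] = args[k]
    end
    return p.f(full...)
end

square = MiniFix{(2,),2}(^, 2)                         # roughly `_^2`
square_all = MiniFix{(1,),2}(map, square)              # roughly `map(_^2, _)`
sum_squares = MiniFix{(1,2),3}(mapreduce, square, +)   # roughly `mapreduce(_^2, +, _)`
```

Storing the positions in a type parameter is what would let a real implementation dispatch and specialize on them; this sketch keeps that shape but does the argument shuffling at run time for brevity.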
Modified Underscore syntax
In alignment with @jeff.bezanson and @stevengj, I think the “tight currying” approach inspired by Scala, where `_` is used strictly for partial function evaluation (and not for constructing arbitrary lambdas), is the only acceptable approach for implementing underscore syntax as a language feature; that is to say, it’s sufficiently generic, straightforward to reason about, and useful to justify incorporation into the language proper.
Furthermore, considering how frequently partial application is employed, it’s sensible to add syntax sugar for it (as we already have for lambdas).
The modifications I propose to the original syntax of #24990 are simple: `_...` for slurping varargs, `&_` for the special case of creating a zero-argument functor, and the expected broadcasting behavior using `.`. With these additions, underscore syntax becomes fully general, able to create partial (and full) applicators fixing any number of arguments in any position for any function that the language permits.
Call Chain Syntax: --
The functor doesn’t need to be compiled when it would be defined, immediately called, and then discarded, i.e., when it is used as part of a call chain. The proposed call chain syntax `--` would invoke a syntax transformation such that the constructor and functor of the `Fix` partial function applicator needn’t be compiled, enhancing the utility of underscore syntax.
In addition, the chaining syntax allows for tuples of arguments to be “splatted” into whichever argument positions are desired, as directed by underscore syntax.
Call chain syntax has been specified to operate with the same precedence as `.` property getting. This is so that, for example, a chain like this:
"1"--parse(Int,_) == 1
is evaluated as
parse(Int,"1") == 1
instead of
(parse(Int,_) == 1)("1")
Importantly, it’s so that the result of a chain can have its properties and indices accessed without having to parenthesize the entire call chain.
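For contrast, the existing pipe operator `|>` binds more loosely than `.`, so property or index access on a piped result needs parentheses today. The sketch below shows this with `Meta.parse` and a hypothetical `Wrapped` type and `wrap` function.

```julia
# `.` binds tighter than `|>`: the right operand of the pipe is all of `f.a`.
ex_dot = Meta.parse("x |> f.a")

# `|>` binds tighter than `==`: the comparison is the outermost call.
ex_cmp = Meta.parse("x |> f == 1")

struct Wrapped               # hypothetical result type with a field `a`
    a::Vector{Int}
end
wrap(x) = Wrapped([x, 2x])   # hypothetical function

# With `|>`, the whole pipe must be parenthesized before accessing `.a[2]`.
piped = (3 |> wrap).a[2]
```

Under the proposal, `3--wrap.a[2]` is meant to give the same result without the outer parentheses, because `--` would sit at the same precedence level as `.`.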
I call a sequence of operations that lacks the object it will operate on a `Base.ChainLink`; it is constructed with “headless” `--` call chain syntax. Of course, a new chainlink can be created by composing other chainlinks. Ultimately, a full chain requires the source object too.
Although there’s heavy overlap between the functionality of the pipe operator `|>` and call chain syntax `--`, I suspect the latter will find greater use, as the former is notoriously inconvenient to type (among other reasons).
Examples
Starting a Spark Session
SparkSession.builder--appName(_, "Main")--master(_, "local")--getOrCreate
Chaining
# notice that `(_)` and `(it)` cause no harm
[1, 2, 3] -- begin
filter(isodd, _)
map(_^2, _)
sum(it)
sqrt(_)
end ==
3.1622776601683795
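For comparison, roughly the same pipeline can be written in current Julia with `|>` and the existing `Base.Fix1` and `Base.Fix2` functors. This is only an approximation of the chain above, not what the proposed syntax would lower to.

```julia
result = [1, 2, 3] |>
    Base.Fix1(filter, isodd) |>            # filter(isodd, _)
    Base.Fix1(map, Base.Fix2(^, 2)) |>     # map(_^2, _)
    sum |>                                 # sum(it)
    sqrt                                   # sqrt(_)
```

The odd elements are 1 and 3, their squares sum to 10, and `result` is `sqrt(10)`.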
More
"1 2, 3; hehe4"--eachmatch(r"(\d+)", _).--(first; parse(Int, _))--join(_, ", ") ==
"1, 2, 3, 4"
Replacing `Base.Fix1` and `Base.Fix2`
Base.values(x::MyStruct{K1,<:Any,K2,<:Any}) where {K1,K2} =
(values(x.a)..., map(getfield(x.b, _), filter(_∉K1, K2))...)
Function Composition
chain = --(split(_, r",\s*"); .--(parse(Int, it)^2); join(_, ", "))
"1, 2, 3, 4"--chain ==
"1, 4, 9, 16"
Fluent Interface
Kitten()--(setname(_,"Salem"); setcolor(_,:black); save)
Transducer Chainlink
process_bags = --begin
mapcatting(unbundle_pallet)
filtering(is_nonfood)
mapping(label_heavy)
end
process_bags--into(airplane, _, pallets)
Splatting Multiple Arguments
(_^2, [1,2,3])--mapreduce(_, +, _) ==
14
Rearranging before splatting
(:a,:b)--(reverse; f(_,_))
Fully-Applied “Partially Applied” Function
run = println("Hello", "world!", &_)
run()
Example from Chain.jl Readme
df--begin
dropmissing
filter(:id => >(6), _)
groupby(_, :group)
combine(_, :age => sum)
end
Example from DataPipes.jl Readme
"a=1 b=2 c=3"--begin
split
.--(split(_, "="); Symbol(it[1]) => parse(Int, it[2]))
NamedTuple
end
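As a point of reference, here is one way to write the same transformation in current Julia without any chaining syntax. The variable names are illustrative only.

```julia
entries = split("a=1 b=2 c=3")          # three "key=value" substrings

# Turn each "key=value" entry into a Symbol => Int pair.
kv_pairs = map(entries) do entry
    key, value = split(entry, "=")
    Symbol(key) => parse(Int, value)
end

result = (; kv_pairs...)                # NamedTuple built from the pairs
```

The intermediate names `entries` and `kv_pairs` are exactly what the chained form avoids having to invent.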
New Pushback & Responses
In your examples you still have quite a lot of `_` underscores. How is this better than xyz.jl chaining package?
Yes there are a handful more characters here and there, but it adds legibility and obviousness. It also makes use of existing idioms, which I think is a good thing.
Recall an important constraint: to settle on generic syntax for the language proper, not for a domain-specific package.
When things are a proper part of the language, then people can start to justify making tooling and tab-autocompletes for them. Hence, I don’t anticipate the extra `_`'s will be a nuisance in the near future.
This is sugar. Sugar belongs in macros.
Why? Pretty much everything in a language beyond parentheses is sugar; even infix operators are sugar.
Sugar improves productivity, and it encourages and promotes preferred idioms. Partial application is a good idiom to prefer in a functional language. And method chaining is a preferred idiom, well, everywhere (including in spoken languages and in mathematics).
Partial application and function call chaining are two separate concepts, and should not be in the same proposal together.
This is probably correct. I don’t know exactly when or why the two concepts got conflated, but they did—at least for me. Maybe they are just concepts that are easily conflatable?
Why do you mention autocomplete? Your proposal doesn’t solve any of the problems of autocomplete.
The IDE doesn’t know what type of variable you’re operating with, so it’s impossible to find what methods will operate on it.
This would be a reasonable statement, if we decreed that nobody could run Julia in interactive mode.
And someday, somebody will implement an autocomplete that works outside interactive mode too. But we should crawl before we walk.
Can you give an example of how you imagine autocomplete might work?
Of course! Here’s a walked-through step-by-step example.
Can you provide sample code of how this autocomplete might work?
Yup! I made some sample code here.
Autocomplete is rendered useless by Julia’s method genericism; showing a method list sorted by type specialization is a bad idea, because so many methods are generic to `Any` type.
I disagree, but I think this too is a solvable problem.
Some more thoughts on how to do it.
It’s impossible to reach consensus on such a contentious topic; we will never implement it into the language.
This is a strangely defeatist attitude for a language whose inspiration is to “have it all.”
In my estimation, we need to settle on a syntax that’s generic enough, simple enough, and powerful enough, and remind ourselves that a good language feature will do something simple and natural, and do it very well, so that it can compose well with other language features. I’m hoping this proposal satisfies those objectives.
Demo
Of course, this wouldn’t be complete without a demo!
This is a hacky demo, but it should do the trick.
NOTE: I cannot update to my latest code, because I hit the character limit. See comment #32 for latest code and demos.
Note: The `Fix` functors are pretty solid, but macro `demo_str` is hacky like hacky sack.
- It currently doesn’t work with broadcasting chainlinks (i.e., headless `.--`)
- Broadcasting on functions sometimes works and sometimes doesn’t
- The operator precedence of `--` in this demo is similar to multiplication (12), but it’s intended that it should have precedence similar to the `.` dot operator (17).
Code:
*** DELETED; SEE COMMENT 32 ***
Fun examples to try
NOTE: I cannot update to my latest code, because I hit the character limit. See comment #32 for latest code and demos.
*** DELETED; SEE COMMENT 32 ***
Play with it, compare with the dozen or so chaining packages out there, and let me know your thoughts!