Fixing the Piping/Chaining Issue

I think we should consider using Unicode from the start. That you can’t type it in directly is a non-issue: you will most likely press TAB anyway to have the REPL (or IDE) complete something, and the completion can just as well replace part of what you typed.

Julia has a policy of Unicode being optional, but it’s already kind of broken: the ⊻ operator is only available in non-ASCII form (although the xor function is equivalent).
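As a quick check of the equivalence mentioned above, here is a minimal sketch that should run in any recent Julia REPL. The variable names are only for illustration.

```julia
a, b = true, false

# The operator form and the ASCII function form give the same result.
same_bool = (a ⊻ b) == xor(a, b)

# The same holds for integers (bitwise exclusive or).
same_int = (0b1100 ⊻ 0b1010) == xor(0b1100, 0b1010)

println(same_bool && same_int)
```

So the ASCII spelling exists, just not as an infix operator.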

Possibly this arrow might work for frontfix: ⇥ (which happens to be the symbol for TAB, so a good reminder?), or maybe this one: ⇴

And this one for backfix: ↷, or this one: ↴

I don’t really see a reason to have ASCII aliases for the operators (or at least not a reason I agree with), since the equivalent is writing the call in functional form, which is already ASCII. It’s still useful to think about what you would type in ASCII before you press TAB, and I propose > and >> (and requiring them to be unspaced?). We can also discuss alternative Unicode proposals: Arrows (Unicode block) - Wikipedia

I’m not completely sold on the new operators, at least backfix, since this is new to me, and I would want to see a good argument for it (well, for both) from real-world code.

What the backslash is, is ingrained in me (but clearly not in all [programmers]), yet (front and) back (in that context) wasn’t obvious to me when first explained in relation to the operator(s).

Clojure is such a minority language that we do not have to emulate -> and ->>. They are now obvious to me, but the former is also already taken.

f(a, b, c, d) ==
a⇥f(b, c, d) ==
[or, but not allow both...]
b⇴a⇴f(c, d)

EDIT: after posting this, the latter is clearly worse, at least very hard to see in that mono-spaced font... and also the nice arrow below, in edit mode...:

f(a, b, c, d) ==
d↷f(a, b, c) ==
c↷d↷f(a, b)

Looks better for backfix:
f(a, b, c, d) ==
d↴f(a, b, c) ==
c↴d↴f(a, b)

If it is purely a matter of the choice of symbol and Unicode is acceptable then that seems easily fixed.

Maybe ⦠ (aka \lpargt) for FrontFix, since it is like > + ( meaning “curry variable just after the (”, and ⋗ (aka \gtrdot) for BackFix, since it is like > + ⋅ meaning “curry variable after all the other variables.”

From the OP:

That said, I much prefer using the unicode ∈ and ≤ instead of in and <= … so much so, that I recently forgot what <= means! :sweat_smile: So wherever unicode characters do a good job of elucidating what is meant, and they are easy to access (\in and \le are short to type and quite accessible), I prefer them.

However, I’m not a big fan of the proposed characters.

  • ⇥ requires 12 keystrokes (\rig<TAB>ar<TAB>bar<TAB>)
  • ⇴ requires 11 keystrokes (\circl<TAB>o<TAB>r<TAB>)
  • ↷ requires 8 keystrokes (\curv<TAB>r<TAB>)
  • ↴ I’m not finding a shortcut
  • ⦠ requires 5 keystrokes (\lp<TAB><TAB>)
  • ⋗ requires 7 keystrokes (\gtrd<TAB><TAB>)

If shorter tab-complete key combos are given, then that’s one problem solved. But then we still face the fact that we’re stealing operators that are potentially useful in obscure mathematical domains that I have no clue about, while choosing symbols whose appearances either don’t reflect the symmetry of FixFirst and FixLast well, or are too close to tell apart from reading distance. If people were complaining that /> and \> were too close to tell apart, some of these will test their limits. And for what?

I should note, even if we choose existing unicode operators, it will not allow immediate implementation. Even though arrow operators have the correct right-associativity, the operator precedences are incorrect (namely, it needs to bind more tightly than function calling). So an adjustment to the parser will be needed anyway.

I appreciate the effort though! Who knows, maybe we’ll eventually find suitable unicode characters, or maybe I can be convinced to like these ones.

This final concern I unfortunately do not have the mental toolkit to address. Perhaps someone like @stevengj, who is both smarter and wiser, can offer his insight.

Thinking about it now, I think I like this perspective.

When I think back on my childhood, learning about the syntax of function calls was indeed a confusing experience. To instead learn first about tuples, and then to think, “when writing f(a,b,c), there is an implicit operator being invoked which calls upon the function f and applies it to the tuple (a, b, c)” seems like quite a reasonable perspective.

The thing is, this is a very different perspective on the language… it’s more of a

https://discourse.julialang.org/t/psa-julia-is-not-at-that-stage-of-development-anymore/44872

Almost applies here. This fundamentally changes major components of the parsing and semantics of the language. I’m not sure how fundamental it is, but @StefanKarpinski might have something to say about this. Right now we have an Expr object that is a call which represents the function name and the arguments. You are asking to parse that thing as a function object and a tuple and then interpret the adjacency of a function and a tuple as “application” but “only if it’s not bound tighter by \> or />”

julia> foo = Meta.parse("f(x)")
:(f(x))

julia> typeof(foo)
Expr

julia> foo.head
:call

julia> foo.args
2-element Vector{Any}:
 :f
 :x

You are asking to fundamentally change the way things parse, so that “foo(x,y,z)” will not be parsed as:

julia> Expr(:call,:foo,:x,:y,:z)
:(foo(x, y, z))

but rather something like:

Expr(:adjacency, Expr(:symbol, :foo), Expr(:tuple, :x, :y, :z))

And then the fact that the two are adjacent implies a call to the function identified by the symbol under some circumstances and not in others…

I mean, this would be a HUGELY breaking change for many many macros for example. It’s a nonstarter for Julia 1 I’m pretty sure.
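To make the macro concern concrete, here is a small sketch of the kind of code that relies on calls parsing as Expr(:call, ...). The helper name callee is made up for this example; real macros do similar pattern matching in many different ways.

```julia
# Hypothetical helper, typical of what macro authors write:
# it assumes a function call parses to an Expr with head :call.
function callee(ex::Expr)
    ex.head === :call || error("expected a call expression, got head :$(ex.head)")
    return ex.args[1]   # the function being called
end

ex = Meta.parse("foo(x, y, z)")
println(ex.head)       # the head that today's parser produces for a call
println(callee(ex))    # the callee symbol
println(ex.args[2:end])
```

If the parser produced some other head for the same source text, every helper like this would hit the error branch, which is why the change would be so breaking.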


The following are easy enough to understand:

  • The definition of the FixFirst and FixLast structs.
  • The definition of \> and /> as functions.
  • The right associativity of \> and />.
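For readers who have not seen the OP, the three items above can be sketched roughly as follows. This is my own guess at the shape of the definitions, not the OP's exact code; frontfix and backfix are placeholder ASCII names standing in for the proposed operators.

```julia
# Hypothetical functor types: each stores a function and one fixed argument.
struct FixFirst{F, X}
    f::F
    x::X
end

struct FixLast{F, X}
    f::F
    x::X
end

# Calling the functor puts the stored value at the front or at the back.
(p::FixFirst)(args...; kwargs...) = p.f(p.x, args...; kwargs...)
(p::FixLast)(args...; kwargs...)  = p.f(args..., p.x; kwargs...)

# Placeholder names for the two operators, written as ordinary functions.
frontfix(x, f) = FixFirst(f, x)
backfix(x, f)  = FixLast(f, x)

f(a, b, c, d) = (a, b, c, d)

r1 = frontfix(:a, f)(:b, :c, :d)            # same as f(:a, :b, :c, :d)
r2 = backfix(:c, backfix(:d, f))(:a, :b)    # right-associative chain, same result
println(r1 == r2)
```

The nested backfix call mirrors what right associativity would do for a chain of two back-fix operators.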

The part that has been very poorly specified is the syntactic transformation that is applied to front/back fix expressions before the above definitions take effect. The Haskell example from @bertschi gives me an inkling of what might be going on, but when I try to apply it, I get nonsense results.

Let me try to analyze an expression by treating function application as the $ operator, as in @bertschi’s Haskell example. The example is this:

[4, 9, 16] \> map(sqrt) \> filter(iseven)

If we represent function application by $ and an argument list by {a, b, ...}, this is transformed into

[4, 9, 16] \> map $ {sqrt} \> filter $ {iseven}

Front fix binds tighter than $, so this becomes

([4, 9, 16] \> map) $ ({sqrt} \> filter) $ {iseven}

At this point we’re stuck, since we have no definition for what {sqrt} \> filter means. Even if it was just sqrt \> filter, it wouldn’t make sense to “pipe” sqrt into the last argument of filter. So either your specification is broken, or I don’t understand your specification because no one has clearly and correctly specified the initial syntax transformation that this proposal requires.

Why does this have to be so complicated? Why must I go through these mental gymnastics to understand a front/back fix expression? Why can’t we just have the following utterly simple, readable, and understandable code?

[4, 9, 16] |>
    map(sqrt, _) |>
    filter(iseven, _)
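As far as I know, the underscore form above is not accepted by current Julia, so for comparison here is a sketch of the same pipeline with explicit anonymous functions. I use isqrt instead of sqrt only to keep the values as integers.

```julia
# Each stage is an explicit one-argument function; parentheses keep the
# lambda bodies from swallowing the rest of the pipeline.
result = [4, 9, 16] |>
    (x -> map(isqrt, x)) |>
    (x -> filter(iseven, x))

println(result)
```

It works, but the extra x -> and parentheses are exactly the noise that placeholder syntax is meant to remove.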

I think I see what you mean now. The function’s tuple of arguments must be somehow prevented from being abducted by the subsequent \> operation.

I will need to think about this. It definitely seems like something that cannot be boiled down into a standard operator precedence table, even if function application is an entry of the table. This disappoints me.

This I think is an interesting question. Why hasn’t #24990 been sufficiently championed to result in adding underscore placeholder syntax to the language proper? Is it just a matter of letting perfect be the enemy of good enough, as people try to get more than one function call out of it? Or is there something more?


You’re right, the parsing would need to be different, i.e.,

[4, 9, 16] \> map(sqrt) \> filter(iseven)
[4, 9, 16] \> map $ {sqrt} \> filter $ {iseven}

are supposed to mean

((([4, 9, 16] \> map) $ (sqrt, )) \> filter) $ (iseven,)

which seems to require a non-trivial interaction between the piping and function call operators going beyond precedence …
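The intended grouping can be traced by hand with a closure standing in for the back-fix operator. fixlast is a hypothetical helper, and I use isqrt in place of sqrt to stay with integers.

```julia
# Hypothetical stand-in for `x \> f`: fix x as the last argument of f.
fixlast(x, f) = (args...) -> f(args..., x)

step1 = fixlast([4, 9, 16], map)   # [4, 9, 16] \> map
step2 = step1(isqrt)               # ... $ (isqrt,)   calls map(isqrt, [4, 9, 16])
step3 = fixlast(step2, filter)     # ... \> filter
step4 = step3(iseven)              # ... $ (iseven,)  calls filter(iseven, step2)

println(step4)
```

Written out like this, each application has to finish before the next fix operator sees its left operand, which is the part a plain precedence table does not capture.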

Just some further comments:

  1. Having a nice and short syntax for partial application is certainly useful:

    filter(2 /> <, [1, 2, 3, 4])  # All values greater than 2
    filter(2 \> <, [1, 2, 3, 4])  # All values less than 2
    

    Obviously, the _ proposal would also work nicely here.

  2. Using Unicode symbols can be useful, but requires thought on the symbol to use. Ideally, it could be a kind of mnemonic for the desired operation. Yet, going overboard here easily looks like APL then:

    filter ← {(⍺⍺ ⍵)/⍵}
    (2∘<∧<∘5) filter 1 2 3 4 5 6  ⍝ All values between 2 and 5
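For reference, the two partial applications in item 1 can already be spelled in current Julia with Base.Fix1 and Base.Fix2, just more verbosely. A small sketch:

```julia
xs = [1, 2, 3, 4]

# Base.Fix1(<, 2) behaves like x -> 2 < x
greater = filter(Base.Fix1(<, 2), xs)

# Base.Fix2(<, 2) behaves like x -> x < 2
less = filter(Base.Fix2(<, 2), xs)

# One-argument comparison methods such as >(2) are a built-in shorthand
# for the Fix2 form.
also_greater = filter(>(2), xs)

println((greater, less, also_greater))
```

The limitation is that Fix1 and Fix2 only cover the first and second argument of a two-argument call, which is what the proposal tries to generalize.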
    

I will admit I neither understand nor care that much about the pieces of this proposal that seem to be most contentious, namely the binding more tightly than function call and maybe also the right-associativity?

If obj |> foo(_, value) is acceptable, then can /> be an alias for exactly that? And vice versa, if obj |> foo(value, _) works, then use \> for exactly that?

This also makes them more similar to |>, since they always require a callable one-argument function on the right-hand-side, just the /> happens to create that one-argument function via partial application.

My concern is that something medium would be a great solution, something huge was requested, and zero will be received. I just like how this proposal tackles both front/back threading and tab-complete in a pretty elegant way.


I think this is it. The discussion in that PR went around in circles because people kept on trying to come up with more complicated rules in order to get the code to “do what I mean, not what I say”. And, for better or worse, work on the Julia language is driven more by consensus than fiat. :slight_smile:


That doesn’t bother me at all. For now I’m thinking, do we want this, and then what are the best Unicode symbols? We can make a fast way to type this in, when we’ve agreed on some Unicode symbols.

In an OOP language you might do:

"First Last".<TAB>

and be offered (maybe among other things, with this potentially the default at the top of the list):

"First Last".split()

In our case (even if you typed in a ., in case we’re ok with completing from that), when you confirm you would get:

"First Last"⇥split()

but idiomatic Julia code is:

split("First Last")

If we make this possible and easy to do with TAB-completion, we’ll have a lot more of the former. I’m not even sure that’s a good thing; it’s just more familiar to OOP people. I like TAB-completion, i.e. the discoverability, and actually think maybe we would want completion to the latter.

But let’s say we want the option to have the former. Then you’re only proposing a syntactic sugar, they would mean the same and even:

"First Last"↴split()

I’m thinking, could we rather make FrontFix (⇥) and BackFix (↴) mean something (slightly) different? I like that Julia is more powerful, i.e. has multiple dispatch by default. Does it make sense to also have single dispatch? I believe it can be faster when you don’t need multiple dispatch, but you still need dispatch.

Well, it’s bad if it doesn’t work! You bring up real issues, but the alternative on GitHub may not be better. It’s maybe more general, but in all cases these are redundant ways of writing the same thing, violating the “Zen of Python”: There should be one-- and preferably only one --obvious way to do it.

We are past that in Julia already on several fronts. There’s no hurry in deciding anything new. I’m not a parser guy either, but I note you CAN add new syntax already at runtime; JuliaSyntax.jl is the new parser that will be the default later (first added as a non-default opt-in). I did propose new Julia 2.0 syntax for it, as a non-default, and the issue was closed for now, but still with a positive response mentioning that different modules could potentially opt into different syntax.

I don’t think the idiomatic part is the literal placement of the method name to the left of the arguments. The idiomatic part is in breaking the assumption that methods must be owned by a single one (usually the first) of their arguments.


Indeed, [1, 2, 3, 4] |> filter(_>2, _) isn’t bad. To me it’s still a bit hard to read, but that could be a matter of practice and familiarity (considering how strongly people are championing the @chain macro!).

I don’t know if I will ever like |> because it’s still awkward to type, but maybe that can change if I have more motivation to use it. Or maybe -- can be alternate syntax for the pipe operator, so we’d say [1, 2, 3, 4]--filter(_>2, _); this both looks cleaner, and is faster and easier to type. -- is currently not a valid operator so we wouldn’t be stealing it from any packages.

Attempting to describe the behavior of /> and \> in operator(ish) terms:

It seems like it should have asymmetric operator precedence on left- versus right-sides, depending on what type of operator it’s competing against for arguments.

Specifically, it should have higher precedence than the function application $, except when a function application is to the immediate left of it, in which case it should have lower or equal precedence (or whatever, using the word “precedence” doesn’t much matter; just let function application win in the tug-of-war for an argument).

For example, if function application has precedence of 20, then /> and \> could have precedence of 21—except when a function application is to the immediate left of it, in which case its precedence could be 0.

Yeah, this isn’t the sort of thing you can express in a standard operator precedence table. It’s not “clean” as such. And in comparison, underscore placeholder syntax is beginning to sound like it could be comparatively “cleaner.”

This is too bad, because if they had gone forward with the simple version of consuming a single function call, we might have better autocomplete by now!

Autocomplete with Placeholder Syntax

When I think about it, the knowledge that you have the ability to chain seems like sufficient information for a decent autocomplete. We all know that you’re most likely to chain the object into the first argument anyway. And if not, probably the last. So do we really need operators which force you to do what you were already going to do? The autocomplete shouldn’t act dumb, it can look for methods which place the object in front or back.

Consider this scenario:

You type [1, 2, 3, 4] |>. Autocomplete already knows it should be looking for methods that take a Vector{Int64} as their first argument or last argument. The methods that specialize most tightly to this type, of course, should float to the top of the list.

So things like split and filter appear. You want filter at the moment, so you type fi<TAB> and it fills out filter(, _), reflecting the number of arguments this method of filter has (the method which specializes on a::AbstractArray as its second argument), with the cursor before the comma to enter the about-to-be-fixed argument. You type _>2 and now have filter(_>2, _) with the cursor still before the comma. You press <TAB> and the cursor is brought to the right of the closing parenthesis. You continue the chain, life is good, and you proceed to make babies and live happily ever after.

It could be possible to have functions which take the object as a first argument appear below the cursor, and functions which take it as a last argument appear above. Or some other variation which allows tabbing through which argument it’s going to fill into.
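A toy sketch of the "look for methods that take the object in front or back" idea, restricted to two-argument calls. Everything here is made up for illustration (the candidate functions keep, scale, shout and the helpers chain_slot and suggest); a real completion engine would work from the method tables rather than a hand-picked list.

```julia
# Toy candidates, defined only for this example.
keep(pred, xs::Vector) = filter(pred, xs)   # data goes last
scale(xs::Vector, k)   = xs .* k            # data goes first
shout(s::String, n)    = uppercase(s)^n     # does not accept a Vector

# Hypothetical heuristic: which slot of a two-argument call accepts a T?
function chain_slot(f, ::Type{T}) where {T}
    hasmethod(f, Tuple{T, Any}) && return :first
    hasmethod(f, Tuple{Any, T}) && return :last
    return :none
end

# Build a placeholder template for every candidate that fits.
function suggest(T, candidates)
    out = String[]
    for f in candidates
        slot = chain_slot(f, T)
        slot === :first && push!(out, "$(nameof(f))(_, )")
        slot === :last  && push!(out, "$(nameof(f))(, _)")
    end
    return out
end

suggestions = suggest(Vector{Int}, (keep, scale, shout))
println(suggestions)
```

Note that nothing here needs a dedicated front/back operator: the type of the piped object plus the knowledge that a chain is in progress is enough to pick the slot.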

A more sophisticated approach could be to build a script which scans all the GitHub repos and builds a statistical model of how frequently the various methods are used on objects of various types, and then use this data to statistically infer which methods you are likely to invoke. Maybe if you’ve included certain packages, it can infer based on the frequencies with which others who’ve loaded similar packages use certain methods. One could also have a model which factors in your unique style, as you might use some functions more than other people do; start with the GitHub repos as a prior, and conjugate it with personal use data observations to form a posterior (Dirichlet distribution maybe?).
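The prior-plus-personal-data idea can be sketched with the standard Dirichlet-multinomial update, where the posterior mean is just normalized pseudo-counts plus observed counts. All numbers and names below are invented for illustration.

```julia
# Posterior mean of a Dirichlet prior after observing multinomial counts:
# (prior_i + count_i) / sum(prior + count).
function posterior_mean(prior::Vector{<:Real}, counts::Vector{<:Real})
    post = prior .+ counts
    return post ./ sum(post)
end

candidates = ["filter", "map", "split"]
prior      = [2.0, 2.0, 1.0]   # made-up pseudo-counts from public repositories
personal   = [0, 3, 2]         # made-up counts from this user's own code

p = posterior_mean(prior, personal)

# Rank the candidates by posterior probability, most likely first.
ranked = candidates[sortperm(p; rev = true)]
println(ranked)
```

With little personal data the ranking follows the prior, and as personal counts grow they dominate, which is the behavior described above.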

Considering that the point of writing code is to do what hasn’t been done, you wouldn’t want the inference model to overfit to the data, so you’d want to play around with the optimal autocomplete strategy to find the sweet spot between auto-recommending the common versus the uncommon, but that’s a whole new conversation.

The point is, a good autocomplete really just needs two pieces of information: 1) what object type you’re hoping to call a function on, and 2) the fact that you can chain effectively. The rest of it is a matter of inference, which can be heuristic, statistical, or whatnot. Maybe even have competing autocomplete engines, who knows. Getting a respectable chaining syntax into the language is essentially a prerequisite to a respectable autocomplete, and underscore placeholder syntax could fit the bill.

Typed Partially Applied Functors

Recall that part of the motivation in the OP was to generalize Base.Fix1 and Base.Fix2. I think my proposal for FixFirst and FixLast does the trick.

But part of the proposal was for my syntax sugar to allow creation of these objects. I wonder if underscore syntax could do the same?

It seems obvious that this could create a FixFirst(f, x) object:

f(x, _...; _...)

and this a FixLast(f, x) object:

f(_..., x; _...)

but I wonder if these should:

f(x, _) # creates FixFirst(f, x) ?
f(_, x) # creates FixLast(f, x) ?

We run into the issue that while my fix operators fix only one argument (and thus are quite natural for FixFirst and FixLast), underscore placeholder syntax wants to fix every argument except for…, which makes it harder to make typed functors to describe its operation.

I guess the question arises… why do we need typed functors anyway? Who cares that Base.Fix1 and Base.Fix2 are types, instead of just anonymous functions? Some people seem to care, but should they really?

Because if they don’t, placeholder underscore syntax should satisfy them as-is.

And if they do… maybe some sort of Base.Fix{F, NTuple{N,Union{Int, Symbol}}, NTuple{N,DataType}} where {F,N} type could be constructed by underscore placeholder syntax to describe the partially applied function and the fixed argument indices and types…


Indeed, [1, 2, 3, 4] |> filter(_>2, _) isn’t bad.

Oh, does @chain figure out that filter(_ > 2, _) means x -> filter(l -> l > 2, x)?

Could you rewrite that example using \> and />?

Regarding the unicode, I don’t think it’s fair to call \in<TAB> easier to type than e.g. \lp<TAB><TAB> since one or two tabs consecutively is practically zero overhead. It does matter (a lot) if there are characters between the two tabs though.


A [very] rough first-pass at what a Base.Fix functor could look like for underscore partial application syntax:

julia> struct Fix{F, I, V}
           f::F # function
           i::I # arg indices
           v::V # arg values
       end

julia> (fixer::Fix)(args...; kwargs...) = begin
           j, i = firstindex(args), firstindex(fixer.i)
           arglist = Vector(undef, length(args)+length(fixer.i))
           for k ∈ eachindex(arglist)
               if k ∈ fixer.i
                   arglist[k] = fixer.v[i]
                   i += 1
               else
                   arglist[k] = args[j]
                   j += 1
               end
           end
           fixer.f(arglist...; kwargs...)
       end

julia> f = Fix(map, (1,), (x->x^2,))
Fix{typeof(map), Tuple{Int64}, Tuple{var"#23#24"}}(map, (1,), (var"#23#24"(),))

julia> f([1,2,3])
3-element Vector{Int64}:
 1
 4
 9

julia> g = Fix(map, (2,), ([1, 2, 3],))
Fix{typeof(map), Tuple{Int64}, Tuple{Vector{Int64}}}(map, (2,), ([1, 2, 3],))

julia> g(x->x^2)
3-element Vector{Int64}:
 1
 4
 9

julia> h = Fix(map, (1,2), (x->x^2, [1, 2, 3]))
Fix{typeof(map), Tuple{Int64, Int64}, Tuple{var"#27#28", Vector{Int64}}}(map, (1, 2), (var"#27#28"(), [1, 2, 3]))

julia> h()
3-element Vector{Int64}:
 1
 4
 9

Probably very low performance and buggy at this moment, allocating into a type-unstable array at runtime. Also, the struct maybe should somehow store exactly how many arguments there are (currently it just splats args… into the end, and allows keyword arguments when no ; _... is provided). Also, the data structure would need to store whether there is an argument slurp, and somehow handle the cases where the slurper _... is not at the end of the argument list, but in the middle or front.

Feels doable.


No, @chain does not operate like that. This is how #24990, with the simple rule of “consume one call”, would work.

IMO, “consume one call” is the only viable option for underscore placeholder partial application syntax.
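To spell out what “consume one call” would mean for the earlier example, here is the expansion written by hand in current Julia. This is my reading of the rule as described in this thread, not output from an actual implementation.

```julia
# filter(_ > 2, _) under "consume one call":
inner = y -> y > 2               # `_ > 2` consumes only the call to `>`
outer = x -> filter(inner, x)    # the remaining `_` consumes the call to `filter`

result = [1, 2, 3, 4] |> outer
println(result)
```

Each underscore turns only its immediately enclosing call into a lambda, so there is never any guessing about how far outward the lambda should extend.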

It would be [1, 2, 3, 4] \> filter(2 \> >). Which is confusing as heck, so you’d probably write instead [1, 2, 3, 4] \> filter(x->x>2).

Maybe a structure like this is more appropriate:

struct Fix{F, FI, BI, KK, FV, BV, KV}
    f::F # function
    fis::FI # `Int` tuple, front arg indices
    fvs::FV # `Any` tuple, front arg values
    bis::BI # `Int` tuple, back arg indices (negative values, counting from end)
    bvs::BV # `Any` tuple, back arg values
    kks::KK # `Symbol` tuple, keyword arg keys
    kvs::KV # `Any` tuple, keyword arg values
end

And calling the fixer::Fix functor can simply assume that any additional args... will occur in the middle between the front args and back args; any additional ; kwargs... will simply be allowed; and the check to confirm the correct number of arguments has been passed will occur after all the arguments have been collected, when the functor finally calls fixer.f.
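A minimal sketch of that calling rule, using a simplified variant of the struct. SimpleFix is a hypothetical name, and I drop the index tuples by assuming the front values occupy the first positions and the back values occupy the last positions.

```julia
# Simplified, hypothetical variant: no index tuples, keyword args as a NamedTuple.
struct SimpleFix{F, FV<:Tuple, BV<:Tuple, KV<:NamedTuple}
    f::F
    fvs::FV   # front argument values
    bvs::BV   # back argument values
    kvs::KV   # fixed keyword arguments
end

# Extra positional args land in the middle; extra kwargs are passed through
# and override the fixed ones. Arity is only checked when f is finally called.
(fx::SimpleFix)(args...; kwargs...) =
    fx.f(fx.fvs..., args..., fx.bvs...; fx.kvs..., kwargs...)

g(a, b, c; scale = 1) = (a + b + c) * scale

h = SimpleFix(g, (1,), (3,), (scale = 10,))
r_fixed    = h(2)              # g(1, 2, 3; scale = 10)
r_override = h(2; scale = 2)   # the caller's keyword wins
println((r_fixed, r_override))
```

Because everything is stored in tuples with concrete types, this avoids the runtime-allocated Vector of the first attempt, though whether it handles every slurp position is a separate question.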

Maybe a structure like this:

struct Fix{F, I, V}
    f::F # function
    v::V # tuple, argument values
end
# where I is a tuple of argument indices: 
#     positive numbers for front arguments' positions (w.r.t. start of ordered arguments)
#     negative numbers for back arguments' positions (w.r.t. end of ordered arguments)
#     symbols for keyword argument keys

Calling the fixer::Fix functor would work as described above: any additional args... go in the middle between the front args and back args, any additional ; kwargs... are simply allowed, and the argument count is only checked when the functor finally calls fixer.f.

If I have it right, the type for FixFirst would then be Fix{F, Tuple{1}} where F (one could easily define a const to equal this if needed), and the type for FixLast would be Fix{F, Tuple{-1}} where F.

Hey @uniment ,

Just a small aside: I have been following this thread with some interest as a huge fan of the piping/chaining syntax Julia has. Reading through the comments here and there has been very educational and elucidating about aspects of Julia I haven’t either thought about or known existed. Ultimately I do not know where this proposal will end up but I appreciate the discussion you spawned (and the rather high quality of your proposal!) and the generally amicable tone throughout thinking about a hard problem.

Great discussion!

~ tcp :deciduous_tree:


It’s a fair point, but |> is insufficient for my use cases in data engineering or analysis work. IMO this is a useful extension of the pipe idiom. About 80% of my feeling here is because ‘filter’ places the data object last.