Fixing the Piping/Chaining Issue

You’re right, the parsing would need to be different, i.e.,

[4, 9, 16] \> map(sqrt) \> filter(iseven)
[4, 9, 16] \> map $ {sqrt} \> filter $ {iseven}

are supposed to mean

((([4, 9, 16] \> map) $ (sqrt,)) \> filter) $ (iseven,)

which seems to require a non-trivial interaction between the piping and function call operators going beyond precedence …
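
For reference, both spellings above are meant to evaluate to what you’d write in today’s Julia as:

filter(iseven, map(sqrt, [4, 9, 16]))  # == [2.0, 4.0]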

Just some further comments:

  1. Having a nice and short syntax for partial application is certainly useful:

    filter(2 /> <, [1, 2, 3, 4])  # All values greater than 2
    filter(2 \> <, [1, 2, 3, 4])  # All values less than 2
    

    Obviously, the _ proposal would also work nicely here. (For comparison, a Base.Fix1/Fix2 spelling is sketched after this list.)

  2. Using Unicode symbols can be useful, but requires thought about which symbol to use. Ideally, it would be a kind of mnemonic for the desired operation. Yet, going overboard here quickly starts to look like APL:

    filter ← {(⍺⍺ ⍵)/⍵}
    (2∘<∧<∘5) filter 1 2 3 4 5 6  ⍝ All values between 2 and 5
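
For comparison, the first example can already be spelled today with Base.Fix1/Base.Fix2 (the functors these proposals aim to generalize), just more verbosely:

filter(Base.Fix1(<, 2), [1, 2, 3, 4])  # x -> 2 < x, i.e. all values greater than 2
filter(Base.Fix2(<, 2), [1, 2, 3, 4])  # x -> x < 2, i.e. all values less than 2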
    
1 Like

I will admit I neither understand (nor care that much about) the pieces of this proposal that seem to be most contentious, namely binding more tightly than function call, and maybe also right-associativity?

If obj |> foo(_, value) is acceptable, then could /> be an alias for exactly that? And vice versa: if obj |> foo(value, _) works, then \> could be an alias for exactly that.

This also makes them more similar to |>, since they always require a callable one-argument function on the right-hand side; it’s just that /> happens to create that one-argument function via partial application.

My concern is that something medium would be a great solution, something huge was requested, and zero will be received. I just like how this proposal tackles both front/back threading and tab-complete in a pretty elegant way.

2 Likes

I think this is it. The discussion in that PR went around in circles because people kept on trying to come up with more complicated rules in order to get the code to “do what I mean, not what I say”. And, for better or worse, work on the Julia language is driven more by consensus than fiat. :slight_smile:

2 Likes

That doesn’t bother me at all. For now I’m thinking: do we want this, and then what are the best Unicode symbols? We can make a fast way to type this in once we’ve agreed on some Unicode symbols.

In an OOP language you might do:

"First Last".<TAB>

and get offered (maybe more options, with this one potentially the default at the top of the list):

"First Last".split()

in our case (even if you typed in a ., assuming we’re OK with completing from that), when you confirm:

"First Last"⇥split()

but idiomatic Julia code is:

split("First Last")

If we make this possible and easy to do with TAB-completion, we’ll have a lot more of the former. I’m not even sure that’s a good thing, just more familiar to OOP people. I like TAB-completion, i.e. the discoverability, and actually think maybe we would want completion to produce the latter.

But let’s say we want the option to have the former. Then you’re only proposing syntactic sugar; they would mean the same, and even:

"First Last"↴split()

I’m thinking: could we rather make FrontFix (⇥) and BackFix (↴) mean something (slightly) different? I like that Julia is more powerful, i.e. has multiple dispatch by default. Does it make sense to also have single dispatch? I believe it can be faster when you don’t need multiple dispatch but still need dispatch.

Well, it’s bad if it doesn’t work! And there are the issues you bring up, but the alternative on GitHub may not be better. It’s maybe more general, but in all cases these are redundant ways of writing the same thing, violating the Zen of Python: “There should be one-- and preferably only one --obvious way to do it.”

We are past that in Julia already on several fronts. There’s no hurry in deciding anything new. I’m not a parser guy either, but I note you CAN already add new syntax at runtime; JuliaSyntax.jl is the new parser that will become the default later (first added as a non-default opt-in). I did propose new Julia 2.0 syntax for it, as a non-default, and the issue was closed for now, but still with a positive response mentioning that different modules could potentially opt into different syntax.

I don’t think the idiomatic part is the literal placement of the method name to the left of the arguments. The idiomatic part is in breaking the assumption that methods must be owned by a single one (usually the first) of their arguments.

1 Like

Indeed, [1, 2, 3, 4] |> filter(_>2, _) isn’t bad. To me it’s still a bit hard to read, but that could be a matter of practice and familiarity (considering how strongly people are championing the @chain macro!).

I don’t know if I will ever like |> because it’s still awkward to type, but maybe that can change if I have more motivation to use it. Or maybe -- can be alternate syntax for the pipe operator, so we’d say [1, 2, 3, 4]--filter(_>2, _); this both looks cleaner, and is faster and easier to type. -- is currently not a valid operator so we wouldn’t be stealing it from any packages.

Attempting to describe the behavior of /> and \> in operator(ish) terms:

It seems like it should have asymmetric operator precedence on left- versus right-sides, depending on what type of operator it’s competing against for arguments.

Specifically, it should have higher precedence than the function application $, except when a function application is to the immediate left of it, in which case it should have lower or equal precedence (or whatever, using the word “precedence” doesn’t much matter; just let function application win in the tug-of-war for an argument).

For example, if function application has precedence of 20, then /> and \> could have precedence of 21—except when a function application is to the immediate left of it, in which case its precedence could be 0.

Yeah, this isn’t the sort of thing you can express in a standard operator precedence table. It’s not “clean” as such. And underscore placeholder syntax is beginning to sound comparatively “cleaner.”

This is too bad, because if they had gone forward with the simple version of it (consuming a single function call), we might have better autocomplete by now!

Autocomplete with Placeholder Syntax

When I think about it, the knowledge that you have the ability to chain seems like sufficient information for a decent autocomplete. We all know that you’re most likely to chain the object into the first argument anyway. And if not, probably the last. So do we really need operators which force you to do what you were already going to do? The autocomplete shouldn’t act dumb, it can look for methods which place the object in front or back.

Consider this scenario:

You type [1, 2, 3, 4] |>. Autocomplete already knows it should be looking for methods that take a Vector{Int64} as their first argument or last argument. The methods that specialize most tightly to this type, of course, should float to the top of the list.

So things like split and filter appear. You want filter at the moment, so you type fi<TAB> and it fills out filter(, _), reflecting the number of arguments this method of filter has—the method which specializes on a::AbstractArray as its second argument—with the cursor before the comma to enter the about-to-be-fixed argument. You type _>2 and now have filter(_>2, _) with the cursor still before the comma. You press <TAB> and the cursor is brought to the right of the closing parenthesis. You continue the chain, life is good, and you live happily ever after.

It could be possible to have functions which take the object as a first argument appear below the cursor, and functions which take it as a last argument appear above. Or some other variation which allows tabbing through which argument it’s going to fill into.

A more sophisticated approach could be to build a script which scans all the GitHub repos and builds a statistical model of how frequently the various methods are used on objects of various types, and then use this data to statistically infer which methods you are likely to invoke. Maybe if you’ve included certain packages, it can infer based on the frequencies with which others who’ve loaded similar packages use certain methods. One could also have a model which factors in your unique style, as you might use some functions more than other people do; start with the GitHub repos as a prior, and conjugate it with personal use data observations to form a posterior (Dirichlet distribution maybe?).
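
To make the conjugate-update idea concrete, here is a rough sketch using Distributions.jl; the counts and the choice of methods are made up purely for illustration:

using Distributions  # Dirichlet prior over method-choice frequencies

prior_counts    = [120.0, 45.0, 8.0]   # e.g. ecosystem-wide uses of filter, map, foreach on this type
personal_counts = [3.0, 10.0, 0.0]     # your own observed usage
posterior = Dirichlet(prior_counts .+ personal_counts)  # conjugate update: just add the counts
mean(posterior)  # estimated selection probabilities, usable for ranking suggestions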

Considering that the point of writing code is to do what hasn’t been done, you wouldn’t want the inference model to overfit to the data, so you’d want to play around with the optimal autocomplete strategy to find the sweet spot between auto-recommending the common versus the uncommon, but that’s a whole new conversation.

The point is, a good autocomplete really just needs two pieces of information: 1) what object type you’re hoping to call a function on, and 2) the fact that you can chain effectively. The rest of it is a matter of inference, which can be heuristic, statistical, or whatnot. Maybe even have competing autocomplete engines, who knows. Getting a respectable chaining syntax into the language is essentially a prerequisite to a respectable autocomplete, and underscore placeholder syntax could fit the bill.
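
As a very rough sketch of piece 1), the method lookup itself is already within reach; chain_candidates below is a made-up helper, and this is not how any existing completion engine works:

using InteractiveUtils  # provides methodswith

# Collect methods whose first or last positional argument can accept a value of type T,
# including methods defined for supertypes such as AbstractArray.
function chain_candidates(T::Type)
    filter(methodswith(T; supertypes = true)) do m
        sig = (Base.unwrap_unionall(m.sig).parameters...,)  # (typeof(f), argtypes...)
        length(sig) >= 2 && any(p -> p isa Type && T <: p, (sig[2], sig[end]))
    end
end

chain_candidates(Vector{Int64})  # candidate methods for the completion list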

Typed Partially Applied Functors

Recall that part of the motivation in the OP was to generalize Base.Fix1 and Base.Fix2. I think my proposal for FixFirst and FixLast does the trick.
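
(For readers who haven’t followed the OP: here is a minimal sketch of what FixFirst and FixLast could look like, written from scratch here rather than copied from the proposal:)

struct FixFirst{F,X} <: Function
    f::F  # function
    x::X  # value fixed as the first argument
end
(fix::FixFirst)(args...; kwargs...) = fix.f(fix.x, args...; kwargs...)

struct FixLast{F,X} <: Function
    f::F  # function
    x::X  # value fixed as the last argument
end
(fix::FixLast)(args...; kwargs...) = fix.f(args..., fix.x; kwargs...)

FixFirst(filter, iseven)([1, 2, 3, 4])  # filter(iseven, [1, 2, 3, 4]) == [2, 4]
FixLast(filter, [1, 2, 3, 4])(iseven)   # filter(iseven, [1, 2, 3, 4]) == [2, 4]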

But part of the proposal was for my syntax sugar to allow creation of these objects. I wonder if underscore syntax could do the same?

It seems obvious that this could create a FixFirst(f, x) object:

f(x, _...; _...)

and this a FixLast(f, x) object:

f(_..., x; _...)

but I wonder if these should:

f(x, _) # creates FixFirst(f, x) ?
f(_, x) # creates FixLast(f, x) ?

We run into the issue that while my fix operators fix only one argument (and thus are quite natural for FixFirst and FixLast), underscore placeholder syntax wants to fix every argument except for…, which makes it harder to make typed functors to describe its operation.

I guess the question arises… why do we need typed functors anyway? Who cares that Base.Fix1 and Base.Fix2 are types, instead of just anonymous functions? Some people seem to care, but should they really?

Because if they don’t, placeholder underscore syntax should satisfy them as-is.

And if they do… maybe some sort of Base.Fix{F, NTuple{N,Union{Int, Symbol}}, NTuple{N,DataType}} where {F,N} type could be constructed by underscore placeholder syntax to describe the partially applied function and the fixed argument indices and types…
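
For what it’s worth, the usual reason to care is that a typed functor exposes structure you can dispatch on and introspect, which an anonymous closure does not. A tiny illustration with the existing Base.Fix2:

g = ==(3)        # returns Base.Fix2(==, 3)
g isa Base.Fix2  # true, so methods can dispatch on it
g.f, g.x         # (==, 3): the wrapped function and the fixed value are recoverable
h = x -> x == 3  # behaves the same when called, but is opaque to dispatch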

1 Like

Indeed, [1, 2, 3, 4] |> filter(_>2, _) isn’t bad.

Oh, does @chain figure out that filter(_ > 2, _) means x -> filter(l -> l > 2, x)?

Could you rewrite that example using \> and />?

Regarding the unicode, I don’t think it’s fair to call \in<TAB> easier to type than e.g. \lp<TAB><TAB> since one or two tabs consecutively is practically zero overhead. It does matter (a lot) if there are characters between the two tabs though.

1 Like

A [very] rough first-pass at what a Base.Fix functor could look like for underscore partial application syntax:

julia> struct Fix{F, I, V}
           f::F # function
           i::I # arg indices
           v::V # arg values
       end

julia> (fixer::Fix)(args...; kwargs...) = begin
           j, i = firstindex(args), firstindex(fixer.i)
           arglist = Vector(undef, length(args)+length(fixer.i))
           for k ∈ eachindex(arglist)
               if k ∈ fixer.i
                   arglist[k] = fixer.v[i]
                   i += 1
               else
                   arglist[k] = args[j]
                   j += 1
               end
           end
           fixer.f(arglist...; kwargs...)
       end

julia> f = Fix(map, (1,), (x->x^2,))
Fix{typeof(map), Tuple{Int64}, Tuple{var"#23#24"}}(map, (1,), (var"#23#24"(),))

julia> f([1,2,3])
3-element Vector{Int64}:
 1
 4
 9

julia> g = Fix(map, (2,), ([1, 2, 3],))
Fix{typeof(map), Tuple{Int64}, Tuple{Vector{Int64}}}(map, (2,), ([1, 2, 3],))

julia> g(x->x^2)
3-element Vector{Int64}:
 1
 4
 9

julia> h = Fix(map, (1,2), (x->x^2, [1, 2, 3]))
Fix{typeof(map), Tuple{Int64, Int64}, Tuple{var"#27#28", Vector{Int64}}}(map, (1, 2), (var"#27#28"(), [1, 2, 3]))

julia> h()
3-element Vector{Int64}:
 1
 4
 9

Probably very low performance and buggy at this moment, allocating into a type-unstable array at runtime. Also, the struct maybe should somehow store exactly how many arguments there are (currently it just splats args… into the end, and allows keyword arguments when no ; _... is provided). Also, the data structure would need to store whether there is an argument slurp, and somehow handle the cases where the slurper _... is not at the end of the argument list, but in the middle or front.

Feels doable.

1 Like

No, @chain does not operate like that. This is how #24990, with the simple rule of “consume one call”, would work.

IMO, “consume one call” is the only viable option for underscore placeholder partial application syntax.

It would be [1, 2, 3, 4] \> filter(2 \> >). Which is confusing as heck, so you’d probably write instead [1, 2, 3, 4] \> filter(x->x>2).
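
(Either way, it evaluates to what you’d write today as:)

filter(x -> x > 2, [1, 2, 3, 4])  # == [3, 4]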

Maybe a structure like this is more appropriate:

struct Fix{F, FI, BI, KK, FV, BV, KV}
    f::F # function
    fis::FI # `Int` tuple, front arg indices
    fvs::FV # `Any` tuple, front arg values
    bis::BI # `Int` tuple, back arg indices (negative values, counting from end)
    bvs::BV # `Any` tuple, back arg values
    kks::KK # `Symbol` tuple, keyword arg keys
    kvs::KV # `Any` tuple, keyword arg values
end


Or maybe a simpler structure like this:

struct Fix{F, I, V}
    f::F # function
    v::V # tuple, argument values
end
# where I is a tuple of argument indices: 
#     positive numbers for front arguments' positions (w.r.t. start of ordered arguments)
#     negative numbers for back arguments' positions (w.r.t. end of ordered arguments)
#     symbols for keyword argument keys

And calling the fixer::Fix functor can simply assume that any additional args... will occur in the middle between the front args and back args; any additional ; kwargs... will simply be allowed; and the check to confirm the correct number of arguments has been passed will occur after all the arguments have been collected, when the functor finally calls fixer.f.

If I have it right, the type for FixFirst would then be Fix{F, (1,)} where F (one could easily define a const to equal this if needed), and the type for FixLast would be Fix{F, (-1,)} where F.
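
A quick sketch of what instances would look like under that index convention, assuming the struct just above (no call method is defined here):

ff = Fix{typeof(filter), (1,), Tuple{typeof(iseven)}}(filter, (iseven,))      # ≈ FixFirst(filter, iseven)
fl = Fix{typeof(filter), (-1,), Tuple{Vector{Int}}}(filter, ([1, 2, 3, 4],))  # ≈ FixLast(filter, [1, 2, 3, 4])
ff isa Fix{F, (1,)} where F  # true: the FixFirst type described above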

Hey @uniment ,

Just a small aside: I have been following this thread with some interest as a huge fan of the piping/chaining syntax Julia has. Reading through the comments here and there has been very educational and elucidating about aspects of Julia I haven’t either thought about or known existed. Ultimately I do not know where this proposal will end up but I appreciate the discussion you spawned (and the rather high quality of your proposal!) and the generally amicable tone throughout thinking about a hard problem.

Great discussion!

~ tcp :deciduous_tree:

16 Likes

It’s a fair point, but |> is insufficient for my use cases in data engineering or analysis work. IMO this is a useful extension of the pipe idiom. About 80% of my feeling here is because ‘filter’ places the data object last.

Would you still feel that |> is insufficient if underscore placeholder syntax was part of the language?

Namely, if you could type this:

my_data |> filter(filtering_func, _)

and it would execute this:

filter(filtering_func, my_data)

would that satisfy your needs?

1 Like

Presently, I have a typed functor that could be appropriate for placeholder partial application:

struct Fix{F,NA,NKW,I,V,KW}
    f::F                          # function
    nargs::NA                     # number of args; `nothing` for unlimited (varargs)
    nkwargs::NKW                  # number of kwargs; `nothing` for unlimited (varkwargs)
    fixargs::V                    # fixed arg values
                                  # ::I is tuple of fixed arg indices
    fixkwargs::KW                 # fixed kwargs (keys & values)
end

Fix(f, nargs::Union{Int,Nothing}, nkwargs::Union{Int,Nothing}, fixindices::Tuple, fixargs::Tuple; fixkwargs...) = begin
    length(fixindices) > 0 && @assert all(fixindices[begin:end-1] .< fixindices[begin+1:end]) "all indices in increasing order please"
    length(fixkwargs) > 0  && @assert all(keys(fixkwargs)[begin:end-1] .< keys(fixkwargs)[begin+1:end]) "all keyword arguments in alphabetical order please"
    Fix{typeof(f), typeof(nargs), typeof(nkwargs), fixindices, typeof(fixargs), typeof(fixkwargs)}(f, nargs, nkwargs, fixargs, fixkwargs)
end
Fix(f, nargs::Union{Int,Nothing}, nkwargs::Union{Int,Nothing}, fixindices::Tuple, fixargs::Tuple, fixkwargs) = 
    Fix(f, nargs, nkwargs, fixindices, fixargs; fixkwargs...)
Fix(f, fixindices::Tuple, fixargs::Tuple; fixkwargs...) = 
    Fix(f, nothing, nothing, fixindices, fixargs, fixkwargs)
Fix(f, fixindices::Tuple, fixargs::Tuple, fixkwargs) = 
    Fix(f, fixindices, fixargs; fixkwargs...)

(fix::Fix{F,NA,NKW,I,V,KW})(args...; kwargs...) where {F,NA,NKW,I,V,KW} = begin
    arglen = length(args)+length(I)
    argitr = ((1:arglen)...,)

    fixarg_map = (I, fix.fixargs) # tuple of fixed arg indices and tuple of fixed arg values
    argsI = filter(i-> i ∉ I && i-arglen-1 ∉ I, argitr) # arguments not fixed must be called
    arg_map = (argsI, args) # tuple of called arg indices and tuple of called arg values

    argsout = map(argitr) do i
        which_map = (i ∈ I || i-arglen-1 ∈ I) ? fixarg_map : arg_map
        which_map[2][findfirst(i-arglen-1 ∈ I ? ==(i-arglen-1) : ==(i), which_map[1])]
    end
    isnothing(fix.nargs) || @assert length(argsout) == fix.nargs
    kwargs = (; kwargs..., fix.fixkwargs...)
    isnothing(fix.nkwargs) || @assert length(kwargs) == fix.nkwargs
    fix.f(argsout...; kwargs...)
end

To construct and call this object is easy:

f = (a, b, c, d) -> (a, b, c, d)
g = Fix(f, (1,), (:a,)) # funcname, fixed arg indices, fixed arg values
g(:b, :c, :d)
h = Fix(f, (1,2), (:a,:b))
h(:c, :d)
i = Fix(f, (1,2,3), (:a,:b,:c))
i(:d)
j = Fix(f, (1,2,3,4), (:a,:b,:c,:d))
j()

Negative indices denote distance from the end of the argument list (-1 for the end, -2 for next-to-end, etc.)

k = Fix(f, (-1,1), (:d,:a))
k(:b, :c)

Keyword arguments are allowed, and additional arguments can be inserted to specify that the called function has a fixed number of arguments (instead of varargs).
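
For example (fkw and m are made-up names; this just exercises the constructor signature above):

fkw = (a, b; scale = 1) -> (a, b) .* scale
m = Fix(fkw, 2, 1, (1,), (10,); scale = 3)  # exactly 2 positional args and 1 kwarg; fix arg 1 and `scale`
m(20)                                       # (30, 60)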

Problem

At the moment it seems to work fine, except: when fixing zero or one arguments it is type stable, but with two or more arguments fixed the functor’s call to map loses type stability. I’m struggling to figure out why.

Example:

julia> @btime Fix((a,b,c,d)->(a,b,c,d), (1,), (1,))(2, 3, 4)
  1.000 ns (0 allocations: 0 bytes)
(1, 2, 3, 4)

julia> @btime Fix((a,b,c,d)->(a,b,c,d), (1,2), (1,2))(3, 4)
  778.505 ns (17 allocations: 640 bytes)
(1, 2, 3, 4)
1 Like

I would guess one reason is that arglen is an Int you are passing into a closure and that pushes it down to a runtime value.

And instead of using argitr at all you should do

argsout = ntuple(arglen) do i
    ...
end

You just have to make sure you don’t drop compile time constants down to runtime values anywhere.

1 Like

After spending some time looking at all this: to me, this would be very useful. But that would be under the expectation that this placeholder syntax is general, meaning that it’s actually syntactic sugar for what I’ll call smart lambdas (SL), which work anywhere; isodd = _ % 2 == 1, for example.

I would then find that your OP suggestion would be very useful in addition to SL, if we make it do partial application (or eval, if conditions are met), which also works anywhere. With these two together, we would have both conciseness and clarity, and I believe that would be apparent in real world examples.

Here’s how that would look in regards to our earlier filtering shenanigans:

# today, assuming we shun magic one-parameter functions
[] |> list -> filter(n -> n > 2, list)

# smart lambdas
[] |> filter(_ > 2, _)

# OP (sane version)
[] \> filter(n -> n > 2)

# SL + OP
[] \> filter(_ > 2)

In my opinion, the last entry is both concise and clear in meaning; noticeably clearer than just the SL one as well.

The idea with partial application would also be very useful in general, since the following (shunning magic functions) becomes much clearer IMO:

# today, prepping a list to have several functions applied
prepped_list = fn -> foreach(fn, some_list)

# SL vs OP
prepped_list = foreach(_, some_list)
prepped_list = some_list \> foreach

# today, prepping a function to be applied to several lists
apply_fn = list -> foreach(some_fn, list)

# SL vs OP
apply_fn = foreach(some_fn, _)
apply_fn = some_fn /> foreach

(Note how OP references only the things actually used.)

No idea if it’s viable in Julia or not, or if it’s worth the trouble, but it would probably make a lot of the functional style stuff less cumbersome to code and parse mentally.

1 Like

It might make sense to use a generated function for applying a FixArgs functor. Here’s my naive implementation. It doesn’t have all the bells and whistles, but I compare the performance with and without a generated function.

struct FixArgsNotGenerated{F, I, V}
    f::F
    vals::V
end

function FixArgsNotGenerated{I}(f, vals) where I
    FixArgsNotGenerated{typeof(f), I, typeof(vals)}(f, vals)
end

function (f::FixArgsNotGenerated{F, I, V})(args...) where {F, I, V}
    front_args = []
    fixed_args_i = 0
    unfixed_args_i = 0

    last_I = last(I)

    for full_i in 1:last_I
        if full_i in I
            fixed_args_i += 1
            push!(front_args, f.vals[fixed_args_i])
        else
            unfixed_args_i += 1
            push!(front_args, args[unfixed_args_i])
        end
    end

    f.f(front_args..., args[unfixed_args_i+1:end]...)
end

struct FixArgsGenerated{F, I, V}
    f::F
    vals::V
end

function FixArgsGenerated{I}(f, vals) where I
    FixArgsGenerated{typeof(f), I, typeof(vals)}(f, vals)
end

@generated function (f::FixArgsGenerated{F, I, V})(args...) where {F, I, V}
    front_args = []
    fixed_args_i = 0
    unfixed_args_i = 0

    last_I = last(I)

    for full_i in 1:last_I
        if full_i in I
            fixed_args_i += 1
            push!(front_args, :(f.vals[$fixed_args_i]))
        else
            unfixed_args_i += 1
            push!(front_args, :(args[$unfixed_args_i]))
        end
    end

    :(f.f($(front_args...), args[$(unfixed_args_i + 1) : end]...))
end

Benchmark code:

using BenchmarkTools

foo(a, b, c, d) = (a, b, c, d)
f = FixArgsNotGenerated{(2, 4)}(foo, (:b, :d))
g = FixArgsGenerated{(2, 4)}(foo, (:b, :d))

Benchmark result:

julia> @btime f(:a, :c);
  403.475 ns (6 allocations: 288 bytes)

julia> @btime g(:a, :c);
  22.708 ns (1 allocation: 48 bytes)

For prior art on FixArgs, take a look at this package:

https://goretkin.github.io/FixArgs.jl/dev/

2 Likes

Although perhaps what Raf was getting at is that if you keep everything in terms of tuple operations on type values, then the compiler can automatically unroll the loop, so you don’t even need a generated function.

1 Like

Yeah you can do this with reduce. We also need to use $f and $g to time this.

function (f::FixArgsNotGenerated{F,I,V})(args...) where {F,I,V}
    init = ((args, f.vals), ())
    inds = ntuple(identity, max(I...))
    _, combined_args = reduce(inds; init) do ((a, v), out), i
        if i in I
            ((a, Base.tail(v)), (out..., first(v)))
        else
            ((Base.tail(a), v), (out..., first(a)))
        end
    end
    f.f(combined_args...)
end

Although for some reason not quite as good as the generated function:

julia> @btime $f(:a, :c)
  5.844 ns (0 allocations: 0 bytes)
(:a, :b, :c, :d)

julia> @btime $g(:a, :c)
  3.264 ns (0 allocations: 0 bytes)
(:a, :b, :c, :d)

Edit: but these are same-typed inputs. I think the generated function is the way to go here.

2 Likes

Thanks for helping, @Raf and @CameronBieganek! I won’t be quite happy with it until it is perfectly transparent (so that @btime $g(:a, :c) runs in exactly the same time as @btime $foo(:a, :b, :c, :d)), but this is a great start. I certainly need more practice writing efficient functional code and @generated functions.

Although I agree, I am concerned that there is sufficient overlap between what can be done with underscore partial application and what can be done with frontfix/backfix partial application that the extra overhead of a) implementing both and b) learning both may not be worthwhile. And since what can be done with underscore partial application (assuming we get _... slurping) is a superset of what can be done with frontfix/backfix, and it seems cleaner implementation-wise, it seems like the horse to bet on. (Akshually… underscore partial application is a superset with the exception of not being able to apply until there are zero arguments left. Maybe there is a way to correct this…)

I’m not a fan of the extra comma and underscore in the call to filter, but if the autocomplete will fill it in then I might not mind. (Also, using Windows PowerToys, I set a hotkey to enter an underscore.) And for readability, it’s likely better to have the placeholder there than to have to learn (and remember) two additional operators on top of learning how placeholders work.