Partial Application brackets without underscores

After years, I finally understand the idea behind the proposal of PR 24990. It wants to bring partial application (or currying) to Julia, which is long overdue.

Previously, I thought _ was merely a crutch, eventually to be used for iterating over multiple data values or for pointwise operations on data with the same function. With the new forum post by user uniment, I understand that Scala’s idea behind f(_, x, _) is actually currying, where _ is a placeholder for a non-applied argument. _ turns the call into a mere assignment of arguments to parameters and suppresses the call.

problems with _

As your discussion on GitHub pointed out, this idea by itself has several problems:

  • for a growing number of arguments, _ makes the syntax “explode” linearly with the number of function arguments
  • it looks like you need to know the number of arguments of the call, which prevents generic programming (use in generic function libraries)
  • the whole problem of the scope of _ (does it extend beyond the first pair of parentheses or not?)

the alternative

There is a second (I think better) syntax (with several variations) that achieves the same without requiring placeholders for missing argument positions:

a specific function “call” operator which assigns arguments but suppresses the call. Unfortunately, Julia missed the opportunity to generalize function call syntax. Arrays, hash tables, and functions are abstractly the same thing, differing only in mutability. As a consequence, the brackets are no longer available.

Before suggesting a solution below, I’d like to show what the ideal could have been if currying and partial application had been a consideration from the start:

  • If the brackets could be used, one could have written argument assignment as f[x][param=y] or f[x, param=y] which looks surprisingly similar to substitution syntax in mathematics: f[param1\x, param2\y]. Then you could write f[x][y][z]() which is literally equivalent to f[x,y,z]().

  • Shifting positional arguments would have been possible by using parameter indices for unnamed parameters f[ 3 => x, y ] or f[ _3 = x, y ] instead of f(_, _, x, y).

The next option would have been the member access operator in conjunction with function call syntax, f.(x, y), which is not available either.

concept

But what about

f.[ x ].[ y ] = f.[ x, y ] ?

It appears to be invalid syntax in the REPL, currently free for a meaning. There is also an intuitive mnemonic: [ … ] defines a mapping, and accessing it with . applies this map to the variable in key space (parameter or member space, the codomain), merely replacing the symbol by a map in order to map new arguments to old arguments. In correspondence to [ … ] – which modifies the variable in value space (variable content, return value) – .[ … ] modifies the function or object interface as if it were a hash table or an array, and may even allow mutation of the return value for specific patterns.
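For intuition, here is a minimal, purely illustrative sketch of the intended semantics. The names Partial and partial are hypothetical stand-ins; the proposed .[ … ] syntax itself would need parser support:

```julia
# Hypothetical sketch: an applicator that accumulates arguments without
# calling, as f.[x].[y] would. All names here are illustrative only.
struct Partial{F,A<:Tuple}
    f::F
    args::A
end

partial(f, args...) = Partial(f, args)                       # f.[args...]
partial(p::Partial, args...) = Partial(p.f, (p.args..., args...))
(p::Partial)(args...) = p.f(p.args..., args...)              # () applies

f(a, b, c) = (a, b, c)
g = partial(partial(f, 1), 2)   # stands in for f.[1].[2]
g(3)                            # == f(1, 2, 3) == (1, 2, 3)
```

The key point is that each bracket step only stores arguments; evaluation happens only at the final call with parentheses.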

[spoiler]Maybe you would prefer f.{ x }. If the operator is not supposed to be assignable, .{ … } is closer to immutability, whereas f.[ … ] looks like it could be assigned to. It also looks like it allows arbitrary expressions inside the brackets. There could also be a reason to reserve .[ … ] for other things, for example expressions evaluated in the context of an object. In that case, the expression may refer to variables in member space and use lenses, so that point.[ .x, .y ] = v, w is equivalent to point.x = v, point.y = w; point.[ .x + .y ] is point.x + point.y; point.[ .x, .y = v, w ] would be a mathematical substitution, like a reference assignment rather than a value assignment; and point.[ ::Int, ::Float64 ] = n, x would assign n to the first Int member and x to the first Float64 member. End of digression.

Whatever version people prefer (if any at all), I will just use .[…] for demonstration.[/spoiler]

underscore revisited

For specific positional currying, there could still be an optional non-applied argument placeholder _ and/or dictionary-like keys. In that case, _ is automatically scoped by the brackets .[ … ] – problem solved. For everything else, dictionary-key expressions could be used (which is preferable in most cases).

Possible notation(s)

  • f.[ _, y, _, x ] which is (_1, _2, params...) -> f(_1, y, _2, x, params...)

  • f.[ 4 => x, 2 => y; … ] # => maps an index expression, parameter or member symbol (expression) to a value

    • by contrast, -> is known to map an argument pattern to a new value

    [spoiler]The relation is: if an input pattern should be intercepted, a rewriting notation (like mathematical substitution) could be allowed, such as f.[ (7 -> 1 , 3 -> 0) ], which maps an input to another input, i.e. a call with argument 7 – f(7) – is dynamically deferred to f(1), and f(3) would be deferred to f(0). This is mainly useful for fallback arguments, for example to handle out-of-bounds accesses or illegal argument types via f.[ _ -> 0 ], where _ is only matched as a pattern if the function or object did not find an accessed member or the current parameter did not match the argument. This is perfect for on-the-fly interpolation views.

    Instead, the notation f.[ 7 ] = 1 could remap argument value 7 to return value 1. Supporting this would enable pattern matching definitions as known from functional languages.[/spoiler]

  • f.[ _4 = x, _2 = y ; … ]

    • another syntax option using positional parameter names _index ; these names could also be private and inaccessible from outside
    • if disambiguation is necessary, people can use ._index instead, which is unambiguous
    • the positional parameter name _ refers to the next parameter name that hasn’t been accessed yet within the current scope of brackets (automatic index)

      f.[ _ = x, _ = y ] is equivalent to f.[x, y]. Unassigned positional parameter names will not be applied but may still be used. f.[ _1 , _1 ] therefore would be x -> f(x, x), whereas f.[ _ , _1 ] would rather correspond to f.[ _2 , _1 ].
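As a sketch of the intended lowerings, with plain closures standing in for the hypothetical .[ … ] syntax (f, x, and y are placeholder names for demonstration):

```julia
# Hand-written closures illustrating the proposed lowerings.
f(args...) = args
x, y = :x, :y

# f.[ _, y, _, x ] would lower to:
g1 = (_1, _2, params...) -> f(_1, y, _2, x, params...)
g1(1, 2, 3)          # == (1, :y, 2, :x, 3)

# f.[ _1, _1 ] reuses the first argument twice:
g2 = _1 -> f(_1, _1)
g2(7)                # == (7, 7)
```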

semicolon

The last two notations could allow further arguments after the semicolon. At first this seems inconsistent with the current convention, which puts named (keyword) arguments after the semicolon; but contrary to that, it is related to the parenthesized expression notation found in x -> (y = 3; x + y).

limits

Anything more complex – requiring internal parameter names or nested anonymous functions – should use the existing arrow syntax, which exists precisely to define parameter names and to handle more complex scoping.
foo_square_sum = (_foo, _array) -> sum( x -> _foo(x)^2 , _array )

definitions by examples

Without a semicolon, the following arguments (without an explicit argument position) automatically occupy increasing positions:
f.[ 3 => x , w , y, z ] could mean (_1, _2, params...) -> f(_1, _2, x, w, y, z, params...)

When using the semicolon, the position counter is reset to the smallest yet-unassigned index.

f.[ 3 => x ; w , y, z ] would be equivalent to f.[ 3 => x ].[ w, y, z ], i.e. f(_, _, x, _)( w, y, z, &_ ) in terms of the notation mentioned in the initially linked post.

Multiple expressions could be possible, each value being assigned to parameter(s):
f.[ 3 => x ; 2 => y, z ; w ] is equivalent to f.[ _, _, x].[ _, y , z ].[ w ] which would lower to params... -> f(w, y, x, z, params...).
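Written out by hand, the chained form above would behave like this (a sketch with an ordinary closure; f and the values are illustrative names, not part of the proposal):

```julia
# Hypothetical desugaring of f.[ 3 => x ; 2 => y, z ; w ]:
# positions 3 => x, then (after the reset) 2 => y and 4 => z, then 1 => w.
f(args...) = args
x, y, z, w = :x, :y, :z, :w

g = (params...) -> f(w, y, x, z, params...)
g()        # == (:w, :y, :x, :z)
g(5)       # == (:w, :y, :x, :z, 5)
```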

The expression before the semicolon is mainly useful for defining variables (or members) that can be used after the semicolon. Names prefixed with . inside the expression are treated as lenses (i.e. member accesses within the current context). The dot disambiguates symbols or external variables from member names and also alludes to member access.

  • f.[ .param = sqrt(x); .param, .param ] is equivalent to params... -> (temp = sqrt(x); f(temp, temp, params...; param=temp))
  • f.[ foo = _ + 3; foo, foo ] could turn foo into a scoped local variable: (_1, params...) -> ( foo = _1 + 3; f.[foo, foo, params...] )
    • assigned variables are ignored for argument application unless they start with .. Reason: flexibility
  • point.[ .z = 1 ] extends a 2D point to homogeneous coordinates, but point.z might be immutable this way

conclusion

In the end, _, ._index, .member, => and -> are compatible enough to allow arbitrary combinations. I expect skepticism, also because of the number of new ideas involved, which maybe don’t fit into how seasoned Julians do programming. But any subset of these could in theory be chosen in combination with .[ … ]. It would even be an improvement to have just .[ … ] with no further features on top.

Implementing these abstractions would be quite some work in itself, but making them performant enough for the high demands of specialized users even more so. In my opinion, usability should still prevail over performance by far, so performance should not play too much of a role in the discussion except where the impact is dramatic. Optimization versus readability/maintainability is a trade-off.


Warning: approaching this topic is like kicking a hornet’s nest, and as I’ve learned the hard way, you can’t kick a hornet’s nest and expect not to get tagged a few times. So please take any critiques in stride!

I read this proposal, and maybe it’s the sleep deprivation from having a new baby, or maybe I’m just an idiot (nevermind, that has already been established), but I felt like I didn’t understand any of it. I’ll give it another read when I get home in an hour.

I want to offer a rebuttal to the issues you brought up before I approach the remainder of the proposal. I need to get this off my chest first before I can think clearly about the rest.

I anticipate three primary use cases for the syntax:

  1. In use for “chaining,” where every argument is fixed except for one.
  2. When constructing a simple function for, e.g., a map or filter call, such as
    filter(_^2>5, seq).
  3. When fixing “important” arguments and leaving the rest unfixed, and therefore most likely fixing the first and/or last arguments.

In the first case, there will usually only be one _ insertion and you’ll be typing out the rest anyway. In the second case, there usually can only be one _ insertion. In the third case, a splatting _... can be used.

Using my demo code for example:

julia> f(args...)=args
f (generic function with 1 method)

julia> demo"(1:10...,)--f(_, :a, _..., :b, _)"
(1, :a, 2, 3, 4, 5, 6, 7, 8, 9, :b, 10)

julia> g=demo"f(_, :a, _..., :b, _)"
f(_, :a, _..., :b, _, )

julia> demo"(1:10...,)--g(_...)"
(1, :a, 2, 3, 4, 5, 6, 7, 8, 9, :b, 10)

The only cases where _'s will become unbearably multitudinous are when you wish to fix an argument in the middle of a very long argument chain—but are those really partial functions you’d want?

(and if, despite your better judgment, you really did, you could avoid the syntax sugar altogether and just call the Fix constructor directly):

julia> Fix{(5,6),10}(f, :x, :y)
f(_, _, _, _, :x, :y, _, _, _, _, )

See above. But, for example, I defined FixFirst as:

const FixFirst = Fix{F,(1,),0,Tuple{X}} where {F,X}

And a FixFirst object can be created by:

julia> FixFirst(f, :x)
f(:x, _...)

julia> g = demo"f(:x, _...)"
f(:x, _...)

julia> g isa FixFirst
true

julia> g(:y, :z)
(:x, :y, :z)

This is not a problem when _ is restricted to only partial function application; it is only a problem when the scope of its operation is (attempted to be) expanded beyond this, to “do more”—an effort which has thus far proven sisyphean.

The problems which motivated a desire to make the syntax “do more,” I believe to be solvable with function composition. And even in a hypothetical world where they aren’t, it’s my view that _ should be limited to partial function application anyway, as it addresses a large enough and common enough class of problems to warrant syntax sugar for it.
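As a small illustration of that point, a transformation that seems to call for a wider _ scope can often be written with Julia’s existing composition operator instead (inc, sq, and h are illustrative names):

```julia
# Composition today: x -> (x + 1)^2 without any extended placeholder scope.
inc = x -> x + 1
sq  = x -> x^2
h = sq ∘ inc      # roughly what an extended-scope (_ + 1)^2 would mean
h(3)              # == 16
```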

That’s really neat, that x.[i] and x.{i} are unclaimed syntax! (and they parse!)


If you want to play around with your ideas a bit, using my demo code for the Fix partial applicator, you can do this:

Function Currying with []

julia> f(a...) = a
f (generic function with 1 method)

julia> Base.getindex(f::Union{Function, Fix}, i) = FixFirst(f, i)

julia> f[:a][:b][:c]
f(:a, _..., )(:b, _..., )(:c, _..., )

julia> f[:a][:b][:c](1, 2, 3)
(:a, :b, :c, 1, 2, 3)

Setting Arbitrary Indices with Pairs

julia> f(a...) = a
f (generic function with 1 method)

julia> Base.getindex(f::Function, i::Pair{Int,<:Any}...) = 
           Fix{((x[1] for x ∈ i)...,), 0}(f, (x[2] for x ∈ i)...)

julia> f[2=>:a, 4=>:b, 6=>:c]
f(_, :a, _, :b, _, :c, _..., )

julia> f[2=>:a, 4=>:b, 6=>:c](1, 2, 3)
(1, :a, 2, :b, 3, :c)

Unfortunately, the generation of an applicator using Pairs is type-unstable. You can use Tuples instead though!

Do note, that the two methods are incompatible (i.e., you’d want to pick one or the other; to have both would cause inconsistent behavior).

Overall, given my rebuttal above, I don’t see this proposal as solving any problems in my proposal. Further, as currying is a subset of partial application, having syntax sugar only for partial application is probably good enough. Fixing arguments by position number, rather than visual position, I also don’t see as valuable enough to devote specialized syntax (and you can already do it fairly conveniently using my Fix constructor).

But who knows! I could be wrong. Maybe play around with it, and see if you can find examples where this is preferable?

Thank you for your reception.

The existing array notation is kind of dangerous because it already has a meaning. Not recommended.

There is an advantage in key-value pairs because the key can also be a runtime value. Maybe it’s possible with parameters in braces as well, I don’t know. (This would go into dependent type theory which seems like something Julia developers would like to avoid, given the dynamic type system they are using as a replacement.)

The concern is usability. I think underscores go in the wrong direction, both for maintenance and readability. In my view, it’s an old-fashioned workaround. I thought the whole point of underscore anonymous functions was iteration over data. The dot notation is better there.

And why use huge intractable pipes with lots of anonymous functions when you could just use temporary variables for intermediate results, which also serve as good code comments? I don’t find these underscores to be a big improvement over arrow syntax. They only help with many arguments, but in those cases f.[ 3 => x ] is still better than adding maybe six other underscores to the parameter tuple.

Goal: keep the notation concise and minimize the unexpected. Just having func.[ x, y ] without anything else would be enough to improve the situation. If someone needs underscores, they could be added on top, or arrows could be used instead; but I think “partial application” (rather a “partial assignment” than an evaluation) is the more important feature if Julia wants to move toward real functional programming.

f.[ x, y ] is less restricted than f( x , y , &_ ) and f( x , y , _... ). The parenthesis versions use the old call operator and modify its meaning due to a specific argument. It tries to insert a weird symbol for suppressing the call. This looks like a C++ idea, introducing a notation to undo (or reduce) the actual meaning of existing notation for specific cases which previously had a consistent meaning.

Main criticism: f.[ x , y ] is harder to optimize at compile time. It is likely to require dynamic dispatch because you can only dispatch once the function is called with () or ( … ). This need not impact evaluation with a fixed number of fixed bracket expressions .[ … ], but as soon as the number or contents of the bracket expressions vary at runtime, dynamic dispatch might be needed.

But except for the case of a before-known number of arguments, the situation is no better for the underscores.

Of course! I’m only suggesting it for a simple demo, to play around with and see whether the user experience is worthwhile. It’s also straightforward to write a macro which will take .[] notation.

As the Fix functor in my demo shows, you can achieve type stability with runtime values by using type parameterization (the same technique is used for NamedTuples). Thankfully, JIT compilation makes magic happen. The concern about type instability applies to my demo using Pairs, but if you write a macro properly then you won’t have type instability.
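As a sketch of that technique only – this is not the demo’s actual Fix type – encoding the fixed argument position in a type parameter lets the compiler specialize on it (FixAt is a hypothetical name):

```julia
# Illustrative only: the fixed position P lives in the type, so each
# call site can compile to a specialized method (same idea as NamedTuple).
struct FixAt{P,F,X}
    f::F
    x::X
end
FixAt{P}(f, x) where {P} = FixAt{P,typeof(f),typeof(x)}(f, x)

# Splice the fixed value into position P of the final call:
(fx::FixAt{P})(args...) where {P} =
    fx.f(args[1:P-1]..., fx.x, args[P:end]...)

f(args...) = args
g = FixAt{2}(f, :y)
g(1, 3)      # == (1, :y, 3)
```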

I don’t love the character _, and neither does @bertschi (who has made clear a preference for , which I sympathize with). However, it’s nice that it’s an ASCII character (i.e., more accessible), and using it as an rvalue currently forces an error (i.e., it’ll be non-breaking to use it), so I think those benefits outweigh the drawbacks. Combine that with the fact that Scala already uses them for exactly this purpose, so there is precedent. I had originally been opposed because I didn’t like the appearance, but I’m now in favor of it.

I don’t think I understand what you mean by this.

If you have ever used the German pronoun “es,” you will understand that many times during a sequence of function calls on an object you do not care to give the intermediate result a name. Sometimes you will, of course, but then you will give it a name! I hope you do not force me to say, when I encounter an orange,

I pick up the orange. Call the result a picked_up_orange. I peel the picked_up_orange. Call the result a peeled_orange. I split the peeled_orange. Call the result a split_orange. I eat the split_orange. Call the result an eaten_orange.

It’s still an orange (or a simple transformation thereof), so why should I have to keep giving it new names? I would much rather just call it “it” and get on with my life, and doing so does not make the problem of eating an orange intractable.

I pick up the orange. I peel it. I split it. I eat it. It was yummy.

Much better.

Arrow syntax for defining lambdas is very nice. However, there are many benefits to having a partial applicator with a proper type, such as method dispatch and type inference.

As for the underscore partial application syntax—the benefit is greatest in two cases:

  1. Method chaining, to specify where the chained object will be inserted.
  2. Short function definitions (e.g. to feed to map or filter), such as √_ ≥ 5.

The second case in particular will be adversely affected by your proposal.

How often do you care to fix only the third argument and none other? Is this a realistic use case? Or more precisely, is it common enough to justify this trade-off?

I sympathize with this sentiment. This is also partly why I had originally been opposed to _ syntax. However, being able to write an expression like

filter(_%3==0, x)

does seem to make it worthwhile.
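For comparison, the same call spelled with today’s anonymous-function syntax:

```julia
# The existing spelling of the proposed filter(_%3==0, x):
x = 1:10
filter(v -> v % 3 == 0, x)   # == [3, 6, 9]
```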

I would bring up the do statement again, but I’m tired of that line of argumentation.

In either case, whether you choose f(x, _, z) or f.[x, _, z], new syntax must be learned. The behavior is unambiguous anyway, so I’m not sure I see a strong impetus to jump for syntax which will force _^2+1 to be written as ^.[_, 2]+1. The downsides seem to outweigh the benefits.

Admittedly, f(x, y, &_) to specify a fully-applied but not evaluated function is quite ugly. This seems to be the strongest argument in favor of f.[x, y]. However, I consider this an edge case and I don’t anticipate it will be a common occurrence to make these.