ANN: Underscores.jl: Placeholder syntax for closures

c42f · March 25, 2020, 10:47am

I’m happy to announce Underscores.jl, a small but general package for _ placeholder syntax which makes it easier to pass anonymous functions to other functions.

The package provides a single @_ macro which is helpful in writing anonymous functions succinctly without naming the arguments. Some examples:

Expression	Meaning
`@_ map(_+1, a)`	`map(x->x+1, a)`
`@_ map(_+2^_, a)`	`map((x)->x+2^x, a)` (NOTE)
`@_ map(_2/_1, a, b)`	`map((x,y)->y/x, a, b)`
`@_ func(a,__,b)`	`x->func(a,x,b)`
`@_ data |> map(_.f,__)`	`data |> (d->map(x->x.f,d))`

(Edit to add (NOTE): this changed from the original meaning of (x,y)->x+2^y as discussed below.)
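For readers who want to check the table by hand, here is a sketch of the right-hand column evaluated in plain Julia, with no macro involved. The sample inputs (`a`, `b`, the scalars `p` and `q`, the three-argument `func`, and the `data` records with a field `f`) are made-up illustrations, not anything from the package.

```julia
a = [1, 2, 3]
b = [10, 20, 30]

r1 = map(x -> x + 1, a)              # meaning of @_ map(_+1, a)
r2 = map(x -> x + 2^x, a)            # meaning of @_ map(_+2^_, a) (current rule)
r3 = map((x, y) -> y / x, a, b)      # meaning of @_ map(_2/_1, a, b)

func(p, x, q) = p + x * q            # hypothetical three-argument function
p, q = 2, 3                          # scalars standing in for the table's a and b
g = x -> func(p, x, q)               # meaning of @_ func(a,__,b)
r4 = g(5)

data = [(f = 1,), (f = 2,)]          # hypothetical records with a field `f`
r5 = data |> (d -> map(x -> x.f, d)) # meaning of @_ data |> map(_.f,__)
```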

A lot of packages have done similar things before, but Underscores takes a slightly different approach: the @_ is attached to the function receiving the closure rather than delimiting the start of the closure itself.

It turns out this outer placement is really convenient because:

It reduces the need for extra parentheses: @_ map(_+1, a) rather than map(@_(_+1), a)
It is fairly conceptually consistent to allow it to act on a whole data pipeline, which makes it rather handy for tabular data manipulation.

This package was stimulated by the excellent discussion over at https://github.com/JuliaLang/julia/pull/24990, and elsewhere.

As a more concrete example, this provides natural syntax sugar for the functional approach to data manipulation taken in SplitApplyCombine.jl (@andyferris you may find this fun):

julia> using TypedTables, Underscores, SplitApplyCombine

julia> t = Table(name = ["Alice", "Bob", "Charlie", "Eve"],
                 age = [25, 42, 37, 45],
                 sex = ["female", "male", "male", "female"])
Table with 3 columns and 4 rows:
     name     age  sex
   ┌─────────────────────
 1 │ Alice    25   female
 2 │ Bob      42   male
 3 │ Charlie  37   male
 4 │ Eve      45   female

julia> @_ t |> 
          filter(_.age > 27, __) |>
          group(_.sex, __) |>
          map(length, __)
2-element Dictionaries.HashDictionary{String,Int64}
   "male" │ 2
 "female" │ 1
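As a rough cross-check of the result above, here is a Base-only sketch of what the pipeline computes. It assumes a vector of NamedTuples in place of the TypedTables `Table`, and a hand-written counting loop into a plain `Dict` in place of `group(...) |> map(length, __)`; the real pipeline returns a `Dictionaries.HashDictionary`, not a `Dict`.

```julia
# Rows as NamedTuples, standing in for the Table above
rows = [(name = "Alice",   age = 25, sex = "female"),
        (name = "Bob",     age = 42, sex = "male"),
        (name = "Charlie", age = 37, sex = "male"),
        (name = "Eve",     age = 45, sex = "female")]

# filter(_.age > 27, __)  corresponds to  filter(r -> r.age > 27, rows)
older = filter(r -> r.age > 27, rows)

# group(_.sex, __) |> map(length, __), approximated by counting rows per sex
counts = Dict{String,Int}()
for r in older
    counts[r.sex] = get(counts, r.sex, 0) + 1
end
```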

oschulz · March 25, 2020, 11:23am

Just curious, how does Underscores.jl compare with LambdaFn.jl (https://github.com/haberdashPI/LambdaFn.jl)?

oschulz · March 25, 2020, 11:24am

Is Underscores.jl compatible with FastClosures.jl? That would be cool!

c42f · March 25, 2020, 12:08pm

The packages attack the same problem, but differ in the placement of the macro. As I noted above, the @_ is attached to the function receiving the closure rather than delimiting the start of the closure itself. (Note, though, that the double placeholder __ acts the same as the LambdaFn single placeholder.) See also issue #1 on c42f/Underscores.jl, “A way to allow for `filter`, `map` piping...”.

It’s not compatible with FastClosures (at least, not yet); I’m not sure how that could be made to work in a natural and composable way. One option would be to declare that every closure made by Underscores.jl is a fast closure. For the typical use of Underscores.jl that might actually be OK, but it does depart from the normal Julia scoping rules. Another option might be to generalize the kinds of expressions that FastClosures handles somehow… maybe by adding a special rule for underscores.
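To make the scoping point concrete: as I understand it, FastClosures' `@closure` works roughly by wrapping the closure in a `let` block that rebinds the captured variables to their current values. The plain-Julia sketch below (the function names are made up for illustration) shows how that changes behavior when a captured variable is reassigned after the closure is created.

```julia
# Ordinary closure: `x` is captured as a variable, so the later
# reassignment is visible when the closure runs.
function normal_capture()
    x = 1
    f = y -> x + y
    x = 10
    return f(1)
end

# let-style capture (roughly what FastClosures does): `x` is rebound to its
# current value, so the closure does not see the later reassignment.
function let_capture()
    x = 1
    f = let x = x
        y -> x + y
    end
    x = 10
    return f(1)
end
```

So `normal_capture()` uses `x = 10` while `let_capture()` still uses `x = 1`, which is why silently making every Underscores closure a fast closure would be a semantic change rather than just an optimization.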

oschulz · March 25, 2020, 2:09pm

Thanks for the explanation!

davidanthoff · March 25, 2020, 4:14pm

I like almost all of it, except that a second use of _ turns things into a two argument function. That makes it hard to combine with the core Query/SplitApplyCombine story. Here are some super common things in those packages that don’t work with this pattern: df |> @filter(_.colA > 2 && _.colB<8) |> @map((a=log(_.colA), b=_.colB * 2)). I would just remove that, especially given that there is already a nice syntax for multiple arg functions with _1 etc.
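A plain-Julia sketch of the problem being described, using made-up rows with fields `colA` and `colB`: under the original rule the two `_` in the filter predicate become two separate arguments, and `filter` only ever calls its predicate with one.

```julia
rows = [(colA = 1, colB = 5), (colA = 3, colB = 7), (colA = 4, colB = 9)]

# Original rule: each `_` is a new argument
pred_old = (x, y) -> x.colA > 2 && y.colB < 8
# Intended meaning: every `_` is the same row
pred_new = x -> x.colA > 2 && x.colB < 8

kept = filter(pred_new, rows)

# filter calls the predicate with a single element, so the
# two-argument version fails with a MethodError
old_fails = try
    filter(pred_old, rows)
    false
catch err
    err isa MethodError
end
```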

JeffreySarnoff · March 25, 2020, 6:25pm

This package is a big step forward in settling on an approach to _, imo.
I agree that the _1 _2 syntax is nice, and with it there is no need for another way to express “something and another thing.” As I recall, the use of _ in several places within an expression typically indicates repetition of the same thing rather than the introduction of another thing, unless it quite clearly does not (binaryfn(_,_)). So I concur with the view that multiple occurrences of _ should be treated as copies of the same argument (i.e. as _1).

pablosanjose · March 25, 2020, 6:58pm

Quoting davidanthoff: “I like almost all of it, except that a second use of _ turns things into a two argument function.”

I agree with this

Mason · March 25, 2020, 8:02pm

The (potential) problem is that unlike _, _1 is actually a valid identifier so using _1, _2 for a 2-argument function is more likely to cause weird issues. Doesn’t matter too much in a macro though, but it’s worth keeping in mind as a potential pain point.
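A small plain-Julia illustration of the identifier point: `_1` can be bound and read like any variable, whereas a bare `_` may be assigned to but never read, and Julia rejects reading it when the code is lowered. The expression string is evaluated inside `try` so the error can be observed rather than aborting the script.

```julia
# `_1` is an ordinary identifier
_1 = 100
doubled = 2 * _1

# Reading a bare `_` is rejected, so evaluating this expression throws
underscore_readable = try
    eval(Meta.parse("_ + 1"))
    true
catch err
    false
end
```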

c42f · March 26, 2020, 1:00am

This is an interesting point. The current behavior came from copying JuliaLang/julia PR #24990 (“RFC: curry underscore arguments to create anonymous functions” by stevengj), but that PR didn’t have named placeholder syntax as an alternative for multiple arguments.

Anyway, I agree that having _1 and _ mean the same thing is a lot better for the tabular data scenario and I’m inclined to make this change.

But what about more general cases? I’d greatly appreciate it if people could try this out on real code and non-tabular data and report back with their examples and impressions.

tkf · March 26, 2020, 2:26am

One case where the current multi-argument version could be useful is currying merge:

dicts = [Dict(zip(rand('a':'e', 3), rand(0:99, 3))) for _ in 1:10]
@_ reduce(merge(+, _, _), dicts)

However, we’ll have mergewith in Julia 1.5 so it can be done with reduce(mergewith(+), dicts).

Edit: Similar example:

vecs = [randn(3) for _ in 1:10]
@_ reduce(_ .+ _, vecs)
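For comparison, a sketch of what these two examples mean when written out by hand, using small fixed inputs instead of random ones. Under the revised rule where every `_` means `_1`, one would presumably write `@_ reduce(_1 .+ _2, vecs)` instead, since `_ .+ _` would become `x -> x .+ x`.

```julia
dicts = [Dict('a' => 1, 'b' => 2), Dict('b' => 10, 'c' => 3), Dict('a' => 5)]

# Original two-argument reading of @_ reduce(merge(+, _, _), dicts)
summed_old = reduce((x, y) -> merge(+, x, y), dicts)
# Julia 1.5+ equivalent with no placeholder macro at all
summed_new = reduce(mergewith(+), dicts)

vecs = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
# Original two-argument reading of @_ reduce(_ .+ _, vecs)
total = reduce((x, y) -> x .+ y, vecs)
```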

tkf · March 26, 2020, 2:58am

@_ is fun! Here is another use case:

extrema_(xs) = @_ mapreduce(
    (min = _1, max = _1),
    (min = min(_1.min, _2.min), max = max(_1.max, _2.max)),
    xs,
)

which is apparently faster than Base.extrema:

julia> using BenchmarkTools

julia> xs = randn(1000);

julia> @btime extrema($xs)
  5.673 μs (0 allocations: 0 bytes)
(-3.270990011151266, 3.254351333498243)

julia> @btime extrema_($xs)
  3.258 μs (0 allocations: 0 bytes)
(min = -3.270990011151266, max = 3.254351333498243)
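Written out by hand, the `@_ mapreduce(...)` call above corresponds to the following plain-Julia function (the name `extrema_plain` is made up for illustration): the first placeholder expression becomes the mapping closure and the `_1`/`_2` expression becomes the two-argument reducer.

```julia
extrema_plain(xs) = mapreduce(
    x -> (min = x, max = x),                                       # from (min = _1, max = _1)
    (a, b) -> (min = min(a.min, b.min), max = max(a.max, b.max)),  # from the _1/_2 reducer
    xs,
)

r = extrema_plain([3.0, -1.0, 2.5])
```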

c42f · March 26, 2020, 3:52am

I don’t think this is an issue for a macro-based solution: people have to opt-in to using the macro so it can’t break any existing code. If there was a viable alternative we could consider it, but I think _1 and _2 etc is the most obvious and succinct syntax for this.

It occurs to me that a cute addition might be to allow superscripts or subscripts as synonyms for the pure-ascii version:

@_ map(_2/_1, a, b)
@_ map(_₂/_₁, a, b)
@_ map(_²/_¹, a, b)

I like the subscript version to look at, though I probably wouldn’t bother using it myself. More cute than useful, perhaps?

kristoffer.carlsson · March 26, 2020, 9:22am

 _extrema(x) = minimum(x), maximum(x)

is faster than Base.extrema as well.

mkborregaard · March 26, 2020, 11:30am

Very cute
I also think having _ mean two different things in the same expression is likely to become a real footgun

c42f · March 26, 2020, 1:28pm

There seems to be fairly general agreement that _ should mean _1 when repeated, so here it is.

Since it appears I’m breaking the API already, I guess releasing version 1.0 was premature. Luckily I happen to have a spare 2.0.0 version number lying around which I can use.

https://github.com/c42f/Underscores.jl/pull/3

c42f · March 26, 2020, 2:09pm

In the latest master I’ve thrown caution to the wind and just used _ as the single argument. Also I figured I may as well do the unicode subscripts; it may be a little overly cute but it’s nonintrusive, very easy to implement and generally seems like harmless fun.

haberdashPI · March 26, 2020, 2:30pm

Just my two cents: can’t say I love _ meaning only the first argument. This may just be that I am most familiar with the syntax from Scala, where each _ is replaced with a distinct argument. I think it is mostly a matter of taste. @davidanthoff’s example works just fine as is: e.g.

df |> @filter(_1.colA > 2 && _1.colB<8) |> @map((a=log(_1.colA), b=_1.colB * 2))

I actually think this is clearer than the version using _ alone, especially for a longer line of code like this.

yurivish · March 26, 2020, 3:09pm

My two cents: I agree that _ should mean _1 when repeated. In terms of prior art, this is what Mathematica does.

piever · March 26, 2020, 3:40pm

I also feel that “same visual cue” should correspond to same argument, so _ would always mean _1.

Another typical application here is “lifting”. For example, in the Observables package, to define a value that updates on any change of the inputs, you would do things like:

using Observables

a, b = Observable(1), Observable(2)
c = map(a, b) do a, b
    (a + b) * a
end

This can be a bit annoying and is not intuitive to users. I wonder whether it makes sense to create a multi argument version that turns @_ map (_a + _b) * _a into map((a, b) -> (a+b)*a, a, b).

It looks like this could be useful also in general, say

v = rand(10)
@_ filter _v > 5

but I’m not sure how readable / intuitive it is.
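The named-placeholder idea can be prototyped outside Underscores.jl. Below is a minimal sketch of a hypothetical `@lift` macro (the macro and helper names are made up for illustration and are not part of any package): each `_name` placeholder (underscore followed by a letter) becomes a closure argument `name`, and the caller's variables `name` are passed as the collection arguments. It is tested here on plain vectors only; with Observables the same expansion would produce `map(f, a, b)` on observables.

```julia
# Collect `_name` placeholders in order of first appearance.
function named_placeholders!(found::Vector{Symbol}, ex)
    if ex isa Symbol
        s = String(ex)
        if length(s) > 1 && s[1] == '_' && isletter(s[nextind(s, 1)]) && !(ex in found)
            push!(found, ex)
        end
    elseif ex isa Expr
        for arg in ex.args
            named_placeholders!(found, arg)
        end
    end
    return found
end

# `_a` -> `a`
strip_underscore(s::Symbol) = Symbol(chop(String(s); head = 1, tail = 0))

# Replace every placeholder in `ex` by its plain name.
function rename_placeholders(ex, found)
    if ex isa Symbol && ex in found
        return strip_underscore(ex)
    elseif ex isa Expr
        return Expr(ex.head, (rename_placeholders(x, found) for x in ex.args)...)
    else
        return ex
    end
end

# @lift(f, body) expands to f((name1, name2, ...) -> body, name1, name2, ...)
macro lift(f, body)
    found = named_placeholders!(Symbol[], body)
    names = map(strip_underscore, found)
    closure = Expr(:->, Expr(:tuple, names...), Expr(:block, rename_placeholders(body, found)))
    return esc(Expr(:call, f, closure, names...))
end

a, b = [1, 2], [10, 20]
lifted = @lift(map, (_a + _b) * _a)   # map((a, b) -> (a + b) * a, a, b)

v = [1, 7, 3, 9]
large_vals = @lift(filter, _v > 5)    # filter(v -> v > 5, v)
```

One design consequence worth noting: the placeholder names double as variable lookups, so `_a` only works if a variable `a` is in scope, which is part of why readability is debatable.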
