ANN: Underscores.jl: Placeholder syntax for closures

I’m happy to announce Underscores.jl, a small but general package for _ placeholder syntax which makes it easier to pass anonymous functions to other functions.

The package provides a single @_ macro which is helpful in writing anonymous functions succinctly without naming the arguments. Some examples:

Expression                   Meaning
@_ map(_+1, a)               map(x->x+1, a)
@_ map(_+2^_, a)             map((x)->x+2^x, a) (NOTE)
@_ map(_2/_1, a, b)          map((x,y)->y/x, a, b)
@_ func(a,__,b)              x->func(a,x,b)
@_ data |> map(_.f,__)       data |> (d->map(x->x.f,d))

(Edit to add (NOTE): this changed from the original meaning of (x,y)->x+2^y as discussed below.)

A lot of packages have done similar things before, but Underscores takes a slightly different approach: the @_ is attached to the function receiving the closure rather than delimiting the start of the closure itself.

It turns out this outer placement is really convenient because

  • It reduces the need for extra parentheses: @_ map(_+1, a) rather than map(@_(_+1), a)
  • It is fairly conceptually consistent to allow it to act on a whole data pipeline, which makes it rather handy for tabular data manipulation.
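
To make the outer placement concrete, here is a toy sketch of the idea in plain Julia. It is not the Underscores.jl implementation: the names `@toy_underscore`, `has_placeholder` and `replace_placeholder` are made up for illustration, and it only handles a single function call with the single-underscore placeholder (no `__`, no `_1`/`_2`, no pipelines).

```julia
# Toy sketch, NOT the Underscores.jl implementation: handles only one
# call expression and only the single-underscore placeholder `_`.

# Does the expression mention the placeholder `_` anywhere?
has_placeholder(ex) = ex === :_
has_placeholder(ex::Expr) = any(has_placeholder, ex.args)

# Return a copy of `ex` with every `_` replaced by the symbol `arg`.
replace_placeholder(ex, arg) = ex === :_ ? arg : ex
replace_placeholder(ex::Expr, arg) =
    Expr(ex.head, (replace_placeholder(a, arg) for a in ex.args)...)

# The macro is attached to the *receiving* call: every argument of that
# call that mentions `_` is wrapped in a one-argument closure.
macro toy_underscore(call)
    (call isa Expr && call.head === :call) || error("expected a function call")
    newargs = map(call.args) do a
        if has_placeholder(a)
            x = gensym("x")
            :($x -> $(replace_placeholder(a, x)))
        else
            a
        end
    end
    esc(Expr(:call, newargs...))
end

people = [(name = "Alice", age = 25), (name = "Bob", age = 42)]

incremented = @toy_underscore map(_ + 1, [1, 2, 3])      # map(x -> x + 1, [1, 2, 3])
older = @toy_underscore filter(_.age > 27, people)       # filter(x -> x.age > 27, people)
```

Because the macro sees the whole call, the closure boundary is simply "the argument that contains `_`", which is why no extra parentheses are needed around `_ + 1`.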

This package was stimulated by the excellent discussion over at https://github.com/JuliaLang/julia/pull/24990, and elsewhere.


As a more concrete example, this provides natural syntax sugar for the functional approach to data manipulation taken in SplitApplyCombine.jl (@andyferris you may find this fun):

julia> using TypedTables, Underscores, SplitApplyCombine

julia> t = Table(name = ["Alice", "Bob", "Charlie", "Eve"],
                 age = [25, 42, 37, 45],
                 sex = ["female", "male", "male", "female"])
Table with 3 columns and 4 rows:
     name     age  sex
   ┌─────────────────────
 1 │ Alice    25   female
 2 │ Bob      42   male
 3 │ Charlie  37   male
 4 │ Eve      45   female

julia> @_ t |> 
          filter(_.age > 27, __) |>
          group(_.sex, __) |>
          map(length, __)
2-element Dictionaries.HashDictionary{String,Int64}
   "male" │ 2
 "female" │ 1
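
For readers without those packages installed, the same computation can be sketched with Base only. This is just an illustration of how the `_` and `__` placeholders in the pipeline read as nested closures: `rows` stands in for the Table, and `count_by` is a hypothetical helper standing in for `group` followed by `map(length, __)`, not anything provided by SplitApplyCombine.

```julia
# Base-only sketch of what the pipeline above computes.
# `rows` and `count_by` are hypothetical stand-ins, see the note above.
rows = [
    (name = "Alice",   age = 25, sex = "female"),
    (name = "Bob",     age = 42, sex = "male"),
    (name = "Charlie", age = 37, sex = "male"),
    (name = "Eve",     age = 45, sex = "female"),
]

# Count how many elements of `xs` fall under each (string) key `f(x)`.
function count_by(f, xs)
    counts = Dict{String,Int}()
    for x in xs
        k = f(x)
        counts[k] = get(counts, k, 0) + 1
    end
    return counts
end

# Each `__` becomes the argument of an outer closure (`d` here) and
# each `_` becomes the argument of an inner closure (`r` here).
result = rows |>
    (d -> filter(r -> r.age > 27, d)) |>
    (d -> count_by(r -> r.sex, d))
```

Here `result` maps "male" to 2 and "female" to 1, matching the output above.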

Just curious, how does Underscores.jl compare with LambdaFn.jl (https://github.com/haberdashPI/LambdaFn.jl)?


Is Underscores.jl compatible with FastClosures.jl? That would be cool!

The packages attack the same problem, but differ in the placement of the macro. As I noted above, the @_ is attached to the function receiving the closure rather than delimiting the start of the closure itself. (Though note that the double placeholder __ acts the same as the LambdaFn single placeholder.) See also issue #1 of c42f/Underscores.jl, "A way to allow for `filter`, `map` piping...".

It’s not compatible with FastClosures (at least, not yet); I’m not sure how that could be made to work in a natural and composable way. I guess one option would be to declare that every closure made by Underscores.jl is a fast closure. For the typical use of Underscores.jl that might actually be ok, but it does depart from the normal Julia scoping rules. Another option might be to generalize the types of expressions that FastClosures deals with somehow… maybe add a special rule for underscores.


Thanks for the explanation!

I like almost all of it, except that a second use of _ turns things into a two argument function. That makes it hard to combine with the core Query/SplitApplyCombine story. Here are some super common things in those packages that don’t work with this pattern: df |> @filter(_.colA > 2 && _.colB<8) |> @map((a=log(_.colA), b=_.colB * 2)). I would just remove that, especially given that there is already a nice syntax for multiple arg functions with _1 etc.


This package is a big step forward in resolving an approach to _ imo.
I agree that the _1 _2 syntax is nice, and with that there is no need for another way to express “something and another thing.” As I recall, using _ in several places within an expression typically indicates repetition of the same thing rather than the introduction of another thing, unless it quite clearly does not (binaryfn(_,_)). So I concur with the view that more than one _ should be treated as copies of the same thing (e.g. _1s).


I like almost all of it, except that a second use of _ turns things into a two argument function. That makes it hard to combine with the core Query/SplitApplyCombine story. Here are some super common things in those packages that don’t work with this pattern: df |> @filter(_.colA > 2 && _.colB<8) |> @map((a=log(_.colA), b=_.colB * 2)). I would just remove that, especially given that there is already a nice syntax for multiple arg functions with _1 etc.

I agree with this

The (potential) problem is that unlike _, _1 is actually a valid identifier so using _1, _2 for a 2-argument function is more likely to cause weird issues. Doesn’t matter too much in a macro though, but it’s worth keeping in mind as a potential pain point.
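
A small Base-only sketch of the difference being described here (variable names are illustrative, and the exact error message from lowering may vary between Julia versions):

```julia
# `_1` is an ordinary identifier: it can be assigned and read like any variable.
_1 = 10
shifted = map(x -> x + _1, [1, 2, 3])

# A bare `_` can be assigned to but never read. Lowering an expression that
# reads it yields an error expression instead of runnable code.
lowered = Meta.lower(Main, :(_ + 1))
is_error = lowered isa Expr && lowered.head === :error
```

So a macro that gives `_1` a special meaning would shadow a user variable of that name inside the macro call, whereas no valid code can already be reading `_`.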


This is an interesting point. The current behavior came from copying JuliaLang/julia pull request #24990 ("RFC: curry underscore arguments to create anonymous functions" by stevengj), but that didn’t have named placeholder syntax as an alternative for multiple arguments.

Anyway, I agree that having _1 and _ mean the same thing is a lot better for the tabular data scenario and I’m inclined to make this change.

But what about more general cases? I’d greatly appreciate it if people could try this out on real code and non-tabular data and report back with their examples and impressions.


One case where the current multi-arg version could be useful is currying merge:

dicts = [Dict(zip(rand('a':'e', 3), rand(0:99, 3))) for _ in 1:10]
@_ reduce(merge(+, _, _), dicts)

However, we’ll have mergewith in Julia 1.5 so it can be done with reduce(mergewith(+), dicts).
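
For comparison, here is a Base-only sketch of the two spellings on fixed data (the dictionaries are made up for illustration; `mergewith` needs Julia 1.5 or later). The explicit two-argument closure is what the two-placeholder form above stands for.

```julia
# Three small dictionaries with overlapping keys (illustrative data).
dicts = [Dict('a' => 1, 'b' => 2), Dict('b' => 10, 'c' => 20), Dict('a' => 100)]

# Explicit two-argument closure: values under the same key are added.
explicit = reduce((x, y) -> merge(+, x, y), dicts)

# Julia 1.5+: `mergewith(+)` returns the curried merge function directly.
curried = reduce(mergewith(+), dicts)
```

Both give the same dictionary, with 'a' mapped to 101, 'b' to 12 and 'c' to 20.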


Edit: Similar example:

vecs = [randn(3) for _ in 1:10]
@_ reduce(_ .+ _, vecs)

@_ is fun! This is another usecase:

extrema_(xs) = @_ mapreduce(
    (min = _1, max = _1),
    (min = min(_1.min, _2.min), max = max(_1.max, _2.max)),
    xs,
)

which is apparently faster than Base.extrema :)

julia> xs = randn(1000);

julia> @btime extrema($xs)
  5.673 μs (0 allocations: 0 bytes)
(-3.270990011151266, 3.254351333498243)

julia> @btime extrema_($xs)
  3.258 μs (0 allocations: 0 bytes)
(min = -3.270990011151266, max = 3.254351333498243)
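
For reference, a sketch of the same function written with explicit closures and no macro, which is how I read the `_1`/`_2` placeholders above (the name `extrema_expanded` and the sample data are just for illustration; no timing claims are made for it):

```julia
# Explicit-closure version of the mapreduce above: the first closure lifts
# each element into a (min, max) pair, the second combines two such pairs.
extrema_expanded(xs) = mapreduce(
    x -> (min = x, max = x),
    (a, b) -> (min = min(a.min, b.min), max = max(a.max, b.max)),
    xs,
)

sample = [3.0, -1.5, 2.0, 7.25]
result = extrema_expanded(sample)
```

On this input `result` is `(min = -1.5, max = 7.25)`, which agrees with `extrema(sample)` apart from the field names.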

I don’t think this is an issue for a macro-based solution: people have to opt-in to using the macro so it can’t break any existing code. If there was a viable alternative we could consider it, but I think _1 and _2 etc is the most obvious and succinct syntax for this.

It occurs to me that a cute addition might be to allow superscripts or subscripts as synonyms for the pure-ascii version:

@_ map(_2/_1, a, b)
@_ map(_₂/_₁, a, b)
@_ map(_²/_¹, a, b)

I like the subscript version to look at, though I probably wouldn’t bother using it myself. More cute than useful, perhaps?

 _extrema(x) = minimum(x), maximum(x)

is faster than Base.extrema as well.


Very cute :)
I also think having _ mean two different things in the same expression is likely to become a real footgun.


There seems to be fairly general agreement that _ should mean _1 when repeated, so here it is.

Since it appears I’m breaking the API already, I guess releasing version 1.0 was premature. Luckily I happen to have a spare 2.0.0 version number lying around which I can use.

https://github.com/c42f/Underscores.jl/pull/3


In the latest master I’ve thrown caution to the wind and just used _ as the single argument. Also I figured I may as well do the Unicode subscripts; it may be a little overly cute but it’s nonintrusive, very easy to implement and generally seems like harmless fun.


Just my two cents: can’t say I love _ meaning only the first argument. This may just be that I am most familiar with the syntax from Scala, where each _ is replaced with a distinct argument. I think it is mostly a matter of taste. @davidanthoff’s example works just fine as is: e.g.

df |> @filter(_1.colA > 2 && _1.colB<8) |> @map((a=log(_1.colA), b=_1.colB * 2))

I actually think this is clearer than the version using _ alone, especially for a longer line of code like this.


My two cents: I agree that _ should mean _1 when repeated. In terms of prior art, this is what Mathematica does.


I also feel that “same visual cue” should correspond to same argument, so _ would always mean _1.

Another typical application here is “lifting”. For example, in the Observables package, to define a value that updates on any change of the inputs, you would do things like:

a, b = Observable(1), Observable(2)
c = map(a, b) do a, b
    (a + b) * a
end

This can be a bit annoying and is not intuitive to users. I wonder whether it makes sense to create a multi argument version that turns @_ map (_a + _b) * _a into map((a, b) -> (a+b)*a, a, b).

It looks like this could be useful also in general, say

v = rand(10)
@_ filter _v > 0.5

but I’m not sure how readable / intuitive it is.
