Would the Scala convention for anonymous function arguments be feasible?

In Scala there is a convention that the function argument to functional programming operations like map can be an expression that uses an underscore. Under this convention, the anonymous function in a Julia call such as

sum(x -> x < 1.0e-5, rand(10000))

could be written as

sum(_ < 1.0e-5, rand(10000))

without the need to write _ -> at the beginning of the expression.

I’m just throwing this idea out there. I write a lot of anonymous functions in such functional programming constructs and it would be convenient to use expressions like this instead. However, I have no idea how difficult it would be to implement this or how offensive such a convention might be to others.
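For anyone who wants to experiment, here is a minimal sketch of the idea as a macro. The names `@lam` and `replace_underscore` are hypothetical (nothing like them exists in Base), and the macro argument itself marks where the function body begins and ends, so this is only an approximation of the Scala behavior.

```julia
# Hypothetical helper: swap every `_` symbol in an expression for `arg`.
replace_underscore(ex, arg) = ex === :_ ? arg : ex
replace_underscore(ex::Expr, arg) =
    Expr(ex.head, map(a -> replace_underscore(a, arg), ex.args)...)

# Hypothetical macro: `@lam(_ < 0.5)` expands to `x -> x < 0.5`,
# with a generated name in place of `x`.
macro lam(body)
    arg = gensym("x")
    esc(:($arg -> $(replace_underscore(body, arg))))
end

# Counts the elements below 0.5.
small_count = sum(@lam(_ < 0.5), [0.1, 0.7, 0.3])
```

Every `_` in the body refers to the same single argument here; a multi-argument variant would need a different rule.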


See Julia issue #5571 for some old discussion on this idea.

I would like to see it happen.

I’m still sympathetic to this idea (don’t think @jeff.bezanson cares for it). The main hang-up is determining how far the expression that forms the body of the anonymous function extends. There was some discussion in that thread, but no one really had a plausible proposal. IIRC, Scala uses types to determine this, which is not possible for Julia and arguably not a great idea in the first place. Other languages with similar features have some kind of syntactic indication of where the function expression stops. My best attempt at proposing an answer to that question starts here.


I had a conversation yesterday where @johnmyleswhite brought up how Hack’s pipeline |> operator works. Once we disallow _ as a normal r-value so that it can be treated specially as an l-value (i.e. it discards its value like ~ as an l-value in Matlab), then we could use it like Hack’s $$ with the |> operator, i.e.:

f() |> g(_) |> h(5, _)          # h(5, g(f()))
f() |> g1(_) + g2(_) |> h(5, _) # x = f(); h(5, g1(x) + g2(x))

Using _ like this doesn’t have a problem with knowing how far the expression extends, since the expressions are explicitly delimited by |> operators. When there are no instances of _ in the expression on the RHS of a |>, it could continue to mean applying the value of that expression to the LHS value. That would make this change backwards compatible with the current meaning (except for code that has overloaded |> to mean something else), and it makes the first example even shorter:

f() |> g |> h(5, _) # h(5, g(f()))
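A rough sketch of these semantics as a macro, for experimenting. The names `@hackpipe`, `rewrite_pipe`, `contains_underscore`, and `substitute` are all hypothetical, and this is only one possible reading of the proposal: a right-hand side containing `_` is evaluated with `_` bound to the left-hand value, and a right-hand side without `_` is called on that value.

```julia
# Hypothetical helpers: detect and replace the `_` placeholder.
contains_underscore(ex) = ex === :_
contains_underscore(ex::Expr) = any(contains_underscore, ex.args)

substitute(ex, val) = ex === :_ ? val : ex
substitute(ex::Expr, val) =
    Expr(ex.head, map(a -> substitute(a, val), ex.args)...)

# Rewrite `lhs |> rhs`, working through the left-associative chain.
function rewrite_pipe(ex)
    if ex isa Expr && ex.head === :call &&
       length(ex.args) == 3 && ex.args[1] === :|>
        lhs = rewrite_pipe(ex.args[2])
        rhs = ex.args[3]
        if contains_underscore(rhs)
            # Bind the left-hand value once, then use it for every `_`.
            tmp = gensym("piped")
            return quote
                let $tmp = $lhs
                    $(substitute(rhs, tmp))
                end
            end
        else
            # No placeholder: keep the current meaning, rhs(lhs).
            return :($rhs($lhs))
        end
    end
    return ex
end

macro hackpipe(ex)
    esc(rewrite_pipe(ex))
end

double(x) = 2x
result = @hackpipe 3 |> double |> _ + _ |> string("value: ", _)
```

Because `+` binds more tightly than `|>`, the `_ + _` step is a single stage of the chain, just like `g1(_) + g2(_)` in the second example above.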

Am I right to think that implementing a pipeline operator using _ would preclude using it in anonymous functions?

Not entirely, since one behavior applies when the |> operator appears and the other where it doesn’t, but having two such special meanings would be confusing. In any case, once _ can’t be used as an r-value, any of these are up for grabs, so we can take our time deciding what meaning (if any) we’d like to have. The advantage of the Hack-like proposal is that it’s already pretty well fleshed out, whereas no one has a really compelling idea for how the Scala-like thing would work.


Note that this is exactly how chaining works in ChainMap.jl. I’ve disallowed in-line versions for now; there’s just block syntax, which I think is easier to read:

@chain begin
    f()
    g
    h(5, _)
end

Also, @chain attempts to be smart; you can apply it to whole blocks of global code and it searches through for begin blocks where chaining would be feasible and chains only them.

I think if I was going to make it into Base, I’d suggest syntax like this:

chain
    f()
    g
    h(5, _)
end

Also, to implement dmbates’ proposal in ChainMap syntax:

@chain begin
    10000
    rand
    @over ~_ < 1e-5
    sum
end

I really like the block syntax of @chain, though I see that people who use this often probably would like the inline syntax. But for the inline case, I find that using an operator like |> makes it clear that chaining is going on, whereas the Scala-like syntax I find to be a little opaque. But that’s just my perspective as someone who doesn’t chain functions / do functional programming often (if at all).

Also, something to keep in mind, if I just saw random _s around, I would not know where to look in the documentation for that. But |> and @chain are easy to Google search. Making something not only easy to read but also easy to find documentation is something I think is crucial if it’s a niche part of the syntax.


I am continuing to maintain a summary post in Function chaining issue:
Quoted Below:

So our current list of various efforts etc
I think it is worth people checking these out, (ideally before opinioning, but w/e)
they are all slightly different.
(I am attempting to order chronologically).

Packages

Nonpackage Prototypes

Related:


Perhaps this should be edited into one of the top posts.

updated: 1/10/2016


Personally I really like the way @StefanKarpinski was going with https://github.com/JuliaLang/julia/issues/5571#issuecomment-157424665

Which is why I implemented a macro-based prototype with it.

I hoped that would let people experiment with it, get a feel for it, and see if it was a nice syntax.
However, no further discussion emerged from that prototype.


This will probably be unpopular, but to me it would seem clearer if the _ became mandatory in chained expressions (rather than backwards compatible), e.g. the middle (_) becomes mandatory in f() |> g(_) |> h(5, _).

Yes, less backward compatible and less concise, but more importantly (to me), less magical. Otherwise the expression jumps back and forwards between different conventions, and it’s a bit jarring.

Another idea floated was to have _ defined like ans is in the REPL right now - it is a binding to the output of the previous expression. So you could just type

f(); g(_); h(5,_)

without the |> (or write it over multiple lines, if you like). Then change REPL ans to _, and there’s less for users to learn, and less magic all around.


How does the ans part interact with nesting? What is the behavior of the following code?

f(); g(h1(_), h2(_));

To me the problem with ans is that it involves mutation of global state, so the order of execution of complex sub-expressions becomes a defining property of every program.


Sorry, I’m a bit confused what you are getting at. Don’t you have the exact same problem with function chaining (and Julia in general) currently?

E.g., what does this do?

f() |> _ -> g(h1!(_), h2!(_))

or even:

x1 = f()
x2 = g(h1!(x1), h2!(x1))    # I've never seen anyone write such a thing

I definitely see this syntax as being clearer in a functional paradigm, where _ or ans isn’t being mutated, and it is perfectly clear. But you would have this same problem using ans at the REPL. Julia does pretty well at encouraging functional approaches, and (just speculating here) having more convenient syntax like this might only make that easier and more common.

I think there’s an important distinction between |> setting _ and ans becoming _: in function chaining, we agree that _ is invariantly bound to the value of the left-hand side of that operator, whereas the ans solution requires you to clarify what the “last top-level (sub-)expression” means.

It’s a problem of ambiguity about what binds to what exacerbated by actual global state in the REPL. Of course you could remove that ambiguity, but it’s hard to imagine that any rule you invent could be more clear than how |> works.

requires you to clarify what the “last top-level (sub-)expression” means.

I think it might be simpler than you are worrying about.

Because return is optional, Julia users have to learn these rules anyway. It already almost feels like a part of the language that the last expression has some meaning because of the way functions terminate, so in many ways this would just be formalizing this somewhat. I think it is pretty easy to understand what the last expression is (I think all you need to know is this: for / while loops always return nothing, so you don’t have to parse inside of an immediately preceding loop; blocks (such as with begin) return their last expression just like a function and so can be considered to be “flattened”; and finally there are similar rules for if that make it compatible with the ternary. Beyond these, there is no “subexpression” in the sense that a+b+c should rebind _ in the middle of its operation to a+b or something: expressions (in this context) end wherever you can insert a semicolon ; with no effect).
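A small illustration of those rules as they already behave in Julia today. The variable names are only for this example.

```julia
# A `for` loop evaluates to `nothing`, whatever its body does.
loop_value = for i in 1:3
    i
end

# A `begin` block evaluates to its last expression.
block_value = begin
    1
    2
end

# An `if` expression evaluates to the branch taken, like the ternary.
if_value = if block_value > 1
    "big"
else
    "small"
end
ternary_value = block_value > 1 ? "big" : "small"
```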

It’s a problem of ambiguity about what binds to what exacerbated by actual global state in the REPL. Of course you could remove that ambiguity, but it’s hard to imagine that any rule you invent could be more clear than how |> works.

I agree that |> is perfectly clear and logical the way proposed here. I just think automatically binding to _ uses the same rules that Julia users have to learn anyway to understand functions. I’m suggesting that this wouldn’t cause any additional ambiguities or global statefulness concerns beyond what already exists.

That’s all fair.

On my end, it seems like your point is that every sequence of expressions can be viewed as a function chain in which most of the time the previous expression’s value isn’t being passed forward – but you could always pass it forward as ans if ans were set to the value of the last executed expression. I agree with that interpretation, but think that the resulting syntax wouldn’t really clarify the flow of data over what we have today.

For example, I’d be pretty unhappy if people wrote code like the following:

# Do step 1
foo(1)

# [...]
# [...]
# Now, after some long deliberation, we're going to do step 2.
bar(ans)

Of course, you can write ugly code using |> as well. Ultimately my preference for Hack’s |> is based on a subjective sense that the syntax suggests data flow in a way that the use of ans wouldn’t.


I think we all agree that it has potential to be misused. :)

For my use cases, I rarely use |>, but I would love to see function chaining become more powerful. I’m sure that both approaches could make sense (separately or together), but I was worried that |> with optional _ means there is a bit more desugaring happening in the parser that users need to understand. For some reason complex desugaring irritates me: the very powerful do syntax is useful, but the way it inserts things into function calls annoys me, much like the current behaviour of |>. A mandatory _ would be clearer IMO. A similar trick to explicitly label where do inserts the function would be nice, but this might be personal preference.

Getting rather off-topic, sorry, but here’s a do idea that combines some of the above:

# existing syntax
map(array) do element
   2*element
end

# replace with <| (which would need to be defined)
map(_, array) <| function (element)
    2*element
end

# and make `do` be sugar for the above
map(_, array) do element
    2*element
end

(note that having <| doesn’t combine with _ as a REPL-like ans, since it reverses the direction in a strange way)

The advantage here is it’s clearer what kind of method of map is being called, and that _ doesn’t have to be the first argument.
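For reference, this is what the existing do syntax does today: the block becomes an anonymous function that is passed as the first argument of the call. The variable names below are only for illustration.

```julia
array = [1, 2, 3]

# The `do` block form...
with_do = map(array) do element
    2 * element
end

# ...is equivalent to passing the function explicitly as the first argument.
explicit = map(element -> 2 * element, array)
```

The first-argument position is implicit in the do form, which is the part the `_` placeholder above would make visible.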

Here’s a short macro that implements the _ as ans syntax:

using MacroTools

# Leave line-number nodes alone; wrap anything else in an assignment to `_`.
store_line(e) = 
    isexpr(e, :line) ?
        e :
        :( _ = $e )
    
store_block(e) = 
    isexpr(e, :block) ?
        MacroTools.walk(e, store_line, identity) :
        e
        
macro store(e)
    esc( MacroTools.prewalk( store_block, e) )
end

2 == @store begin
    function a(b)
        b
        _ + 1
    end
    1
    a(_)
end

I’m concerned about the impact that having a potential implicit data dependency between an expression and the next would have on optimization and code analysis. But then I suppose that you can tell pretty easily if the value of a top-level expression in a block needs to be captured or not: does the following expression in the block have _ anywhere in it? This approach would also make multiple streams of computation fairly clear:

f1(), f2()
g1(_[1]), g2(_[2])
h(_...) # h(_[1], _[2])
k1(_), k2(_)

Definitely something to ponder.