RFC: _ as ans

There’s been a lot of discussion lately about using _ as a stand-in for the result of the previous evaluation. For example:

a
_ + 1
_ + 2

would get lowered to

gensym1 = a
gensym2 = gensym1 + 1
gensym3 = gensym2 + 2
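To make the proposed lowering concrete, here is a minimal sketch of it as an ordinary macro. The names @chain_sketch and replace_underscore are hypothetical, and this is not ChainRecursive.jl's actual code. It handles only a flat begin ... end block, and it rewrites every _ it finds, including inside nested blocks and macro calls, which is exactly where the problems listed below come from.

```julia
# Hypothetical sketch (not ChainRecursive.jl's implementation): rewrite each
# line of a `begin ... end` block so that `_` refers to the previous line's value.

# Recursively replace the symbol `_` with `sym`. This descends into every
# sub-expression, including macro calls, with no opt-outs.
replace_underscore(ex, sym) = ex === :_ ? sym : ex
replace_underscore(ex::Expr, sym) =
    Expr(ex.head, map(arg -> replace_underscore(arg, sym), ex.args)...)

macro chain_sketch(block)
    (block isa Expr && block.head === :block) ||
        throw(ArgumentError("@chain_sketch expects a begin ... end block"))
    lowered = Expr(:block)
    previous = nothing              # what `_` means on the first line
    for line in block.args
        if line isa LineNumberNode  # keep line info for error messages
            push!(lowered.args, line)
            continue
        end
        current = gensym("chain")   # plays the role of gensym1, gensym2, ...
        push!(lowered.args, :($current = $(replace_underscore(line, previous))))
        previous = current
    end
    push!(lowered.args, previous)   # the block evaluates to its last line
    return esc(lowered)
end

a = 10
result = @chain_sketch begin
    a
    _ + 1
    _ + 2
end
# result == 13
```

Because the rewrite happens during macro expansion, lowering never sees _ used as an r-value.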

I’ve got something close to this up and working in ChainRecursive.jl in a surprisingly small amount of code. There are only two remaining issues:

  1. Base syntax that is parsed as a block but would need to opt out through special-casing:
type a
    b::Int
    c::Int
end

does not mean

type a
    gensym1 = b::Int
    gensym2 = c::Int
end

and

(args...; kwargs...) -> f(args...; kwargs...)

does not mean

(begin 
    gensym1 = args...
    gensym2 = kwargs...
end) -> f(args...; kwargs...)

  2. Macros that use block syntax for things that aren’t true blocks. For example:
MacroTools.@match e begin
    a_ + b_ => :($b + $a)
    a_ => a
end

does not mean

MacroTools.@match e begin
    gensym1 = a_ + b_ => :($b + $a)
    gensym2 = a_ => a
end

Part of this can be solved in a macro: 1) can be handled by special-casing, but 2) cannot, because of the possibility of user-defined macros. Therefore, I think that for this to become widely usable, it needs to be implemented in lowering, which I gather happens after macro expansion. I think this is the same way dot vectorization was implemented. I’d like to put together a PR, but I was hoping that anyone who is excited about this could either give me some advice on getting started or collaborate. I don’t know Lisp, so it would be an uphill climb.
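A macro-based walker could handle 1) by checking expression heads before descending. The sketch below is hypothetical: skip_rewrite and BASE_OPT_OUT_HEADS are illustrative names, the head list is not exhaustive, and it uses the Julia 1.x head :struct, which replaced the type keyword used above. It also shows why 2) is hard: the only rule that is safe for arbitrary user macros is to skip every macro call, which also throws away the cases where rewriting inside a macro's block would be wanted.

```julia
# Hypothetical helper (not from ChainRecursive.jl or Base): should a walker
# that rewrites `_` leave this expression untouched?

# Base forms whose bodies look like blocks but are not sequential code.
# Illustrative only; a real implementation would need a complete list.
const BASE_OPT_OUT_HEADS = (:struct, :->)

function skip_rewrite(ex)
    ex isa Expr || return false
    ex.head in BASE_OPT_OUT_HEADS && return true
    # A macro may interpret its arguments however it likes, so a macro-based
    # implementation can only play it safe by skipping every macro call,
    # even ones like `@time begin ... end` where rewriting would be wanted.
    ex.head === :macrocall && return true
    return false
end

skip_rewrite(Meta.parse("@match e begin\n    a_ => a\nend"))  # true
```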


I was not aware of this discussion, can you link it?

I tend to use _ as a stand-in for “values I don’t need”, e.g.

a, _, c = returns_three_values(...)

The way forward to make that official has been paved by deprecating the use of _ as an r-value in 0.6. It will likely be an official “discard this value” name in 1.0 when used as an l-value. That does not conflict with the proposed use here as an automatic “last value” binding, which would be handy for chaining computations together.

EDIT: To clarify, there has not been that much discussion of _ as a previous-value binding; most of the discussion has been about _ as a discarded-value binding.

So it’s possible that if _ is used as an l-value it would mean “discard this value”, but if used as an r-value it would mean “last value”?

In fact, these use cases are entirely consistent. For example,

plus_neighbor(i) = i, i + 1

begin 
    1
    _, b = plus_neighbor(_)
    b + 1
end

would presumably get lowered to

plus_neighbor(i) = i, i + 1

begin 
    gensym1 = 1
    gensym2 = (gensym1, b) = plus_neighbor(gensym1)
    gensym3 = b + 1
end

And in fact, this already works in ChainRecursive:

using ChainRecursive

plus_neighbor(i) = i, i + 1

@chain begin
    1
    _, b = plus_neighbor(_)
    b + 1
end

As Stefan says, there actually hasn’t been much (if any) discussion of this. The discussion has been for using _ for discarded l-values.

I’m very skeptical of this proposal for _ as an r-value denoting the result of the previous expression. What problem does this solve? Can you give an example of code that would be made significantly clearer by using _ to denote the result of a previous expression?


Hmm. Well, I’ve seen it mentioned in a couple of places, but maybe I was exaggerating a bit. I’m not sure I want to rehash the pros and cons of chaining; I was more looking for advice on how to implement one version of it.

Assigning a symbol a different meaning depending on context would lead to subtle bugs, IMO. Generally, I don’t think that using the values of previous expressions via some special syntax is good practice. I recognize that ans is useful in the REPL (when a computation is expensive and I forgot to assign the result to something, though this almost never happens), but I would prefer explicitly assigning and using values.

However, if you think this is useful, it would be great to see an example of a language that does something similar. I have to admit that I can’t think of any.


In fact, I tried to show above that the semantics of _ as “discard this value” are a strict subset of the semantics of _ as “last value”. Chaining is wildly popular in R and used extensively in the Hadleyverse.
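For comparison, a strictly linear chain like the one in the first post can already be written with Julia's |> operator, which passes the value on its left as the only argument to the function on its right. Anything that is not a one-argument call needs an anonymous function. This is only an illustration of existing syntax, not part of the proposal:

```julia
# Existing Julia syntax, no new features: `x |> f` is `f(x)`, and `|>`
# associates to the left, so each step receives the previous step's value.
a = 10
result = a |> (x -> x + 1) |> (x -> x + 2)   # same as (a + 1) + 2

# One-argument functions can be piped directly, much like R's %>%.
smallest = [3, 1, 2] |> sort |> first
```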

No, it’s not. What are you going to do with:

_, a = (1, 2)
b = _

Should _ be the last value, or the discarded value?

What about:

c = _, a = (1, 2)
b = _

Discarding a value has the property that you cannot misuse it, since it’ll be an error (currently a warning) if you do. A magic “last value” does not have that property and is much more error prone.


Hmm, I think the behavior is pretty consistent.

_, a = (1, 2)
b = _

would go to

gensym1 = nothing
gensym2 = (gensym1, a) = (1, 2)
gensym3 = b = gensym2

and

c = _, a = (1, 2)
b = _

would go to

gensym1 = nothing
gensym2 = c = (gensym1, a) = (1, 2)
gensym3 = b = gensym2

This is exactly the problem. It’s perfectly valid to assume that it’s

(gensym1, a) = (1, 2)
b = gensym1

instead.
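Written out by hand as plain Julia, the two readings of _, a = (1, 2) followed by b = _ give different values. The names b_last and b_discarded are illustrative stand-ins for b so that both results can sit side by side:

```julia
# Reading 1: `_` on the second line is the previous line's value, following
# the gensym lowering proposed above. A destructuring assignment evaluates
# to its right-hand side, so `_` ends up being the whole tuple.
gensym1 = nothing                 # initial "last value"; overwritten below
gensym2 = (gensym1, a) = (1, 2)
b_last = gensym2                  # (1, 2)

# Reading 2: `_` on the second line is the element discarded by `_, a = ...`.
(discarded, a) = (1, 2)
b_discarded = discarded           # 1
```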


Hmm. I think the first behavior seems more intuitive.

The second one (I assume this means the one in my post above) is much more consistent with using it as a normal variable.

Isn’t _ as an rvalue deprecated?

The deprecation does free it up for use as the result of the previous statement when used as an r-value, which is what @bramtayl has been arguing for. The case I posted is just meant to show that having it as both the discarded value and the value of the previous statement is non-conflicting in principle, but can be quite confusing.


I can see your point, it does seem confusing. I think the potential for a user to use _ in the line directly after an assignment is small. If users are assigning names to something, they’ll likely be using these names instead.

But the point is that allowing _ as both an r-value and an l-value, but one that behaves completely differently from other variables, is wildly confusing and adds a ton of corner cases to the language.


If you are thinking of the %>% operator in R, that is quite different from what you are suggesting, and equivalent or similar functionality is already provided by |> and some packages, in particular see

As I showed above, there are only two corner cases in Base, each of which is caused by Julia using block syntax for lists of arguments. In fact, I was kicking around another proposal to add a new “arguments” block syntax/AST node that would allow you to pass arguments to a function in block form. This alternate proposal would allow macros to work correctly and remove the need for chaining behavior in Base.