Why use gensym in this particular example

I came across the following macro and don’t quite understand why gensym is used. I’m sorry about the poor choice of title, but it was the best I could think of. Also, if this is not the sort of question that should be asked because it is too specific, please let me know and I’ll avoid such questions in the future.

From my understanding, gensym “Generates a symbol which will not conflict with other variable names”. That’s taken directly from the documentation. Presumably I understand what it does, but why not just use $rhs? The point is to avoid a case where the variable being defined is $rhs, correct? However, is that a valid variable name, and wouldn’t it be evaluated first? I read through the metaprogramming section of the manual, and it seems like the compiler would replace $name and $rhs with their definitions, so this confusion should only happen if they were already the same. However, if they were the same… then I don’t see why it matters.

When I read what I just wrote, my own words don’t make a lot of sense, probably because I’m quite confused.

This seems like a great question to ask.

There are a few things going on here:

  • A variable whose name is the generated symbol stored in tmp is created by the $tmp = $rhs line.
  • This variable acquires its value by evaluating the input expression rhs exactly once.
  • After evaluating rhs exactly once, tmp is used twice.

To convince yourself why this matters, imagine that rhs is something like println("Foo") === nothing ? 1 : 2. That’s not even the worst example of what could happen if you tried to use rhs instead of tmp.

So, you are saying that it isn’t clear what the type is until the expression has been evaluated at some point, and that the $tmp = $rhs line is what performs that evaluation? Your example will always return 2 (unless you mean Nothing instead of nothing). For performance reasons I can see how that would be useful. Is that what you were getting at there?

I also see how if the right hand side is something non-deterministic the whole thing would fall apart, which I’m guessing is what you meant by something worse?

Forgive me if I’m sounding like an idiot, but I’m not a programmer, yet I’m trying to learn Julia for practical usage.

No, that’s not what I mean. Let’s try two versions of the macro above and see what happens, then think about the causal mechanism after you see the outcomes:

julia> macro var_v1(ex)
           name = ex.args[1]
           rhs = ex.args[2]
           tmp = gensym(name)
           esc(quote
               $tmp = $rhs
               ($name)::typeof($tmp) = $tmp
           end)
       end
@var_v1 (macro with 1 method)

julia> macro var_v2(ex)
           name = ex.args[1]
           rhs = ex.args[2]
           esc(quote
               ($name)::typeof($rhs) = $rhs
           end)
       end
@var_v2 (macro with 1 method)

julia> let
           @var_v1(x = println("Foo") === nothing ? 1 : 2)
           println(x)

           @var_v2(y = println("Foo") === nothing ? 1 : 2)
           println(y)
       end
Foo
1
Foo
Foo
1

Is what you are getting at that $rhs occurs twice, so it gets evaluated twice, and so prints Foo twice? If instead of println (which is harmless… I think), we had some function func1!() which made changes to a variable in the scope, this could have unintended and undesirable consequences?

The causal mechanism is that we have two instances of $rhs and each time it occurs we evaluate it.

Also, is what I said about non-deterministic functions true as well, or when you said I missed your point, were you talking about both?

For example, it seems like an expression such as x = rand() > 0.5 ? 1 : "one" could also cause issues. Of course, things like that should be avoided anyway since they are inherently type-unstable; I’m just using it as an example. Yeah, @var_v2 does not like that. But it’s a pathological example, and would (hopefully) never be used for real, so I’m sure that isn’t related to what you are getting at.

By the way, thank you for your patience.

Yes, that’s the core issue here. In general, macros should not (unless they make clear that they’ll do so) take in an expression once and then evaluate it multiple times. That would never happen with a function (since the arguments are eagerly evaluated before being used), so doing it in a macro is likely to confuse users who aren’t expecting it.

If you subscribe to that idea, you have to evaluate the expression exactly once and store the result into a temporary variable. Once you accept that requirement, you need to use gensym because you want to be sure that the temporary variable you create will never clash with any existing variables.
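
As a minimal sketch of the kind of clash this avoids: the macro names @var_fixed and @var_gensym below are made up for illustration (they are not from the original post), and the only difference between them is whether the temporary has a fixed name or a gensym’d one.

```julia
# Hypothetical variant with a fixed temporary name. Because the whole
# body is escaped, `tmp` refers to whatever `tmp` means at the call site.
macro var_fixed(ex)
    name = ex.args[1]
    rhs = ex.args[2]
    esc(quote
        tmp = $rhs
        ($name)::typeof(tmp) = tmp
    end)
end

# Same structure, but the temporary gets a gensym'd name.
macro var_gensym(ex)
    name = ex.args[1]
    rhs = ex.args[2]
    tmp = gensym(name)
    esc(quote
        $tmp = $rhs
        ($name)::typeof($tmp) = $tmp
    end)
end

function clobbered()
    tmp = "important"
    @var_fixed x = 42
    return tmp    # the caller's `tmp` was overwritten by the macro
end

function preserved()
    tmp = "important"
    @var_gensym x = 42
    return tmp    # the caller's `tmp` is untouched
end

println(clobbered())
println(preserved())
```

With the fixed name, the caller’s tmp silently becomes 42; with the gensym’d name it stays "important".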

“If instead of println (which is harmless… I think),”

I think experienced people view this sort of thing as fairly harmful as a matter of principle, even if there are cases where it’s fine. What if println requires network traffic to print the message to a remote logging service? You’ve just doubled the costs in terms of network usage and maybe even storage capacity in that case. That’s not great unless you really wanted that effect.
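
To make the doubled cost visible, here is a rough sketch that counts calls. The CALLS counter and the body of func1!() are assumptions standing in for any side effect (logging, network traffic, mutation); the two macros are the same as @var_v1 and @var_v2 from earlier in the thread.

```julia
# Hypothetical call counter standing in for an arbitrary side effect.
const CALLS = Ref(0)

function func1!()
    CALLS[] += 1
    return 1
end

# Evaluates the right-hand side once, via a gensym'd temporary.
macro var_v1(ex)
    name = ex.args[1]
    rhs = ex.args[2]
    tmp = gensym(name)
    esc(quote
        $tmp = $rhs
        ($name)::typeof($tmp) = $tmp
    end)
end

# Splices the right-hand side in twice, so it is evaluated twice.
macro var_v2(ex)
    name = ex.args[1]
    rhs = ex.args[2]
    esc(quote
        ($name)::typeof($rhs) = $rhs
    end)
end

function count_calls_v1()
    CALLS[] = 0
    @var_v1 x = func1!()
    return CALLS[]
end

function count_calls_v2()
    CALLS[] = 0
    @var_v2 y = func1!()
    return CALLS[]
end

println(count_calls_v1())
println(count_calls_v2())
```

The first count is 1 and the second is 2, which is the same effect as the doubled Foo in the let example, just measured instead of printed.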

“Also, is what I said about non-deterministic functions true as well”

Yes, this is also true. What I meant was that your question about type inference wasn’t the issue – the issue here is just about the number of times the expression is evaluated.

“For example it seems like an expression such as x = rand() > 0.5 ? 1 : "one" could also cause issues”

This could be an issue for other reasons, but those reasons aren’t special to macros or gensym so I didn’t want to touch on them. Again, the issue here is the way that wanting to control the number of times an expression is evaluated leads to wanting to use a temporary variable and that leads to wanting a guarantee that the naming system for your variables never introduces clashes with other names.

Won’t macro hygiene take care of it? The original macro seems to just escape too much. It could, for example, be written as

macro var(ex)
    if !(ex isa Expr && ex.head == :(=))
        throw(ArgumentError("expression should be of the form `var = value`"))
    end
    name = ex.args[1]
    rhs = ex.args[2]
    quote
        tmp = $(esc(rhs))
        $(esc(name))::typeof(tmp) = tmp
    end
end

with the expansion:

julia> @macroexpand function f(x)
           @var y = x
           y = 1.2
           return y
       end
:(function f(x)
      var"#3#tmp" = x
      y::Main.typeof(var"#3#tmp") = var"#3#tmp"
      y = 1.2
      return y
  end)

and tmp in the macro won’t clash with any local tmp.
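
A quick way to check this, as a sketch: the function keeps_tmp is a made-up name for illustration, and the macro is the same @var definition as above.

```julia
macro var(ex)
    if !(ex isa Expr && ex.head == :(=))
        throw(ArgumentError("expression should be of the form `var = value`"))
    end
    name = ex.args[1]
    rhs = ex.args[2]
    quote
        # `tmp` is not escaped, so the hygiene pass renames it.
        tmp = $(esc(rhs))
        $(esc(name))::typeof(tmp) = tmp
    end
end

function keeps_tmp()
    tmp = "important"   # the caller's own `tmp`
    @var y = 42
    return (tmp, y)
end

println(keeps_tmp())
```

The caller’s tmp still holds "important" afterwards and y is 42, because the macro’s tmp was renamed during expansion.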

Ah, I think I see what you’re saying. You’re saying that such a macro makes an unintended and fundamental change to the expression. Macros should only make changes towards the end goal. This ensures that only the intended changes are made. Is that it?

Right, that makes sense. Question though, why take name as an argument? The help documentation doesn’t explain what [tag] is, and the manual just states “Local variables are then renamed to be unique (using the gensym function, which generates new symbols), and global variables are resolved within the macro definition environment.” Does this just ensure that it generates a symbol different from whatever is in name, rather than one that is wholly unique to the scope? If I use gensym(), won’t that accomplish the same end goal?

Trying this out, it seems it’s just to help human readability when doing a @macroexpand. Is this correct?

Ah, gotcha. Thanks.

Yes, it will accomplish the same goal. The tag is more a matter of programmer convenience. Using, for example, gensym(:someName) will result in a symbol like Symbol("##someName#363"), which gives some clue as to what the symbol is for when you use @macroexpand to check that the macro is generating what you expected. Without the tag you would end up with a symbol like Symbol("##364"), which is more difficult to read.
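
If you want to see it for yourself, something like the following should do. The variable names plain and tagged are just for illustration, and the exact counter values will differ from session to session.

```julia
plain = gensym()             # no tag: the name is only "##" plus a counter
tagged = gensym(:someName)   # the tag is embedded in the generated name

println(plain)
println(tagged)

# The tag only affects readability; both symbols are fresh either way.
println(occursin("someName", string(tagged)))
println(plain != tagged)
```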

Yeah, you’re totally right: I was acting like esc had to be there. But the hygiene pass will create the gensym result anyway if the esc is targeted better.