Question on the Design of Macro Hygiene

I am new to Julia, so I have been watching some tutorials. In one on metaprogramming, I saw this example of macro hygiene:

macro dummy(ex)
    return ex
end

xx = 3
@dummy yy = xx ^ 2 # evaluates to 9
yy # UndefVarError: yy not defined

My question is about the design of this feature. It seems intuitive to me that, since the user who calls the dummy macro has handed the variable name yy to the macro, they should be responsible for any naming conflicts. In other words, as the developers writing the macro, we are presumably expected (or at least permitted) to use this yy name however we need to in order to provide the functionality. That’s why the user gives us yy instead of a_very_obscure_name_that_he_will_not_use_anyway.

It is clear to me that local variables within the BODY of the macro should comply with the hygiene principle, but why do we need to apply the principle to user-provided variable names as well? What problems could arise if we did otherwise?

I notice that Julia actually translates xx to Main.xx. My suspicion is that there might be some implementation difficulty in creating a variable that is bound to the Main scope, but I am not sure.

If you haven’t already read the relevant paragraph in the Julia manual, I would recommend you give it a look. I think it does a pretty good job explaining why hygiene is needed and the rationale behind the way it is handled by the compiler.

Looking at your example, but in a slightly different context where the macro is invoked in a local scope:

julia> macro dummy(ex)
           return ex
       end
@dummy (macro with 1 method)

julia> let xx = 3
           @macroexpand @dummy yy = xx ^ 2
       end
:(var"#1#yy" = Main.xx ^ 2)

we see that there are actually two issues here:

  1. as you observed, yy has been gensymmed by the compiler; this is because it was considered a local variable in the context of the macro
  2. maybe more subtly, xx refers to Main.xx even though the xx variable is scoped in the let block in the context of the macro call. This is because this variable was considered a global variable in the context of the macro expansion.
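
To illustrate why the second rule can be useful, here is a minimal sketch. The module Tools, the function helper and both macros are hypothetical names made up for this example. It shows a macro relying on a function from its own module even though the caller has a function with the same name:

```julia
module Tools
    helper(x) = x + 1

    # Default hygiene: `helper` is resolved in the module where the macro is
    # defined (Tools); only the user's expression is escaped.
    macro apply_helper(ex)
        :(helper($(esc(ex))))
    end

    # Everything escaped: `helper` is looked up at the call site instead.
    macro apply_helper_leaky(ex)
        esc(:(helper($ex)))
    end
end

# The caller happens to have an unrelated function with the same name.
helper(x) = "caller's helper"

hygienic_result() = Tools.@apply_helper(41)        # calls Tools.helper
leaky_result()    = Tools.@apply_helper_leaky(41)  # calls the caller's helper

println(hygienic_result())  # 42
println(leaky_result())     # caller's helper
```

With the default rule, the macro author does not have to worry about what helper means wherever the macro ends up being called.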

My understanding is that the choice made in Julia is to maybe err on the side of caution, by automatically handling hygiene using these rules. If you, as a macro developer, want to disable automatic hygiene on some part of the generated expression, you can do so using the esc function (and in many cases you actually should disable automatic hygiene handling for user-provided expressions).

julia> macro dummy(ex)
           return esc(ex)
       end
@dummy (macro with 1 method)

julia> let xx = 3
           @dummy yy = xx ^ 2
           @show yy

           @macroexpand @dummy yy = xx ^ 2
       end
yy = 9
:(yy = xx ^ 2)
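
As a rough sketch of what automatic hygiene buys you when the macro needs a temporary of its own (the macros @twice_sum and @twice_sum_leaky are hypothetical, made up for this illustration), compare escaping only the user expression with escaping the whole generated block:

```julia
# Escape only the user's expression; `tmp` belongs to the macro and is
# renamed automatically by the hygiene pass.
macro twice_sum(ex)
    quote
        tmp = $(esc(ex))
        tmp + $(esc(ex))
    end
end

# Escape everything; `tmp` now refers to whatever `tmp` means at the call site.
macro twice_sum_leaky(ex)
    esc(quote
        tmp = $ex
        tmp + $ex
    end)
end

function demo()
    tmp = 10                          # the caller's own `tmp`
    result = @twice_sum tmp + 1
    return (tmp, result)
end

function demo_leaky()
    tmp = 10
    result = @twice_sum_leaky tmp + 1 # silently overwrites the caller's `tmp`
    return (tmp, result)
end

println(demo())        # (10, 22)
println(demo_leaky())  # (11, 23)
```

In the leaky version the macro's temporary collides with the caller's variable, which changes both the caller's tmp and the result.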

On a personal note, having spent a fair amount of time developing macros in (Emacs)LISP where you have to manually take care of gensymming everything, I can say that always having to think about hygiene puts an extra burden on the macro developer. So I kind of like Julia’s approach (although it did take me some time to get used to it).


Thank you for the answer! Very helpful.

But I am still confused about why the default is not to escape rather than to escape. Would you mind pointing out some cases where using yy and xx (instead of Main.xx and var"#123#yy") might cause confusion?
(And the documentation does not seem to help with my question.)

I think the docs divide variables into 2 kinds, local and global. It seems to me that 3 kinds would have been better: local, macro-definition scoped, and macro-caller scoped. Local variables should stay local, macro-definition variables should resolve in the definition scope, and macro-caller variables should resolve in the caller’s scope. My thought is that only the local variables should have to comply with the hygiene principle.

But from your answer, maybe the Julia way is more natural to work with? (I have never used Lisp, so I don’t have experience with this feature.) I hope my question makes sense.

Not sure I know all the details here, but my understanding is that it can be extremely difficult to make the distinction between what actually comes from the user-provided expression and the macro itself.

That distinction might not even make so much sense for more complex macros. Take for example the macros from Base.Cartesian:

julia> @macroexpand Base.Cartesian.@nexprs 4 i -> x_i = i
    x_1 = 1
    x_2 = 2
    x_3 = 3
    x_4 = 4

None of the x_1, x_2, … symbols actually come from the context of the macro call; they were all built by the macro itself, and are going to be injected as new variable names in the context of the macro call. In such a case, you can’t really “trace back” what x_i refers to in the macro call.
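
To make this more concrete, here is a much simplified sketch of a macro that builds variable names itself. The macro @define_numbered is hypothetical and is not how Base.Cartesian.@nexprs is implemented:

```julia
# Build `prefix_1 = 1`, ..., `prefix_n = n`. The symbols are assembled inside
# the macro, so none of them appears in the caller's code.
macro define_numbered(prefix, n)
    assignments = [:($(Symbol(prefix, "_", i)) = $i) for i in 1:n]
    # Escape the block so that the new names are created in the caller's scope.
    esc(Expr(:block, assignments...))
end

function demo_numbered()
    @define_numbered x 3
    return (x_1, x_2, x_3)
end

println(demo_numbered())  # (1, 2, 3)
```

The caller only wrote x and 3, yet ends up with x_1, x_2 and x_3 in scope, so there is no user-provided symbol that these names could be traced back to.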

Julia’s take is to be very flexible. It handles hygiene in a way that can be automated and is sensible in a lot of common cases. And it also allows you to handle everything manually by:

  • using esc to disable auto hygiene, and
  • using gensym explicitly to inject symbols that won’t clash in the macro output.
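
A minimal sketch of this fully manual style (the @swap macro is a hypothetical example, not something from Base) could look like this:

```julia
# Swap two variables. The whole expansion is escaped, so the temporary is
# protected by hand with an explicit gensym.
macro swap(a, b)
    tmp = gensym("tmp")   # fresh symbol that cannot clash with user names
    esc(quote
        $tmp = $a
        $a = $b
        $b = $tmp
        nothing
    end)
end

function demo_swap()
    tmp, other = 1, 2     # the caller even has a variable called `tmp`
    @swap tmp other
    return (tmp, other)
end

println(demo_swap())  # (2, 1)
```

This is essentially what you would write in a Lisp without automatic hygiene: everything refers to the caller's scope unless you explicitly generate a fresh name.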

Not sure if that makes more sense, but at least that’s how I see things from my user point of view.


This seems like an instance of the problem I wanted to talk about. The eval looks up df in the global scope. Admittedly this might not be the best use of eval, but to users who don’t know the details of its implementation, it intuitively should have worked (at least that’s my understanding of eval in Lispy things (like Python)).
(Edit: I was wrong. In Python, eval also defaults to the global scope, but the workaround (which seems general enough) is to specify the scope explicitly by passing the output of the built-in function locals().)
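
For reference, the behavior in question can be reproduced with a small sketch (the function names and the string values are made up for illustration; df is just a stand-in variable):

```julia
df = "global df"

function lookup_with_eval()
    df = "local df"
    # eval runs the expression in the module's global scope, so it does not
    # see the local `df` defined on the line above.
    return eval(:df)
end

function lookup_directly()
    df = "local df"
    return df
end

println(lookup_with_eval())  # global df
println(lookup_directly())   # local df
```

If no global df existed at all, the eval call would throw an UndefVarError instead.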

My first thought about your nexprs example is that it belongs to the cases that need special treatment. To me, such macros are closer to the C preprocessor style than to the all-powerful Lisp macro, emphasizing the textual side of metaprogramming. And users who call this macro are obviously looking for a shortcut to declare variables (what they want is a textual macro), so there will be much less confusion.

In contrast, when it comes to eval, given what it is supposed to do, we would expect it to operate at the expression level. Evaluating an expression at a given place should be equivalent to running that expression directly at that place. It seems natural for the default behavior of such a macro to match this intuition. We should not need to change df to Thisfunction.df or the like to refer to the exact same thing.

Another thought is that it is easier to inspect outwards than inwards in scope, due to the arborescent nature of code (the out direction is always unique while the in one is not). It seems better to have the innermost scope information preserved by default. Otherwise it would require hard-coding the inner scope as workaround and thus lose generality and reusability.

That said, am I overlooking something about eval, is it just not well designed as a built-in macro, or is there some merit in this design choice? I would love to see examples that teach me more about this.

Early on (2014), there was a proposal to do just this — basically, only symbols defined inside the macro should be hygienized, whereas symbols coming from macro arguments should be left alone, eliminating the need for esc: RFC: Improvement to hygienic macros by david-moon · Pull Request #6910 · JuliaLang/julia · GitHub

The Julia developers were pretty receptive to the idea, but the PR ran into some technical problems and languished for lack of attention. It’s the sort of thing that might still be considered for Julia 2.0.