Programmatically defining a global variable in a new module

I need to be able to programmatically define a new global variable in a module. Till Julia v1.11 I was able to do this as

function define_global()
    m = Module(:MyModule)
    Core.eval(m, :(global data))
    setglobal!(m, :data, 123)
    m
end

This, however, does not work anymore and raises an error:

julia> define_global()
ERROR: Global MyModule.data does not exist and cannot be assigned.
Note: Julia 1.9 and 1.10 inadvertently omitted this error check (#56933).
Hint: Declare it using `global data` inside `MyModule` before attempting assignment.
Stacktrace:
 [1] define_global()
   @ Main ./REPL[7]:4
 [2] top-level scope
   @ REPL[8]:1

This, presumably, is a consequence of the new world-age mechanism. In fact, the code below with invokelatest solves the problem:

function define_global()
    m = Module(:MyModule)
    Core.eval(m, :(global data))
    invokelatest(setglobal!, m, :data, 123)
    m
end

Is there a nicer and cleaner way of doing what I am trying to do?

1 Like

Just to elaborate, I believe this is associated with the new world age mechanism, which in Julia 1.12 also applies to global variables. This is creating several issues for me, since in my code I need to define and use, depending on the user’s input, a new module together with new function definitions and global variables. I can still do this in 1.12, but only at the top level in the REPL.

That workflow sounds kind of cursed to me. Could you elaborate a bit on what you want to achieve? Perhaps there is a more idiomatic solution to your underlying issue.

3 Likes

I am doing Bayesian inference on the parameters of a fairly complex physical model. The model is specified by the user in a (YAML) configuration file. That file is loaded and translated into Julia code by a routine I wrote. The user-generated Julia code is all contained within a module, which includes all routines necessary to do inference (therefore to compute the likelihood, the prior, and so on), together with user-set parameters and external data. The user-set parameters and external data are stored in the module as global variables.

Interesting stuff!

Am I guessing correctly that the YAML interface is due to external constraints? Otherwise a native Julia interface with some convenience macros would likely be a more streamlined experience, both for you and the users of your code.

Could you simply(?) wrap your generated code into a function and provide the parameters via function arguments? That should also be a lot faster, since global variables are slow (especially if untyped). Perhaps you’d need to generate both a struct for holding parameters and some functions for computations.
Can I find your code that does this translation somewhere?

Btw: do you know Turing.jl? To my knowledge that does Bayesian inference and has a nice DSL for specifying models that then generates efficient Julia code.

2 Likes

Thank you for your responses @abraemer !

Yes, in a sense it is due to “external constraints”: many users of the code are not experienced Julia programmers and it would be complicated for them to use the code properly.

The code is already largely wrapped within functions. In fact, it is composed of two main parts: a library, which is “static” and does not depend on the user input, and a user interface, responsible for putting together the calls to the library depending on the user input. The latter is the problem of course, because it is at this level that I need to collect all the user input, find out the parameters to use for the inference and the constant quantities, and decide which functions are needed. At this stage I also take the constant quantities out of the inference (i.e., I precompute them and store them in an external constant structure passed to the likelihood).

Macros, in any case, were my first attempt when I started this project a couple of years ago, but things with macros quickly get quite confusing, especially if one needs to use macros within macros (which is something I had to do at a certain point).
However, following your kind suggestion, I will reconsider whether there is a way to implement things with a macro.

I know Turing.jl and I have tried to use it for my purposes in the past, but with no success at all. It is certainly a very interesting project and I can see a lot of uses for it, but probably not for what I am trying to do (modeling of strong gravitational lenses), as the complexity of the systems and the number of free parameters are really large. For this reason it is really vital to have highly optimized code (and yes, I do not use untyped global variables, and none of my critical functions involve dynamic dispatch).

Thanks for your explanations and sorry for not having much of an answer so far. I guess I am still a bit confused about why you need/use global variables and cannot pass the values via function arguments. To make this more explicit: you could generate a struct that just includes every parameter you need and then pass that to every function. Then instead of accessing the global, you just access the struct’s slot. I don’t see why that should not be possible.
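Just as a rough sketch of what I mean (all names here are made up, not taken from your code): the configuration routine builds one parameter struct, and every inference routine takes it as an argument instead of reading module globals.

struct ModelParams{T<:Real, D<:AbstractVector}
    a::T       # a user-set scalar parameter
    data::D    # external data loaded for this run
end

# routines read fields of the struct instead of module globals
loglikelihood(p::ModelParams, θ) = -0.5 * sum(abs2, p.data .- θ * p.a)

params = ModelParams(3.14, rand(10))
loglikelihood(params, 0.5)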

Also, just in case it has escaped your attention so far: there are so-called generated functions, which fill a somewhat different niche than macros but can be very powerful for code generation. One way to implement a ‘compiler for YAMLs’ could be to parse the YAML and, from the data, build a type structure which you then feed into a generated function that constructs the actual code to be executed from the information in the type.

2 Likes

Do it in one step:

@eval m global data = 123

It doesn’t need to be one step; you can eval separately. The problem is non-eval access of the properties/globals of the module object at an obsolete world age, that is, after the evals but before leaving the top-level expression and reaching the updated world age. In other words, world age now applies to the globals of modules just like it applies to the methods of functions.

julia> function define_global()
           m = Module(:MyModule)
           Core.eval(m, :(global data))
           Core.eval(m, :(data = 123))
           println(Core.eval(m, :data))
           m
       end
define_global (generic function with 1 method)

julia> m = define_global()
123
Main.MyModule

julia> m.data
123

julia> define_global().data
123
ERROR: UndefVarError: `data` not defined in `Main.MyModule`
The binding may be too new: running in world age 38673, while current world is 38675.

I also think the exact point at which a top-level expression counts as left changed slightly, but that won’t matter from inside a method.

1 Like

Thank you again @abraemer . Sure, I could use a struct and pass it to every function, but since I can still use the “trick” of invokelatest in my define_global I think that there is no particularly strong reason to rewrite a substantial part of the code.

Also, as @Benny writes, the problem is really associated with the new world age mechanism, which now also applies to global constants, struct definitions, and functions. I did not realize that when I wrote the initial post, but after a few tests I see that functions are affected too, something that is potentially a much bigger problem for me than just global variables. This is because I need to write a lot of functions depending on the user’s YAML configuration file (log-prior, log-posterior, and, depending on the sampler used, also several ancillary functions, for example to generate a random sample from the prior). All these functions will be affected by the world age, and hence I will be unable to use them outside the top level in the REPL.
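For example (a minimal sketch with a made-up function name), any function eval’d into such a module hits the same world-age barrier unless the call goes through invokelatest:

function build_model()
    m = Module(:UserModel)
    Core.eval(m, :(loglike(x) = -0.5 * x^2))
    f = Core.eval(m, :loglike)   # eval always runs at the latest world age
    # calling f(2.0) directly here would raise a world-age error
    invokelatest(f, 2.0)
end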

@generated functions are great, and I use them in several parts of my code. However, right now it is not obvious to me how to use them here, for two main reasons:

  1. They can generate code depending on the type of the arguments, not their value. A YAML file, when loaded, is just a big dictionary, and I would not have access to the keys or values inside the code-generation part of the function.
  2. As written above, I need to generate several functions starting from a single configuration file, and as far as I understand a generated function just makes one single function.

I totally agree that if your approach works then there is no reason to rewrite it just for the sake of it, and I didn’t want to come across as telling you to redo your work :sweat_smile:
My proposals were just an attempt to understand why you designed it this way and what a more idiomatic solution could look like (I still consider modules that you eval globals into not idiomatic).

Let’s play this through a bit more (not in an attempt to convince you of anything, just as a gedankenexperiment). I envision this:

  1. You parse the yaml into a type structure because, as you said, generated functions need to operate on information in the type domain. Fortunately, you can put arbitrary things there as long as they are isbits.
  2. Feed this ‘configuration type’ into as many generated functions as you need.

Here is a minimal example that I hope can get the point across:

struct Config{T} end

function parseConfig(str)
	Config{tuple(Symbol.(split(str))...)}()
end

@generated function compute(::Config{T}, data) where T
	return :($(T[1])(data[1], $(T[2])(data[2], $(T[3])(data[3], data[4]))))
end

then you can do:

julia> compute(parseConfig("+ * +"), [1,2,3,4])
15

julia> compute(parseConfig("+ ^ +"), [1,2,3,4])
129

Now, you probably need more complex parsing, much more complicated information in the type domain, and you likely want to use something more sophisticated for holding the data (e.g. a NamedTuple so values can have names, or combining data and ‘configuration’). But I’d imagine that the core idea would still work for you :smiley:
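To illustrate the NamedTuple remark, a tiny variation of the toy example above could look like this (still just a sketch):

@generated function compute2(::Config{T}, data::NamedTuple) where T
    # field names come from the type parameter, so the accesses are resolved at compile time
    return :(data.$(T[1]) + data.$(T[2]))
end

compute2(parseConfig("x y"), (x = 1.0, y = 2.0))   # gives 3.0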
Again not trying to convince you into throwing your existing project away :sweat_smile:

A function’s methods being involved in world age isn’t new; the following example is from v1.10 and works all the way back to v0.6 with slightly different error messages:

julia> function define_method()
           @eval bar() = 1im
           println(@eval (bar()))
           bar()
       end
define_method (generic function with 1 method)

julia> define_method()
0 + 1im
ERROR: MethodError: no method matching bar()
The applicable method may be too new: running in world age 31548, while current world is 31549.

This odd semantic was introduced because compiler optimizations can only consider the state of a function’s methods at that point in time, prior to execution changing that state. @eval and @invokelatest sacrifice those optimizations in order to consider the updated runtime state. If you don’t want to @invokelatest every call site to dodge world age, there are ways to dodge defining new methods or global names entirely.

Compile-time state had also applied to global constant variables because optimizations can bake in values and compute at compile time. Prior to v1.12, reassigning global constants (where possible) would only warn you about possible silent errors, because methods won’t adapt (no invalidation of the obsolete code). Now that they’re part of world age, methods do track them and adapt. There’s also no real distinction between reassigning global constants and struct/function names; struct and function names have always been implicitly const. However, struct definitions only reassign the name if the fields are different, while method definitions never do.

Your original example actually doesn’t involve reassigning global constants; it changes the global names of an external module by declaring new variables, whether const or not. This being involved in compiler state seems new, but I’m not sure, because I haven’t really done intermodule global declaration/assignment before.
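As an aside, one way to avoid introducing new global names entirely (just a hypothetical sketch, not your setup) is to keep the user-supplied values inside a container that already exists; mutating an existing value creates no new bindings and therefore never interacts with world age:

const USER_DATA = Dict{Symbol,Any}()   # created once, before any user input

function define_value!(name::Symbol, value)
    USER_DATA[name] = value            # no eval, no new global binding
end

define_value!(:data, 123)
USER_DATA[:data]                       # 123, visible immediately in the same method

(For hot code you’d want a concretely typed container or a struct rather than Dict{Symbol,Any}, but the world-age point is the same.)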

1 Like

Thank you again @Benny for your very helpful reply. I checked RuntimeGeneratedFunctions.jl in the past, but it is unsuitable in my case, as I need to write highly optimized code (typical execution times can be several days right now). I also checked DynamicExpressions.jl, but it seems to me more suitable for little snippets of code, and I do not think it can easily be used in my case.

Just to better clarify my needs, the current code does something like

config = (; a=3.14, b=:(sqrt(10) - x))

function test1a(config)
    test = Module(:test)
    code = quote
        global a
        a = $(config.a)
        function b(x)
            $(config.b)
        end
    end
    Core.eval(test, code)
    test
end

function test1b(t)
    t.b(t.a)
end

Because of the world-age problem, this code is typically run in a two-step process:

julia> t1 = test1a(config)
Main.test

julia> test1b(t1)
0.0222776601683794

My config is really a YAML file, possibly with associated external data, but this minimal example will do. The solution I am using now is not too nice (quite ugly in fact…), and I can only use it at the top level.

Thinking twice about this situation, probably the only viable alternative is to use macros:

macro test2(a, b)
    return quote
        (; a=$(esc(a)), b=x->$b)
    end
end

so that

julia> t2 = @test2(config.a, sqrt(10) - x)
(a = 3.14, b = var"#119#120"())

julia> t2.b(t2.a)
0.0222776601683794

This was my original implementation in the past, and I switched to the module technique because macros were associated with quite a number of headaches for me… but perhaps I should go back and reconsider this solution.

Thank you again @abraemer . Your solution is interesting, but unfortunately I need to deal with data that will very often contain vectors and arrays: since they are not isbits, I cannot really use @generated functions.

One trick would be to save the configuration as values of a dictionary with isbits keys (or even Vals of sequential integers), but this solution is probably even uglier than the current one I have.
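Roughly, the trick would look like this (just a sketch with made-up names):

const EXTERNAL_DATA = Dict{Int,Vector{Float64}}()   # the arrays live here

struct Config2{Idx} end    # Idx is a plain Int, so it is allowed as a type parameter

function register_data(v::Vector{Float64})
    idx = length(EXTERNAL_DATA) + 1
    EXTERNAL_DATA[idx] = v
    Config2{idx}()
end

@generated function total(::Config2{Idx}) where Idx
    # the index is known at compile time; the array lookup happens at run time
    return :(sum(EXTERNAL_DATA[$Idx]))
end

c = register_data([1.0, 2.0, 3.0])
total(c)    # 6.0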

Anyway, your mention of macros triggered some further thoughts (see my reply above). Perhaps I should give them another chance…

I would check again. There’s no runtime performance cost to using RuntimeGeneratedFunctions.jl; the costs are at compile time.

Rather, if you are worried about the code being slow, then it’s exactly this business of eval-ing untyped globals into a module that you should be avoiding like the plague.

Thank you @Mason, I will check again. Just to clarify, I am not using @eval (or rather Core.eval) for critical parts of the code. In my example above, the critical part is test1b (in my case typically associated with the evaluation of the log-likelihood). I am using eval instead in the first part, test1a, but that is associated with loading, parsing, and compiling the configuration file. So it’s OK for that part not to be super-efficient, since this task is typically run once, in interactive mode.

I’m not aware of non-isbits types being a restriction for @generated functions. You might be referring to this bit in the Manual:

  1. Generated functions must not mutate or observe any non-constant global state (including, for example, IO, locks, non-local dictionaries, or using hasmethod). This means they can only read global constants, and cannot have any side effects. In other words, they must be completely pure. Due to an implementation limitation, this also means that they currently cannot define a closure or generator.

but most of that just refers to the part of the @generated function that constructs the output expression, which only has access to the types of your input data, so it can’t mutate the actual instances. Only nested functions (closures, comprehensions, @task/@spawn) can’t be defined in the code of the output expression; otherwise you can do just about anything a normal method does. Note that the README of RuntimeGeneratedFunctions.jl involves an expression that mutates an input vector:

    ex = :(function f(_du, _u, _p, _t)
        @inbounds _du[1] = _u[1]
        @inbounds _du[2] = _u[2]
        nothing
    end)
    f1 = @RuntimeGeneratedFunction(ex)
    du = rand(2)
    u = rand(2)
...
    f1(du, u, p, t)

Yes, the point is associated with the code-generation part, not with the already generated function. I need to make optimization choices based on particular values of the user input, and therefore I would need to encode these values in the types passed to the @generated function. This, given the way the user input is handled at the moment, is not possible (but I am checking viable solutions…).

Regarding generated functions, I am already using them to various degrees in the code (mostly to have type-stable outputs). For example, since the gravitational lensing effects of different galaxies (at the same distance) on a light ray can be added up (superposition principle), I am computing this collective effect using a @generated function that takes a suitable argument (essentially, a tuple of vectors of galaxy models, each vector containing only one concrete galaxy model type). This makes the code type-stable even in the (usual) case where there are many (> 5) different kinds of models used at the same time (the Julia compiler can nicely deal with unions only if they contain a small number of different types).
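To make that pattern concrete, here is a toy sketch (placeholder model types and formulas, not the actual lensing code): the tuple is unrolled at compile time, so each inner loop runs over a homogeneous, type-stable vector.

abstract type GalaxyModel end
struct PointMass <: GalaxyModel; m::Float64; end
struct Isothermal <: GalaxyModel; σ::Float64; end

deflection(g::PointMass, x) = g.m / x          # placeholder formulas
deflection(g::Isothermal, x) = g.σ * sign(x)

# type-stable loop over one homogeneous vector of models
function sum_deflection(gs::AbstractVector, x)
    s = 0.0
    for g in gs
        s += deflection(g, x)
    end
    s
end

# unrolled over the tuple at compile time: one call per concrete model type
@generated function total_deflection(models::NTuple{N,Any}, x) where N
    ex = :(sum_deflection(models[1], x))
    for i in 2:N
        ex = :($ex + sum_deflection(models[$i], x))
    end
    return ex
end

models = ([PointMass(1.0), PointMass(2.0)], [Isothermal(0.5)])
total_deflection(models, 2.0)    # 0.5 + 1.0 + 0.5 = 2.0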