Recompiling functions with infrequently mutating constant

I’m writing an application with a whole bunch of vectors of length N that syntactically have elements having a consistent meaning (chemical components in a mixture). Most of the time, I don’t need the names, but on some occasions (such as chemical reactions) I actually do need the names. This means I need to build an indexer to make the code readable.

For example, let “components” be a vector containing mass flows and “idx” be an indexer

stack.components[idx[:CO2]] = MolecularWeight[idx[:CO2]] * rxnCO2

If I set “idx” to be a NamedTuple global constant, I can’t change it because changing it would change the type. If I set it to be a Dictionary, I could mutate it and the program would adjust. This is a low level operation that could affect performance so using Dictionaries might take a performance hit. Now the thing is, this component indexer should change VERY infrequently, but it might be too inconvenient to reload the whole application if I wanted to change the indexer.

Then I discovered generated functions. One interesting property of generated functions is that if you make them observe a global variable, they won’t update if the variable changes. However, if you REDEFINE the generated function after changing the global variable, it will update. Redefining this generated function ends up recompiling all functions that use it. This means I could write:

function rename_indexer(names::NTuple{N, Symbol}) where N
    global INDEX_LIST

    INDEX_LIST = names
    include(joinpath(@__DIR__, "Indexer.jl"))
end

The code that I would stick into “Indexer.jl” is

@generated function build_indexer()
    global INDEX_LIST
    ind = 1:length(INDEX_LIST)
    ex  = Expr(:call)
    
    push!(ex.args, Expr(:curly, :NamedTuple, :($INDEX_LIST)))
    push!(ex.args, Expr(:tuple, ind...))

    return ex
end

This code produces a NamedTuple which is explicitly defined in code, and should produce a constant propagation.

Thus by invoking “rename_indexer” I would rebuild that generated function which would cause recompiling only the functions that use this indexer function (which produces a new constant NamedTuple), allowing for performant run-time execution that can periodically update. Correct? Do any of you see any pitfalls of this approach?

From the rules for use of generated functions:

Generated functions must not mutate or observe any non-constant global state

https://docs.julialang.org/en/v1/manual/metaprogramming/#Generated-functions

Your code observes the non-constant global state INDEX_LIST, so it pretty clearly violates this rule, meaning there will probably be pitfalls in your approach. I wouldn’t do this.

Are you absolutely, completely, totally sure you actually need a non-constant global? Your problem would be completely solved with no tricks required and optimal performance if you just pass idx to the function. Your function clearly actually needs that information, so hiding it in a mutable global might make your argument list look shorter, but it doesn’t actually simplify anything (in fact, just the opposite: now your function is explicitly a function of some arguments and secretly a function of some global that can change at any time). Just pass idx to wherever it’s needed and all your problems go away.

If, for some reason, you can’t do this, there’s still no need for a @generated function, since a regular function will also recompile any code that depends on it.

julia> sort_of_constant() = 1
sort_of_constant (generic function with 1 method)

julia> g() = sort_of_constant() + 1
g (generic function with 1 method)

julia> @code_typed g()
CodeInfo(
1 ─     return 2
) => Int64

julia> sort_of_constant() = 2
sort_of_constant (generic function with 1 method)

julia> @code_typed g()
CodeInfo(
1 ─     return 3
) => Int64

Note how the compiler has successfully propagated the constant result of sort_of_constant() through g(), and that that behavior updates when the function is redefined.

6 Likes

The problem with passing “idx” as a function argument is that the functions that need the indexer are called by functions called by functions called by functions. I’ll need to pass this information 3 levels down at least. The other problem is that a lot of functions don’t need this information, it makes calling different functions in a loop very difficult. So passing this information all the way down is very cumbersome.

The problem with the “sort_of_constant()” approach is that if I write:

build_indexer() = NamedTuple{Tuple(INDEX_LIST)}(Tuple(1:length(INDEX_LIST)))

It still observes a global variable. This means that the output cannot be inferred as constant.

INDEX_LIST = (:one,:two,:three)
build_indexer()
    (one = 1, two = 2, three = 3)

INDEX_LIST = (:a,:b,:c)
build_indexer()
   (a = 1, b = 2, c = 3)

It tracks the global variable perfectly, but it will come at a performance cost. In my experience, the performance of this approach is worse than using a global dictionary constant because the build_indexer() output isn’t even type-stable.

As for observing the mutating constant, the thing is, the generated function’s observed variable doesn’t mutate during the LIFETIME of the function. If it did change, the main side-effect is that the function couldn’t track the mutations and you’d get the wrong result. However, when the global variable is changed the generated function that is supposed to observe the change dies and is reborn as a new function, which re-triggers compilation because Julia seems to re-compile functions that call redefined functions. I only wish it would do that for re-definitions of a special type of infrequently-changing-constant.

I mean, sure. But you actually need that information, so you have to pass it somehow. Defining it in a global is just a way of hiding that fact from future readers of your code.

I’m suggesting that you have a function indexer() which returns the indices, and that you redefine (manually or via eval) that function when the indices change. In that way, the actual indexer() function never reads any global state, it just returns a constant.

But again, I wouldn’t do this–I would just pass idx around to the places that need it.

2 Likes

What about

julia> indexer() = (; a=1, b=2);
julia> fun()=indexer().a;
julia> fun()
1

julia> @code_native fun()
	.text
; ┌ @ REPL[103]:1 within `fun'
	movl	$1, %eax
	retq
	nopw	%cs:(%rax,%rax)

I don’t know whether this will work for very large named tuples, though.

Now you can simply redefine indexer and everything will recompile nicely.

edit: What rdeits said, exactly.

3 Likes

Ooooh! “eval” a programmatically generated expresion! Of course! Why didn’t I think of that?

function rename_indexer(names::NTuple{N, Symbol}) where N
    ex = :(build_indexer() = NamedTuple{Tuple($names)}(Tuple(1:$N)))
    eval(ex)
    return nothing
end

julia> rename_indexer((:one,:two,:three))
build_indexer (generic function with 1 method)

julia> build_indexer()
(one = 1, two = 2, three = 3)

julia> rename_indexer((:a,:b,:c))
build_indexer (generic function with 1 method)

julia> build_indexer()
(a = 1, b = 2, c = 3)

This solution is WAAAY simpler and only a LITTLE bit hacky. I don’t even need the global variable anymore! Anyway, there’s may be a way for me to get around the indexer altogether, but this is a good thought exercise for me. At least I think I have a workable fallback if it turns out I can’t get around it.

Before you go off and refactor lots of code: Please test that 1. const-prop works for large namedTuples in your application, and 2. Compile-time doesn’t explode.

Named tuple literals with 1k entries are really not what the compiler is used to dealing with, so this might end up being too clever.

If this blows up in your face, then try idx_o2() = 4 etc, and write a more complex macro that redefines all of the functions. Having a single zero-arg function for every symbol-to-index mapping is kinda ugly, but very light on the compiler.

Realistically, you would have a module IndexTranslation, and then zero-arg methods IndexTranslation.o2() or something, and all the uglyness would be hidden inside that module. Also note that this is still somewhat unclean because global variables.

1 Like

This named tuple is very unlikely to have more than 20 entries. If it was thousands, I’d bite the bullet and use a dict inside a global constant.