Limitations of generated functions versus macros or functions

I’m tentatively writing a generated function and relearning things, and I ran into a couple of questions:

  1. The most warned-about limitation of generated functions is that most side effects are off-limits in the method body (throwing errors is a notable exception). The docs currently word the rationale as follows:

The number of times a generated function is generated might be only once, but it might also be more often, or appear to not happen at all. As a consequence, you should never write a generated function with side effects - when, and how often, the side effects occur is undefined. (This is true for macros too - and just like for macros, the use of eval in a generated function is a sign that you’re doing something the wrong way.)

I never noticed the mention of macros there. Macro calls are documented to execute when called at parse-time, with no mention of caching like generated functions, so are side effects in macro bodies really undefined behavior? I wouldn’t want macro calls to depend on global state or anything, but the documented parse-time println examples with no warnings seem to contradict that.

  2. I never fully understood why a generated function’s method is stuck in the world age of its definition, e.g. callees only use previously defined methods. Issue #23223 suggests to me that the core reason is the method’s execution being part of compile-time, but normal functions can opt to execute callees at compile-time without being stuck in a world age. In fact, my primary consideration now is a semantic guarantee of compile-time computation, with no side effects of course. Is the generated expression stuck per MethodInstance somehow? Hypothetically, if there were a @compiletime hint in normal functions, e.g. foo(x::T, ::Val{N}) where {T, N} = x + @compiletime(bar(T, N)), to do what I intend, would that somehow avoid the limitations of generated functions except for side effects?
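To make question 1 concrete, here is a minimal sketch of a macro body with a side effect. The counter `EXPANSIONS` and the macro `@noisy` are hypothetical names used only for illustration. The point it shows is limited: the body runs when the macro call is expanded, not each time the enclosing function is called. It makes no claim about how many times expansion may happen in other settings such as precompilation.

```julia
# Hypothetical counter, only here to observe when the macro body runs.
const EXPANSIONS = Ref(0)

macro noisy(ex)
    EXPANSIONS[] += 1      # side effect in the macro body (runs at expansion)
    return esc(ex)         # the returned expression itself is unchanged
end

# The macro call is expanded while this definition is lowered.
f(x) = @noisy(x + 1)

n_before = EXPANSIONS[]
f(1); f(2); f(3)           # calling f does not expand the macro again
n_after = EXPANSIONS[]
```

Comparing `n_before` and `n_after` shows the side effect is tied to expansion rather than to calls, which is why relying on it for per-call behavior would be a mistake.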

Maybe JuliaC is a better fit? In any case, I don’t think @generated really provides this guarantee in the way that you want, because Julia is allowed to run the @generated function whenever and as often as it wants (potentially many more times than once).

Just to clarify, I mean computing something from argument types and parameters in the generated function’s body, before simply interpolating it into an otherwise unchanging generated expression. Ideally occurs once and gets cached, so good catch on that not being guaranteed. How could JuliaC help? Haven’t really looked much into it, haven’t heard of any compile-time guarantees or hints.
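A minimal sketch of that pattern, assuming a hypothetical helper `table_size` standing in for the real computation. The helper is defined before the generated function so that it is visible in the generator's world age. As noted above, nothing guarantees the generator body runs exactly once.

```julia
# Hypothetical stand-in for the costly computation; it must be defined
# before the generated function so the generator can call it.
table_size(::Type{T}, N::Int) where {T} = sizeof(T) * N

@generated function foo_gen(x::T, ::Val{N}) where {T, N}
    # Inside the generator, T and N are the actual static parameters
    # and x is bound to the argument's type, not its value.
    c = table_size(T, N)
    # Only the precomputed value is interpolated; the rest of the
    # expression is the same for every signature.
    return :(x + $c)
end

result_gen = foo_gen(Int64(1), Val(3))
```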

There are just so many ways a Julia program might get run. It might be interpreted, it might be JIT’ed, it might be precompiled into a package cache, it might be in a sysimage, it might be compiled into a dll/so/dylib or an executable. When’s parse time? Heck, I don’t even know when compile time is (or if it exists at all).

So instead, better to use normal functions. If they need help coaxing some compile-time optimization in some situation, I’d look to @assume_effects.

In particular, I would expect that @assume_effects :foldable, together with ensuring the arguments are known at compile time (e.g. they are const or are types), goes a long way to making things happen at compile time.
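As a rough sketch of that suggestion, assuming Julia 1.8 or later (where `Base.@assume_effects` is available): the names `weighted_total` and `foo_fold` are hypothetical, and for a function this simple the compiler may well fold the call without any annotation. The annotation is a promise to the compiler, not a check, so it is only safe if the function really is foldable.

```julia
# Hypothetical pure helper; :foldable asserts that it may be evaluated
# at compile time when its arguments are known constants.
Base.@assume_effects :foldable function weighted_total(::Type{T}, N::Int) where {T}
    s = 0
    for i in 1:N
        s += i * sizeof(T)
    end
    return s
end

# T and N are static parameters here, so the callee's arguments are
# compile-time constants for each specialization of foo_fold.
foo_fold(x::T, ::Val{N}) where {T, N} = x + weighted_total(T, N)

result_fold = foo_fold(Int64(1), Val(3))
```

Inspecting `@code_llvm foo_fold(Int64(1), Val(3))` is one way to see whether the call was actually folded on a given Julia version.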

Never tried it, and @assume_effects :foldable looks pretty much like what I imagined @compiletime to do. Don’t know why :nortcall is there, but I’m not expecting to call Core.Compiler.return_type anyway. Probably goes without saying, but its annotation doesn’t introduce world age limitations like generated function bodies, right? If so, the practical half of (2) is answered right there.

Correct, although if your function is not, in fact, foldable, you might run into some nasty bugs, so use it carefully. The compiler is supposed to be pretty good already at figuring out when something can be evaluated at compile time.

I think the main thing you’d want to help it with is understanding that your loops will terminate, so it may be safer to do @assume_effects :terminates_globally on loopy functions and let the compiler infer the other effects for the rest of the code.
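A small sketch of that narrower annotation, again assuming Julia 1.8 or later. The function `collatz_steps` is a hypothetical example of a loop whose termination the compiler cannot prove on its own. The annotation asserts termination for the inputs it is called with and leaves the other effects to inference.

```julia
# Hypothetical loopy function. The annotation only promises that the
# loop terminates; it says nothing about the other effects.
Base.@assume_effects :terminates_globally function collatz_steps(n::Int)
    steps = 0
    while n != 1
        n = iseven(n) ? n ÷ 2 : 3n + 1
        steps += 1
    end
    return steps
end

steps6 = collatz_steps(6)   # 6 → 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1
```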

Definitely moves long calls into compile-time. I’m uncertain about this being consistent and without side effects because of the mutable global (a proxy for reading a file into an object that isn’t intended to be mutated), and @time bizarrely doesn’t capture the JIT compilation time.

julia> const dat = randn(500_000_000); # 4GB !!

julia> @time sum(sin, dat)
  6.021553 seconds
11412.735905058416

julia> Base.@assume_effects :foldable bar() = sum(sin, dat);

julia> baz() = bar()
baz (generic function with 1 method)

julia> @time baz() # takes >6s
  0.000000 seconds
11412.735905058416

julia> @time baz() # takes <1s
  0.000000 seconds
11412.735905058416

julia> @code_llvm baz()
; Function Signature: baz()
;  @ REPL[59]:1 within `baz`
; Function Attrs: uwtable
define double @julia_baz_4077() #0 {
top:
;  @ REPL[59] within `baz`
  ret double 0x40C64A5E32230F6E
}

The annotation of a code block doesn’t work as the docs suggest, which is inconvenient:

julia> foo() = Base.@assume_effects :foldable begin sum(sin, dat) end;

julia> @code_llvm foo()
; Function Signature: foo()
;  @ REPL[83]:1 within `foo`
; Function Attrs: uwtable
define double @julia_foo_4422() #0 {
top:
; ┌ @ reducedim.jl:980 within `sum`
; │┌ @ reducedim.jl:980 within `#sum#736`
; ││┌ @ reducedim.jl:984 within `_sum`
; │││┌ @ reducedim.jl:984 within `#_sum#738`
; ││││┌ @ reducedim.jl:326 within `mapreduce`
; │││││┌ @ reducedim.jl:326 within `#mapreduce#728`
; ││││││┌ @ reducedim.jl:334 within `_mapreduce_dim`
         %0 = call double @j__mapreduce_4425(ptr nonnull @"jl_global#4426.jit")
         ret double %0
; └└└└└└└
}

But an annotated closure is an in-place workaround:

julia> function foo()
         Base.@assume_effects :foldable bar() = sum(sin, dat)
         bar()
       end;

julia> @code_llvm foo()
; Function Signature: foo()
;  @ REPL[2]:1 within `foo`
; Function Attrs: uwtable
define double @julia_foo_1706() #0 {
top:
;  @ REPL[2] within `foo`
  ret double 0x40C64A5E32230F6E
}

In case that is not possible (generated functions can be so handy, as they are very versatile), it may be possible to move the costly, but deterministic computation outside the generated function into a (plain vanilla) function, and potentially nudge it a bit with @assume_effects if needed.

(An MWE of the actual problem would help.)

Or, alternatively, memoization can get you almost compile-time. It’s just compile time +1, but it’s absolutely and semantically guaranteed to be O(1).
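A minimal sketch of what such memoization could look like, using a plain `Dict` keyed on the type and parameter. The names `RESULT_CACHE`, `expensive`, and `cached` are hypothetical, and this is not how any particular memoization package does it. The first call for a key pays the full cost at run time and later calls are dictionary lookups.

```julia
# Hypothetical cache keyed on (type, parameter); concrete key and value
# types keep the lookup type-stable.
const RESULT_CACHE = Dict{Tuple{DataType,Int},Int}()

# Hypothetical stand-in for the costly, deterministic computation.
expensive(::Type{T}, N::Int) where {T} = sizeof(T) * N

# get! runs the closure only when the key is missing, then stores the result.
cached(::Type{T}, N::Int) where {T} =
    get!(() -> expensive(T, N), RESULT_CACHE, (T, N))

first_call  = cached(Int64, 3)   # computes and stores
second_call = cached(Int64, 3)   # served from the cache
```

If `expensive` is later redefined, the stored results go stale, so the cache has to be cleared by hand, e.g. with `empty!(RESULT_CACHE)`.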

I don’t want to overfit to the example you gave, but be careful here that dat is not empty; otherwise I think sum may in fact end up calling return_types.

For most cases I’d agree. But it’s not always straightforward to structure the caches, and redefining methods requires manual cache clears. Memoization.jl addresses the latter with definition-wise metaprogramming and a @generated function. Trying to evade the limitations but running into others led me to this topic.

I’m also finding out pretty quickly that compile-time overhead can get very redundant across different callers with the same static parameters when there is no central cache for the callee’s results. I’m fortunate that the things I’m trying to move into compile-time don’t yet run long enough to drag out compile times, but now I’m wondering how :foldable it is to memoize a compile-time function call. :inaccessiblememonly is left out, but I’m not certain caching calls is :effect_free or :consistent.

Again, good catch. I actually want to avoid as much implicit inference of output and container types as possible, don’t want things falling apart because a file changes.