Why not use @generated?

It seems that using @generated is discouraged by the core devs and I would like to understand better when and why it is discouraged. I am aware of the excellent Keynote by @stevengj which already talks a lot about when (not) to use metaprogramming.

  • What are reasons against @generated?
  • I heard the theory that calling an @generated function can actually hurt inference. Is that true?
  • There is also the if @generated ... syntax. How does the compiler decide whether to use the @generated definition? How does using if @generated interact with inference? (A minimal sketch of the pattern is shown below.)
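For reference, here is a minimal sketch of the optionally-generated pattern, closely following the manual's example; the compiler is free to pick either branch, so both must compute the same thing:

function sumsq(x::NTuple{N,T}) where {N,T}
    if @generated
        # generated branch: unroll the loop using the statically known N
        ex = :(zero(T))
        for i in 1:N
            ex = :($ex + x[$i]^2)
        end
        return ex
    else
        # fallback branch: an ordinary runtime loop with identical semantics
        s = zero(T)
        for i in 1:N
            s += x[i]^2
        end
        return s
    end
end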
3 Likes

A good non-technical reason is extensibility. Anything that happens inside a generated function can't have new methods added afterwards, whereas recursive methods are open to extension everywhere through multiple dispatch.
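A toy sketch of what that means in practice: with the recursive formulation, a method of combine added later participates in dispatch, whereas behavior baked into a generated body at expansion time would not:

combine(x::Number, y::Number) = x + y
total(t::Tuple{}) = 0
total(t::Tuple) = combine(first(t), total(Base.tail(t)))

total((1, 2, 3))            # 6

# extensible after the fact:
combine(x::String, y) = length(x) + y
total(("ab", 1, 2))         # 5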

I use a lot of (perhaps too many) generated functions. The biggest problems I face are that they make debugging and fixing problems more difficult. Stack traces tend to be a lot worse, especially if you do a lot of pushing into expressions:

q = quote end
push!(q.args, :(c = a + b))

I’m considering eventually going through my code and pushing a bunch of LineNumberNode(@__LINE__, @__FILE__) into my expressions, but I probably ought to learn something about how these are used first, so I don’t end up doing something silly.
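Something like this, I suppose (untested; I believe the file argument is conventionally a Symbol):

q = quote end
push!(q.args, LineNumberNode(@__LINE__, Symbol(@__FILE__)))  # record where this expression came from
push!(q.args, :(c = a + b))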

Also, if your generated functions are calling other functions to help build the expression you’re using, changing those functions won’t automatically update the generated function:

julia> foo(::Type{T}) where {T} = :(a + b + one($T))
foo (generic function with 1 method)

julia> @generated bar(a::T,b::T) where {T} = foo(T)
bar (generic function with 1 method)

julia> bar(2,3)
6

julia> foo(::Type{T}) where {T} = :(a + b - one($T))
foo (generic function with 1 method)

julia> bar(2,3)
6

julia> @generated bar(a::T,b::T) where {T} = foo(T)
bar (generic function with 1 method)

julia> bar(2,3)
4

To work around this, a lot of my functions with two or more type parameters end up looking like:

@generated function foobar(
    a::T, b::S
# ) where {T,S}
) where {S,T}
    ...
end

So that I can switch which line is commented out whenever I want Revise to update the function.

1 Like

Generated functions are an escape hatch that allows you to bypass the regular multiple dispatch mechanism. As such, you should only use them if regular multiple dispatch can’t get you what you need. Since multiple dispatch is a very powerful abstraction, you should think carefully about whether your problem is of sufficient complexity that it is beyond the reach of regular multiple dispatch. If you use generated functions, the compiler will have less information about what the function is going to do, which may result in:

  • Slower compile times due to excessive specialization (and invocation of the generated function).
  • Various sorts of world age issues
  • Reduced performance
  • Reduced debuggability
  • Crashes if you return something invalid from a generated function (the compiler isn’t particularly hardened against invalid IR, because usually it is generated from the frontend and thus correct by construction).

Or in other words: you’re toppling a domino with a nuke when blowing on it might be enough, and you risk disintegrating the domino or causing a nuclear winter in the process. Generated functions are there if you need them, but even those who know their power well try to avoid them if possible.

14 Likes

In what sense, when? Could you give an example?

Is this something hard to see in microbenchmarks of just the generated function, and only apparent when it gets called from elsewhere?

Because generated functions can be inlined, I would have thought LLVM has all the information?
Or does this have something to do with Julia’s front end, e.g. constant propagation?

Does this apply at all to macros?

One of the most common reasons for me to write generated functions is to get increased performance. A simple example:

using VectorizationBase, SIMDPirates
function regularized_cov_block_quote(W::Int, T, reps_per_block::Int, stride, mask_last::Bool = false, mask = :r)# = 0xff)
    # loads from ptr_sample
    # stores in ptr_s² and ptr_invs
    # needs vNinv, mulreg, and addreg to be defined
    reps_per_block -= 1
    size_T = sizeof(T)
    WT = size_T*W
    V = Vec{W,T}
    quote
        $([Expr(:(=), Symbol(:μ_,i), :(vload($V, ptr_smpl + $(WT*i), $([mask for _ ∈ 1:((i==reps_per_block) & mask_last)]...)))) for i ∈ 0:reps_per_block]...)
        $([Expr(:(=), Symbol(:Σδ_,i), :(vbroadcast($V,zero($T)))) for i ∈ 0:reps_per_block]...)
        $([Expr(:(=), Symbol(:Σδ²_,i), :(vbroadcast($V,zero($T)))) for i ∈ 0:reps_per_block]...)
        for n ∈ 1:N-1
            $([Expr(:(=), Symbol(:δ_,i), :(vsub(vload($V, ptr_smpl + $(WT*i) + n*$stride*$size_T),$(Symbol(:μ_,i))))) for i ∈ 0:reps_per_block]...)
            $([Expr(:(=), Symbol(:Σδ_,i), :(vadd($(Symbol(:δ_,i)),$(Symbol(:Σδ_,i))))) for i ∈ 0:reps_per_block]...)
            $([Expr(:(=), Symbol(:Σδ²_,i), :(vmuladd($(Symbol(:δ_,i)),$(Symbol(:δ_,i)),$(Symbol(:Σδ²_,i))))) for i ∈ 0:reps_per_block]...)
        end
        $([Expr(:(=), Symbol(:xbar_,i), :(vmuladd(vNinv, $(Symbol(:Σδ_,i)), $(Symbol(:μ_,i))))) for i ∈ 0:reps_per_block]...)
        $([Expr(:(=), Symbol(:ΣδΣδ_,i), :(vmul($(Symbol(:Σδ_,i)),$(Symbol(:Σδ_,i))))) for i ∈ 0:reps_per_block]...)
        $([Expr(:(=), Symbol(:s²_,i), :(vmul(vNm1inv,vfnmadd($(Symbol(:ΣδΣδ_,i)),vNinv,$(Symbol(:Σδ²_,i)))))) for i ∈ 0:reps_per_block]...)
        $([:(vstore!(ptr_mean, $(Symbol(:xbar_,i)), $([mask for _ ∈ 1:((i==reps_per_block) & mask_last)]...)); ptr_mean += $WT) for i ∈ 0:reps_per_block]...)
        $([:(vstore!(ptr_vars, $(Symbol(:s²_,i)), $([mask for _ ∈ 1:((i==reps_per_block) & mask_last)]...)); ptr_vars += $WT) for i ∈ 0:reps_per_block]...)
        ptr_smpl += $(WT*(reps_per_block+1))
    end
end
@generated function mean_and_var!(
    means::AbstractVector{T}, vars::AbstractVector{T}, sample::AbstractArray{T}
) where {T}
    W, Wshift = VectorizationBase.pick_vector_width_shift(T)
    V = Vec{W,T}
    quote 
        D, N = size(sample); sample_stride = stride(sample, 2)
        @boundscheck if length(means) < D || length(vars) < D
            throw(BoundsError("Size of sample: ($D,$N); length of preallocated mean vector: $(length(means)); length of preallocated var vector: $(length(vars))"))
        end
        ptr_mean = pointer(means); ptr_vars = pointer(vars); ptr_smpl = pointer(sample)
        vNinv = vbroadcast($V, 1/N); vNm1inv = vbroadcast($V, 1/(N-1))
        for _ in 1:(D >>> $(Wshift + 2)) # blocks of 4 vectors
            $(regularized_cov_block_quote(W, T, 4, :sample_stride))
        end
        for _ in 1:((D & $((W << 2)-1)) >>> $Wshift) # single vectors
            $(regularized_cov_block_quote(W, T, 1, :sample_stride))
        end
        r = D & $(W-1)
        if r > 0 # remainder
            mask = VectorizationBase.mask(T, r)
            $(regularized_cov_block_quote(W, T, 1, :sample_stride, true, :mask))
        end
        nothing
    end
end

Benchmarking:

julia> using BenchmarkTools, Statistics

julia> @benchmark mean_and_var!($x,$y,$A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     39.568 μs (0.00% GC)
  median time:      40.256 μs (0.00% GC)
  mean time:        43.080 μs (0.00% GC)
  maximum time:     227.730 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x'
1×200 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
 -0.0466792  -0.0455147  0.0170653  0.0011996  0.0307585  0.0308389  0.0303355  -0.0144757  -0.0509977  -0.0120208  -0.0161698  0.0498736  0.0142003  0.0513357  0.00356376  0.0202032  -0.0300317  -0.0591260  …  0.0351895  -0.014007  0.0231309  -0.00640476  0.0121385  0.00250655  0.00367508  -0.0373912  -0.00839410  -0.00719569  -0.0306729  0.0163719  -0.038363  0.0357159  0.0111598  0.00553716  -0.018665  0.0148885

julia> mean(A, dims = 2)'
1×200 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
 -0.0466792  -0.0455147  0.0170653  0.0011996  0.0307585  0.0308389  0.0303355  -0.0144757  -0.0509977  -0.0120208  -0.0161698  0.0498736  0.0142003  0.0513357  0.00356376  0.0202032  -0.0300317  -0.0591260  …  0.0351895  -0.014007  0.0231309  -0.00640476  0.0121385  0.00250655  0.00367508  -0.0373912  -0.00839410  -0.00719569  -0.0306729  0.0163719  -0.038363  0.0357159  0.0111598  0.00553716  -0.018665  0.0148885

julia> y'
1×200 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
 1.02852  1.02274  1.00838  1.06236  1.04392  0.951408  1.02583  0.995716  1.03187  1.046  1.02397  1.02082  0.991599  0.937852  0.985895  1.03206  0.979809  1.00042  1.0083  1.00608  1.02262  1.00769  0.951676  1.01429  …  0.981213  0.993444  1.08527  0.976448  1.01732  0.942424  1.05196  1.0542  0.972378  0.991214  0.965925  0.981092  0.938367  0.996919  1.07532  0.939985  1.00628  0.994173  0.976612  0.970468  1.02659

julia> var(A, dims = 2)'
1×200 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
 1.02852  1.02274  1.00838  1.06236  1.04392  0.951408  1.02583  0.995716  1.03187  1.046  1.02397  1.02082  0.991599  0.937852  0.985895  1.03206  0.979809  1.00042  1.0083  1.00608  1.02262  1.00769  0.951676  1.01429  …  0.981213  0.993444  1.08527  0.976448  1.01732  0.942424  1.05196  1.0542  0.972378  0.991214  0.965925  0.981092  0.938367  0.996919  1.07532  0.939985  1.00628  0.994173  0.976612  0.970468  1.02659

julia> @benchmark mean!($y,$A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     50.627 μs (0.00% GC)
  median time:      51.388 μs (0.00% GC)
  mean time:        51.841 μs (0.00% GC)
  maximum time:     136.623 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark var($A, dims = 2, mean = $y)
BenchmarkTools.Trial: 
  memory estimate:  3.91 KiB
  allocs estimate:  14
  --------------
  minimum time:     107.394 μs (0.00% GC)
  median time:      110.219 μs (0.00% GC)
  mean time:        111.107 μs (0.00% GC)
  maximum time:     242.747 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

It is about 25% faster at getting both the mean and variance than Statistics.mean! is at getting just the mean. [EDIT for good measure:

julia> @benchmark sum!($x,$A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     53.718 μs (0.00% GC)
  median time:      54.253 μs (0.00% GC)
  mean time:        54.750 μs (0.00% GC)
  maximum time:     139.711 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

]
Looking at this particular example, I think it’d actually be fairly easy to make it not @generated, so I really didn’t try hard enough :sweat_smile: .

But on the other hand, is it worth it to spend my time removing @generated from functions like these?

Perhaps I can expect better compile times (and I have been suffering from bad compile times, so that is a definite win), but in which situations can I expect better run time performance, too?

2 Likes

Is it the only reason that @generated exists? How about contextual dispatch as used in e.g. auto-diff systems?

(But I agree that many uses of generated functions can be eliminated by cleverly using multiple dispatch and constant folding.)
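For example, here is a sketch of the sort of replacement I mean: a recursive tuple map that the compiler unrolls on its own, no @generated needed.

tmap(f, t::Tuple{}) = ()
tmap(f, t::Tuple) = (f(first(t)), tmap(f, Base.tail(t))...)

tmap(abs2, (1, 2, 3))  # (1, 4, 9), fully inferred and unrolled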

Here is a (slightly contrived) example where @generated seems to hurt inference:

foldldiag_gen(op, acc, A::AbstractArray) =
    ndims(A) === 0 ? acc : _foldldiag_gen(op, acc, A, 1, size(A, 1))

@generated function _foldldiag_gen(op, acc::T, A::AbstractArray, i0, i1) where T
    idx = [:i for i in 1:ndims(A)]
    quote
        for i in i0:i1
            acc′ = op(acc, A[$(idx...)])
            acc′ isa T || return _foldldiag_gen(op, acc′, A, i + 1, i1)
            acc = acc′
        end
        return acc
    end
end

foldldiag_fun(op, acc, A::AbstractArray) =
    ndims(A) === 0 ? acc : _foldldiag_fun(op, acc, A, 1, size(A, 1))

function _foldldiag_fun(op, acc::T, A, i0, i1) where T
    for i in i0:i1
        idx = ntuple(_ -> i, ndims(A))
        acc′ = op(acc, A[idx...])
        acc′ isa T || return _foldldiag_fun(op, acc′, A, i + 1, i1)
        acc = acc′
    end
    return acc
end

Base.return_types(
    foldldiag_gen,
    Tuple{
        typeof(+),
        Bool,
        Matrix{T} where T <: Union{Int, Float64},
    },
)
# 1-element Array{Any,1}:
#  Any

Base.return_types(
    foldldiag_fun,
    Tuple{
        typeof(+),
        Bool,
        Matrix{T} where T <: Union{Int, Float64},
    },
)
# 1-element Array{Any,1}:
#  Union{Bool, Float64, Int64}

2 Likes

Generated functions, when initially added to the language, were only allowed to return an AST. Which is fine, but you can’t implement the kinds of code transforms needed for AD that way, because:
A) Julia doesn’t provide a way to get the AST of an arbitrary method.
B) The AST is the wrong level for AD anyway; AD loves single assignment, which is the lowered form.

Generated functions were extended to also be allowed to return CodeInfo (i.e. lowered IR) much later. That, combined with Base.uncompressed_ast (which, despite the name, does not return the AST but the lowered IR), is enough to implement that.
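For the curious, retrieving it looks roughly like this (a sketch; the exact API has moved around across Julia versions):

f(x) = x + 1
m = first(methods(f))
ci = Base.uncompressed_ast(m)  # a CodeInfo holding the lowered IR, despite the name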

It’s kind of a hack, though. As I recall Jarrett talking about it, the intent was to abuse @generated and Base.uncompressed_ast just long enough to prototype the idea, while a more sensible way to do this kind of thing was devised.

1 Like

@generated is pretty obscure, and you can work with Julia for years without ever needing it.

Many people do not understand how it works. It is pretty odd to use as well, even compared to macros: inside the generator body, the arguments are bound to their types, and interpolating them into the returned quote splices in those types, but if you use an argument directly by name inside the quote, it is in scope as the runtime value (in a macro it would be out of scope). This makes sense, of course, and it couldn’t work any other way, but you definitely have to learn how to use them.
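A tiny sketch of what I mean:

@generated function addone(x)
    # in the generator body, x is bound to the argument's *type*
    if x <: Integer
        return :(x + 1)  # inside the returned quote, x is the runtime *value*
    else
        return :(x)
    end
end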

Anyway, since they are rarely needed and are obscure, you’re making your code much harder for other people to modify or contribute to, so it had better be worth it.

I have no data on that… but generated functions are well-documented, with plenty of examples. Anyone motivated to learn about them can do so.

Generated functions are part of Julia. Because of the reasons mentioned above, they should only be used when needed, but sometimes they are needed.

Just to explain the context of this thread: we are discussing whether it makes sense to avoid using @generated to implement ConstructionBase.setproperties (see the documentation for what it does). See:

https://github.com/JuliaObjects/ConstructionBase.jl/issues/21

If people here have some specific comments to the way ConstructionBase.setproperties is implemented, please feel free to join the discussion there.

But I am still interested, in general, in when @generated hurts compilation performance or quality. My example above is rather a stupid use of @generated. Is there a minimal but practically relevant example where using @generated is bad?

2 Likes

I knew the history that CodeInfo support was added to help Cassette.jl, but I didn’t know it was treated as a hack to be used only until a new mechanism is implemented. But isn’t @generated at the right stage for doing this (i.e., input types are figured out)? Is there a discussion on alternative mechanisms?

Here is the original issue; it references something that sounds a bit like Cassette, though it isn’t Contextual Dispatch (see below), it is a Custom Compiler Pass.

And the PR that solved it, which mentions dispatch-interceptable function wrappers (which is Contextual Dispatch).

My own reading of those, and of how empty they are, is that a lot of the discussion happened offline. So I don’t know if there is more anywhere.

I think it is worth being clear on what Contextual Dispatch is versus a Custom Compiler Pass.

Contextual Dispatch is the thing Cassette’s overdub does (at least when you don’t specify a pass). I prefer to think of it as call overloading: it overloads what it means to do a call at all. Julia already provides the ability to overload what a particular call does, via (::T)(args...) = ..., which overloads what it means to call some function of type T.

If Julia didn’t lower function calls to direct calls (which hit those (::T)(args...) = ... overloads), but instead always lowered them to call(current_context(), f::T, args...), with a default implementation call(::NormCtx, f::T, args...) = f(args...), then one would not need generated functions + Base.uncompressed_ast to make that transform; it would already be made. I think I heard that this was being considered.
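A sketch of that hypothetical lowering target (all names made up):

struct NormCtx end
current_context() = NormCtx()

# default context: a call is just a call
call(::NormCtx, f, args...) = f(args...)

# a custom context can intercept every call
struct TraceCtx end
call(::TraceCtx, f, args...) = (println("calling ", f); f(args...))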

Anyway, the other thing is the Custom Compiler Pass, where you transform the IR. Contextual Dispatch (call overloading) can be accomplished with a Custom Compiler Pass, by just replacing all the calls in the IR as discussed.
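At the AST level, that replacement is just a recursive rewrite; a toy sketch (reusing the hypothetical call from above):

replace_calls(x) = x  # leave non-Expr leaves (symbols, literals) alone
function replace_calls(ex::Expr)
    args = map(replace_calls, ex.args)
    if ex.head === :call
        Expr(:call, :call, :(current_context()), args...)
    else
        Expr(ex.head, args...)
    end
end

replace_calls(:(f(g(x))))  # :(call(current_context(), f, call(current_context(), g, x)))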

But isn’t @generated at the right stage for doing this (i.e., input types are figured out)?

Not necessarily. It really depends on what you want to do.

Note that Cassette operates on lowered IR, not typed+specialized IR, because the generator runs before the optimizer. Generators run very early in the typed-IR creation stage: they have types, but nothing else.

For Contextual Dispatch it basically doesn’t matter when you run it: if you are just replacing every call with a call to something else, that can be done on the AST (see Arborist.jl), lowered IR, typed IR, optimized typed IR, LLVM IR, etc. If you do it before the optimizer runs, you can write overdubs that depend on types and have the ones you don’t change get optimized out. But if you do it at certain later stages, then you have access to the types and can avoid replacing calls you don’t want to replace.

Other things you might want to do, like hoisting things out of loops, are much easier done either earlier, on the AST while loops are still clearly visible, or later, once the control-flow graph (CFG) and dominator tree (DomTree, unrelated to the HTML use) have been constructed and are available. Making those is really expensive (MagneticReadHead now constructs them during a Cassette pass; it is not fast, and the optimizer redoes all that work again). Without the CFG and DomTree, writing custom optimization passes is very difficult, because you have no easy way of knowing what runs in what order (you can work it out, but doing so is equivalent to constructing the CFG).

I think the time the generator runs might actually be the worst time for most custom compiler-pass purposes. But it is a whole lot better than not having it at all.

7 Likes

Thanks a lot! That’s super informative. I haven’t thought about hooking things into later stages of compilation but that sounds very interesting.

@jw3126 and @tkf have a much better reason to use @generated than my example. Mine came from the fact that I was initially writing code in that way to support fixed-size arrays, where I’m likely to change the blocking behavior as a function of array size (e.g., if we have AVX-512 and are summing 72 elements, we’d want to do it in 3 blocks of 24, not 2 blocks of 32 followed by 1 block of 8). We would change both the number and the size of the blocks as a function of the number of rows we are summing.

But if the arrays are dynamically sized, we probably don’t want to do anything like that, so it makes a lot more sense to write it as a simple function:

using VectorizationBase, SIMDPirates
function mean_and_var_nogen!(
    means::AbstractVector{T}, vars::AbstractVector{T}, sample::AbstractArray{T}
) where {T}
    V = VectorizationBase.pick_vector(T)
    W, Wshift = VectorizationBase.pick_vector_width_shift(T)
    WT = VectorizationBase.REGISTER_SIZE
    D, N = size(sample); sample_stride = stride(sample, 2) * sizeof(T)
    @boundscheck if length(means) < D || length(vars) < D
        throw(BoundsError("Size of sample: ($D,$N); length of preallocated mean vector: $(length(means)); length of preallocated var vector: $(length(vars))"))
    end
    ptr_mean = pointer(means); ptr_vars = pointer(vars); ptr_smpl = pointer(sample)
    vNinv = vbroadcast(V, 1/N); vNm1inv = vbroadcast(V, 1/(N-1))
    for _ in 1:(D >>> (Wshift + 2)) # blocks of 4 vectors
        Base.Cartesian.@nexprs 4 i -> μ_i = vload(V, ptr_smpl + WT * (i-1))
        Base.Cartesian.@nexprs 4 i -> Σδ_i = vbroadcast(V, zero(T))
        Base.Cartesian.@nexprs 4 i -> Σδ²_i = vbroadcast(V, zero(T))
        for n ∈ 1:N-1
            Base.Cartesian.@nexprs 4 i -> δ_i = vsub(vload(V, ptr_smpl + WT * (i-1) + n*sample_stride), μ_i)
            Base.Cartesian.@nexprs 4 i -> Σδ_i = vadd(δ_i, Σδ_i)
            Base.Cartesian.@nexprs 4 i -> Σδ²_i = vmuladd(δ_i, δ_i, Σδ²_i)
        end
        Base.Cartesian.@nexprs 4 i -> xbar_i = vmuladd(vNinv, Σδ_i, μ_i)
        Base.Cartesian.@nexprs 4 i -> ΣδΣδ_i = vmul(Σδ_i, Σδ_i)
        Base.Cartesian.@nexprs 4 i -> s²_i = vmul(vNm1inv, vfnmadd(ΣδΣδ_i, vNinv, Σδ²_i))
        Base.Cartesian.@nexprs 4 i -> (vstore!(ptr_mean, xbar_i); ptr_mean += WT)
        Base.Cartesian.@nexprs 4 i -> (vstore!(ptr_vars, s²_i); ptr_vars += WT)
        ptr_smpl += 4WT
    end
    for _ in 1:((D & ((W << 2)-1)) >>> Wshift) # single vectors
        μ_i = vload(V, ptr_smpl)
        Σδ_i = vbroadcast(V, zero(T))
        Σδ²_i = vbroadcast(V, zero(T))
        for n ∈ 1:N-1
            δ_i = vsub(vload(V, ptr_smpl + n*sample_stride), μ_i)
            Σδ_i = vadd(δ_i, Σδ_i)
            Σδ²_i = vmuladd(δ_i, δ_i, Σδ²_i)
        end
        xbar_i = vmuladd(vNinv, Σδ_i, μ_i)
        ΣδΣδ_i = vmul(Σδ_i, Σδ_i)
        s²_i = vmul(vNm1inv, vfnmadd(ΣδΣδ_i, vNinv, Σδ²_i))
        vstore!(ptr_mean, xbar_i); ptr_mean += WT
        vstore!(ptr_vars, s²_i); ptr_vars += WT
        ptr_smpl += WT
    end
    r = D & (W-1)
    if r > 0 # remainder
        mask = VectorizationBase.mask(T, r)
        μ_i = vload(V, ptr_smpl, mask)
        Σδ_i = vbroadcast(V, zero(T))
        Σδ²_i = vbroadcast(V, zero(T))
        for n ∈ 1:N-1
            δ_i = vsub(vload(V, ptr_smpl + n*sample_stride, mask), μ_i)
            Σδ_i = vadd(δ_i, Σδ_i)
            Σδ²_i = vmuladd(δ_i, δ_i, Σδ²_i)
        end
        xbar_i = vmuladd(vNinv, Σδ_i, μ_i)
        ΣδΣδ_i = vmul(Σδ_i, Σδ_i)
        s²_i = vmul(vNm1inv, vfnmadd(ΣδΣδ_i, vNinv, Σδ²_i))
        vstore!(ptr_mean, xbar_i, mask)
        vstore!(ptr_vars, s²_i, mask)
    end
    nothing
end

This seems to generate more or less the same assembly.

However, check this out!
I put each function in a file, followed by:

A = randn(200,1000);
x = Vector{Float64}(undef, 200); y = similar(x)

@time mean_and_var!(x, y, A)

Then:

julia> function test_compilation(N)
           path = "/home/chriselrod/Documents/progwork/julia/"
           nogen_file = joinpath(path, "nogen_mean_var.jl")
           gen_file = joinpath(path, "gen_mean_var.jl")
           julia = joinpath(Sys.BINDIR, "julia")
           for _ in 1:N
               println("Testing no gen:")
               run(`$julia -O3 $nogen_file`)
               println("Testing generated:")
               run(`$julia -O3 $gen_file`)
           end
       end
test_compilation (generic function with 1 method)

julia> test_compilation(10)
Testing no gen:
  0.385905 seconds (1.17 M allocations: 57.096 MiB, 2.55% gc time)
Testing generated:
  1.063699 seconds (2.94 M allocations: 144.895 MiB, 8.16% gc time)
Testing no gen:
  0.372277 seconds (1.17 M allocations: 57.096 MiB, 2.64% gc time)
Testing generated:
  1.051894 seconds (2.94 M allocations: 144.895 MiB, 8.11% gc time)
Testing no gen:
  0.367097 seconds (1.17 M allocations: 57.096 MiB, 2.64% gc time)
Testing generated:
  1.047010 seconds (2.94 M allocations: 144.895 MiB, 8.17% gc time)
Testing no gen:
  0.370914 seconds (1.17 M allocations: 57.096 MiB, 2.66% gc time)
Testing generated:
  1.105556 seconds (2.94 M allocations: 144.895 MiB, 11.13% gc time)
Testing no gen:
  0.378023 seconds (1.17 M allocations: 57.096 MiB, 2.57% gc time)
Testing generated:
  1.048928 seconds (2.94 M allocations: 144.895 MiB, 8.22% gc time)
Testing no gen:
  0.367192 seconds (1.17 M allocations: 57.096 MiB, 2.65% gc time)
Testing generated:
  1.060198 seconds (2.94 M allocations: 144.895 MiB, 8.21% gc time)
Testing no gen:
  0.367748 seconds (1.17 M allocations: 57.096 MiB, 2.60% gc time)
Testing generated:
  1.087009 seconds (2.94 M allocations: 144.895 MiB, 10.84% gc time)
Testing no gen:
  0.371097 seconds (1.17 M allocations: 57.096 MiB, 2.70% gc time)
Testing generated:
  1.066088 seconds (2.94 M allocations: 144.895 MiB, 8.23% gc time)
Testing no gen:
  0.368057 seconds (1.17 M allocations: 57.096 MiB, 2.64% gc time)
Testing generated:
  1.092583 seconds (2.94 M allocations: 144.895 MiB, 11.02% gc time)
Testing no gen:
  0.368158 seconds (1.17 M allocations: 57.096 MiB, 2.62% gc time)
Testing generated:
  1.063671 seconds (2.94 M allocations: 144.895 MiB, 8.19% gc time)

That’s roughly 1.07 seconds for the generated function to compile, vs 0.37 for the non-generated function – that is approaching a 3x difference!
Needless to say, I’ve already updated the file I took that example from.

This was an example that falls under “obviously don’t use @generated”, but I am still curious about the better examples.
Do recursive functions stress the compiler?
What about having to rely on constant propagation? Isn’t it unwise to rely on compiler implementation details?

5 Likes

Thanks a lot for looking into this! Good to have a practical example of heavily optimized code where this is important.

Yeah, I’d like to know if there is a plan to solve this as well. I know that there is ongoing work on removing the purity constraint of the generator:

But this is rather a “use @generated more” direction, kind of the opposite of the main argument here.

I find the problem is knowing the point where propagation fails in complex recursive methods. It can be difficult to know why a particular recursive function compiles away (or doesn’t), and that point seems to change subtly between minor versions of Julia.
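For example (a sketch): whether something like this compiles away depends on inference’s recursion limits, and those limits have shifted between releases.

tsum(t::Tuple{}) = 0
tsum(t::Tuple) = first(t) + tsum(Base.tail(t))
# fully unrolled and constant-folded for short tuples; at some length
# inference gives up, and exactly where is version-dependent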

2 Likes