Understanding generated functions

Hi all. I’m trying to understand how generated functions work. From the 0.6 manual:

A very special macro is @generated, which allows you to define so-called generated functions. These have the capability to generate specialized code depending on the types of their arguments with more flexibility and/or less code than what can be achieved with multiple dispatch. While macros work with expressions at parsing-time and cannot access the types of their inputs, a generated function gets expanded at a time when the types of the arguments are known, but the function is not yet compiled.

So my simplistic mind has the following paths, in the order of preference:

  1. Use regular functions (and multiple dispatch) for most cases

  2. Create macros to reduce boilerplate code that otherwise tricky/ugly to implement with regular functions

Where does generated functions fit? The manual says generated function has the additional benefit of knowing the type of the arguments, so it seems to imply that it’s addressing a gap that macros cannot help. Hence, I would add this as a last resort in my toolkit:

  1. Use generated functions when even macros cannot help with my situation.

I have a hard time coming up with a good use case, however. Does anyone have any concrete example where it’s tough/ugly to use regular functions and macros and it would be much nicer (or more performant) to code as generated functions?



It mostly comes in handy when you have a very large (or infinite) number of parametric types for which you want to have really performant code. For instance, StaticArrays.StaticVector{N, T} is a vector of length N. To sum all the N elements, you could write

function sum(vec::StaticVector)
   res = 0
   for i in 1:length(vec)
      res += vec[i]

but looping actually has a (very small) cost. This code would be more efficient:

sum(vec::StaticVector{1}) = vec[1]
sum(vec::StaticVector{2}) = vec[1] + vec[2]
sum(vec::StaticVector{3}) = vec[1] + vec[2] + vec[3]

It’s painful to write that code manually. A macro would help, but since N is potentially infinite, you can’t cover all cases. You should use a generated function, so that Julia generates the above code automatically, but only for those types that are actually used in your program.

In my research, I use formal logic, with types like Add{Mul{Int, Variable}, Int}. Generated functions allow me to do a big part of unification (i.e. symbolic manipulations) at compile-time instead of runtime.


Note that for cases like this LLVM will happily unroll the loop for you since the loop length is statically known. With (almost) exactly your function:

julia> @code_llvm sum(rand(SVector{3}))

define double @julia_sum_62699(%SArray* nocapture readonly dereferenceable(24)) #0 !dbg !5 {
  %1 = getelementptr %SArray, %SArray* %0, i64 0, i32 0, i64 0
  %2 = load double, double* %1, align 8
  %3 = fadd double %2, 0.000000e+00
  %4 = getelementptr %SArray, %SArray* %0, i64 0, i32 0, i64 1
  %5 = load double, double* %4, align 8
  %6 = fadd double %3, %5
  %7 = getelementptr %SArray, %SArray* %0, i64 0, i32 0, i64 2
  %8 = load double, double* %7, align 8
  %9 = fadd double %6, %8
  ret double %9

That’s great! The number of use-cases for generated functions are indeed diminishing as Julia’s plain-old functions improve and we learn to compose things better. I think a great example here is multidimensional iteration… particularly if you’re interested (or will indulge me) in a walk through a bit of Julia’s evolution.

One limitation of Julia’s plain-old-functions (especially in pre-historic times) is that used to be difficult to express multidimensional iteration over an arbitrary N-dimensional array without leaning on linear indexing (which might be slow). There’s no builtin syntax for expressing an arbitrary number of for loops. You can manually say:

sum(v::Vector{Int}) = (r=0; for i in 1:size(v, 1); r += v[i]; end; r)
sum(A::Matrix{Int}) = (r=0; for j in 1:size(A, 2); for i in 1:size(A, 1); r += A[i,j]; …
sum(A::Array{Int,3}) = (r=0; for k in 1:size(A, 3); for j in 1:size(A, 2); for i in 1:size(A, 1); r += A[i,j,k]; …
sum(A::Array{Int,4}) = (r=0; for l in 1:size(A, 4); for k in 1:size(A, 3); for j in 1:size(A, 2); for i in 1:size(A, 1); r += A[i,j,k,l]; …

But that’s awful and tedius and terrible to maintain. Julia has a really good set of metaprogramming tools, though, and so we could instead do something like:

for N in 1:4
    @eval sum(A::Array{Int,$N}) = (r=0; $(construct_n_for_loops(N, …)))

That’s still really annoying to read and write, and it’s still hard to write that construct_n_for_loops AST generation method — which was commonly needed. So a very clever person came along and expressed that all as a clever set of macros instead (yes, I’m intentionally linking to a really old doc):

@ngenerate N Int function sum{N}(A::Array{Int,N})
    r = 0
    @nloops N i A begin
        r += @nref N A i

That old 0.3 syntax said to generate those four methods for A::Array{Int, N} with N for loops and indexed into A with N indices. The @nloops macro creates those loops, using variables i_1, i_2, etc, as indices structured to access all the dimensions of array A. It still exists — check out what it does for:

using Base.Cartesian
@macroexpand @nloops 4 i A begin
    r += @nref 4 A i

Later on, Julia introduced generated functions. This meant that we no longer needed to only generate a subset of these methods — Julia could automatically generate whatever it needed on demand, based purely upon the types of the arguments. We reimplemented all these multidimensional structures as generated functions instead and deprecated @ngenerate (this is now 0.6 syntax):

@generated function sum(A::Array{Int})
    N = ndims(A)
        r = 0
        @nloops $N i A begin
            r += @nref $N A i

You’re probably now looking at me like a crazy person, since, why didn’t we just use a normal function with just one loop for i in eachindex(A)? Precisely. It took some time to realize that generated functions could be used to implement those fast cartesian iterators. It took even more time to realize that we didn’t need to use generated functions to implement those iterators — inlining alone was a powerful enough tool. This story is long enough, though, the power of inlining “lispy” recursive functions in this example can be another story for another day.

This got long, but I hope it gives you a sense of how generated functions can be a form of very powerful “on-demand” code generation… but as Julia’s native codegen improves you don’t need to do this as often. I believe the most common use of generated functions now is just to do compile-time processing of type parameters to preserve type-stability.


Writing linear algebra routines for static arrays is something that I still feel I need to use generated functions for. The reason being that the resulting tuple has to be created “in one shot”. Something like https://github.com/JuliaLang/julia/pull/21912 would probably help with that.


@kristoffer.carlsson could you explain how you use @generated functions to create static arrays “in one shot”?

This is one example where I came across this recently: I have a function that maps an SVector of size nx to an SVector of the same size and the result may be computed from multiple (fully decoupled) equations that take a subset of the vector as an input.

I can do something like this:

# Note: Here, index_partitions is an Tuple of `SVectors` that I use to extract.
# Example: for nx = 5, and index_partitions = (SVector{2}(1:2),
# SVector{3}(3:5)) means that we have a fully decoupled equation to compute the
# first 2 elements of the result from the first 2 inputs and the last 3
# elements from the last 3 inputs.
function something{index_partitions}(x::SVector{nx}) where {index_partitions, nx}
    # create an MVector so that I can incrementally construct the result
    result = @MVector zeros(nx)
    for xids in index_partitions
        # result computed from decoupled equations
        result[xids] = something_with_subvector(x[xids])
    # convert back to SVector
    return SVector(result)

But I’d rather be able to create this in one shot. I feel like I should be able to do this with a generated function but I don’t really know how.


Could you construct a minimal working example that shows the thing you don’t want?

Why? If it’s because of performance, you should know that the compiler is usually capable of turning something like the example code above into very efficient machine code; the ‘copy’ from an MVector to an SVector is typically elided, and so is the allocation of result as long as something_with_subvector is inlined. I would generally prefer the code you showed over a generated function unless something_with_subvector is not inlined, for reasons of readability, compile time, and sometimes also performance (LLVM’s loop vectorizer is sometimes more effective than its SLP vectorizer, which you’re relying on with the fully-unrolled generated function approach).


You are right that in many cases the compiler is able to optimize away these allocations. However, I frequently have the problem that it does not happen (even if I encourage inlining with @inline) and in this case the temporary MMatrix inside that function allocates. I could always pre-allocate the memory but since these are only temporaries (the final result is an SMatrix), this seems a bit unnatural which is why I thought there might be a nicer way to do this.