Compilation Over Branches

I have a question on how the compiler internals work. Does the compiler compile all functions that could called in any branch of the code? For example, if I have a function like the following:

using StaticArrays

function fast(A)::Nothing
    for i in eachindex(A)
        A[i] += 1
    end
    return nothing
end

function slow(A)::Nothing
    A .+= 1
    return nothing
end

function outer(A; flag=false)
    if flag
        slow(A)::Nothing
    else
        fast(A)::Nothing
    end
end

A = SizedMatrix{50,50}(zeros(50,50))
@time fast(A);
@time outer(A);

which returns

0.007224 seconds (19.49 k allocations: 1.133 MiB, 99.70% compilation time)
2.679748 seconds (19.08 M allocations: 711.613 MiB, 12.69% gc time, 100.00% compilation time)

So it’s pretty obvious the slow method is getting compiled, even though it’s never called. Does this continue until all branches have been called? For example, if slow had more branches, would all of those get compiled as well?

Is the only way to avoid compiling the slow method here to use a compile-time action using e.g. multiple dispatch on the type of A or @generated?

My use case is pretty similar to this, where I want to use StaticArrays for small sizes, but fall back to methods that use normal arrays for larger sizes because the compilation time blows up for large StaticArrays. For some critical computation kernels I have 2 version like the above, and want to switch easily between them. I just want to verify that this HAS to be done as a compile-time decision. For example, ideally I’d let the user decide if it’s worth it to pay the compile-time cost for runtime performance (which actually isn’t much in this case, but in general, you could see this being a logical tradeoff you’d want to expose to the end user). My previous understanding was that the Julia compiler only compiled functions as it encountered them. I was a little surprised that slow had to be compiled, even when outer already knows the output type.

1 Like

The compiler tries to compile the maximal amount of code that is statically derivable from the entrypoint to the compiler. There currently aren’t really any barriers to control this behavior, but if you do something like Base.inferencebarrier(slow)(A), you’ll prevent inference from knowing what slow is, so it won’t get compiled.

5 Likes

Note that this will result in losing type stability.

1 Like

This seems a good place for multiple dispatch with a function barrier.

Something like this?

julia> g(x::SVector) = 1
g (generic function with 3 methods)

julia> g(x::AbstractVector) = 2
g (generic function with 3 methods)

julia> function f(x)
           if length(x) < 10
               y = SVector(ntuple(i -> x[i], length(x))...)
           else
               y = copy(x)
           end
           return g(y)
       end
f (generic function with 1 method)

julia> @time f(rand(3))
  0.012883 seconds (25.37 k allocations: 1.655 MiB, 99.67% compilation time)
1

julia> @time f(rand(3))
  0.000014 seconds (7 allocations: 224 bytes)
1

julia> @time f(rand(20))
  0.000125 seconds (397 allocations: 28.062 KiB, 79.30% compilation time)
2

julia> @time f(rand(20))
  0.000006 seconds (2 allocations: 448 bytes)
2

f is type-unstable, though. That would only make sense if g is where the expensive computations occur.

1 Like

Okay, this is a useful insight into the compiler behavior, thank you!

You could fix the type instability with a type annotation like above, though, right?

1 Like

fixing type instability

Stripping down the example (possibly too much)…

g(x::AbstractVector) = 2
g(x::SVector) = 1

function f(x)
    y = copy(x)
    return g(y)
end

a = SVector{4}(collect(1:4));
b = collect(1:12);
julia> @code_warntype(f(a))
MethodInstance for f(::SVector{4, Int64})
  from f(x) in Main at REPL[4]:1
Arguments
  #self#::Core.Const(f)
  x::SVector{4, Int64}
Locals
  y::SVector{4, Int64}
Body::Int64
1 ─      (y = Main.copy(x))
│   %2 = Main.g(y)::Core.Const(1)
└──      return %2

julia> @code_warntype(f(b))
MethodInstance for f(::Vector{Int64})
  from f(x) in Main at REPL[4]:1
Arguments
  #self#::Core.Const(f)
  x::Vector{Int64}
Locals
  y::Vector{Int64}
Body::Int64
1 ─      (y = Main.copy(x))
│   %2 = Main.g(y)::Core.Const(2)
└──      return %2

Yeah, the hard part for type stability is to make a decision based on the vector length, when the length is not part of the type (as for StaticArrays).

Some compilers do support such behavior. For example, to accelerate application warm up time, Android (Java) and many JS engine distinguish between “cold” code and “hot” code and may delay the compilation of “cold” code or only use a less optimized interpreter/bytecode compiler. Once the “cold” code has executed enough times, the compiler will promote the “cold” code to a “hot” code and recompile the function.

Unfortunately, some dedicated mechanisms are needed here to avoid performance loss. After compiling of the cold code, the previously generated machine code needs to be “patched” to switch from the old less optimized code to this newly compiled code. Otherwise a runtime check is needed to check whether the code has already got compiled.

I am not sure whether this is a good idea for numerical code because it might hurt performance if the code is invoked in a loop. Another problem is that this also has something to do with type inference. Even we can defer compilation to runtime, type inference must function statically as a whole unless we add necessary annotations (that is, there’s no partial type inference). Anyway, this could be an interesting demo of dynamic compilation.

2 Likes

indeed, Julia does not use: Tracing just-in-time compilation - Wikipedia (notable examples are HotSpot JVM and V8 I guess)

however, there’s https://github.com/tisztamo/Catwalk.jl that pushes optimization even further “at runtime”.