Compilation Over Branches

bjack205 · January 17, 2022, 11:04pm

I have a question about how the compiler internals work. Does the compiler compile all functions that could be called in any branch of the code? For example, if I have a function like the following:

using StaticArrays

function fast(A)::Nothing
    for i in eachindex(A)
        A[i] += 1
    end
    return nothing
end

function slow(A)::Nothing
    A .+= 1
    return nothing
end

function outer(A; flag=false)
    if flag
        slow(A)::Nothing
    else
        fast(A)::Nothing
    end
end

A = SizedMatrix{50,50}(zeros(50,50))
@time fast(A);
@time outer(A);

which returns

0.007224 seconds (19.49 k allocations: 1.133 MiB, 99.70% compilation time)
2.679748 seconds (19.08 M allocations: 711.613 MiB, 12.69% gc time, 100.00% compilation time)

So it’s pretty obvious the slow method is getting compiled, even though it’s never called. Does this continue until all branches have been called? For example, if slow had more branches, would all of those get compiled as well?

Is the only way to avoid compiling the slow method here to use a compile-time action using e.g. multiple dispatch on the type of A or @generated?

My use case is pretty similar to this, where I want to use StaticArrays for small sizes, but fall back to methods that use normal arrays for larger sizes because the compilation time blows up for large StaticArrays. For some critical computation kernels I have two versions like the above, and want to switch easily between them. I just want to verify that this HAS to be done as a compile-time decision. For example, ideally I’d let the user decide if it’s worth it to pay the compile-time cost for runtime performance (which actually isn’t much in this case, but in general, you could see this being a logical tradeoff you’d want to expose to the end user). My previous understanding was that the Julia compiler only compiled functions as it encountered them. I was a little surprised that slow had to be compiled, even when outer already knows the output type.

Keno · January 17, 2022, 11:08pm

The compiler tries to compile the maximal amount of code that is statically derivable from the entrypoint to the compiler. There currently aren’t really any barriers to control this behavior, but if you do something like Base.inferencebarrier(slow)(A), you’ll prevent inference from knowing what slow is, so it won’t get compiled.
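A rough sketch of how that suggestion might be applied to the example from the question. Assumptions: a plain Matrix stands in for SizedMatrix so the snippet needs no packages, and outer_barrier is a hypothetical name. The callee is hidden from inference, and the ::Nothing assertion keeps the wrapper's return type known.

```julia
# Same kernels as in the question.
function fast(A)::Nothing
    for i in eachindex(A)
        A[i] += 1
    end
    return nothing
end

function slow(A)::Nothing
    A .+= 1
    return nothing
end

# Hypothetical wrapper: inference cannot see which function is being
# called in the `flag` branch, so `slow` should only be compiled when
# that branch actually runs.
function outer_barrier(A; flag=false)
    if flag
        Base.inferencebarrier(slow)(A)::Nothing
    else
        fast(A)
    end
    return nothing
end

A = zeros(50, 50)
outer_barrier(A)             # takes the `fast` branch
outer_barrier(A; flag=true)  # first call that needs `slow`
```

The call through the barrier is dispatched at run time, so this trades a small per-call cost for not compiling slow up front.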

Oscar_Smith · January 17, 2022, 11:11pm

Note that this will result in losing type stability.

lmiq · January 17, 2022, 11:11pm

This seems a good place for multiple dispatch with a function barrier.

Something like this?

julia> g(x::SVector) = 1
g (generic function with 3 methods)

julia> g(x::AbstractVector) = 2
g (generic function with 3 methods)

julia> function f(x)
           if length(x) < 10
               y = SVector(ntuple(i -> x[i], length(x))...)
           else
               y = copy(x)
           end
           return g(y)
       end
f (generic function with 1 method)

julia> @time f(rand(3))
  0.012883 seconds (25.37 k allocations: 1.655 MiB, 99.67% compilation time)
1

julia> @time f(rand(3))
  0.000014 seconds (7 allocations: 224 bytes)
1

julia> @time f(rand(20))
  0.000125 seconds (397 allocations: 28.062 KiB, 79.30% compilation time)
2

julia> @time f(rand(20))
  0.000006 seconds (2 allocations: 448 bytes)
2

f is type-unstable, though. That would only make sense if g is where the expensive computations occur.
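A variant of the same idea, as a sketch only: let the caller pick the kernel through a singleton type, so the choice is made by dispatch and the method specialized for one choice never refers to the other kernel. The names KernelChoice, LoopKernel, and BroadcastKernel are hypothetical, and plain arrays are used so the snippet needs no packages.

```julia
# Hypothetical tag types used only to select a kernel by dispatch.
abstract type KernelChoice end
struct LoopKernel <: KernelChoice end
struct BroadcastKernel <: KernelChoice end

function fast(A)::Nothing
    for i in eachindex(A)
        A[i] += 1
    end
    return nothing
end

function slow(A)::Nothing
    A .+= 1
    return nothing
end

# Each method mentions only one kernel, so compiling one of them
# does not pull in the other.
outer(A, ::LoopKernel) = fast(A)
outer(A, ::BroadcastKernel) = slow(A)
outer(A) = outer(A, LoopKernel())  # default choice

A = zeros(50, 50)
outer(A)                     # loop kernel only
outer(A, BroadcastKernel())  # broadcast kernel, compiled on first use
```

Because the tag is part of the call signature, the decision is type-stable as long as the caller passes a concrete tag.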

bjack205 · January 17, 2022, 11:15pm

Okay, this is a useful insight into the compiler behavior, thank you!

bjack205 · January 17, 2022, 11:16pm

You could fix the type instability with a type annotation like above, though, right?
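For illustration, a minimal sketch of what such an annotation could look like on the example above. Assumptions: a Tuple stands in for SVector so no package is needed, and f_asserted is a hypothetical name. The local y still has a length-dependent type inside the function; the assertion only pins down what callers see.

```julia
g(x::Tuple) = 1           # stand-in for the SVector method
g(x::AbstractVector) = 2

function f_asserted(x)
    if length(x) < 10
        # The tuple length is a runtime value, so the type of `y`
        # is not concrete inside this function.
        y = ntuple(i -> x[i], length(x))
    else
        y = copy(x)
    end
    # Return-type assertion: callers can rely on an Int result.
    return g(y)::Int
end

f_asserted(rand(3))
f_asserted(rand(20))
```

The call to g may still be resolved at run time, so this only pays off if the work inside g dominates.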

JeffreySarnoff · January 18, 2022, 12:04am

fixing type instability

Stripping down the example (possibly too much)…

using StaticArrays

g(x::AbstractVector) = 2
g(x::SVector) = 1

function f(x)
    y = copy(x)
    return g(y)
end

a = SVector{4}(collect(1:4));
b = collect(1:12);

julia> @code_warntype(f(a))
MethodInstance for f(::SVector{4, Int64})
  from f(x) in Main at REPL[4]:1
Arguments
  #self#::Core.Const(f)
  x::SVector{4, Int64}
Locals
  y::SVector{4, Int64}
Body::Int64
1 ─      (y = Main.copy(x))
│   %2 = Main.g(y)::Core.Const(1)
└──      return %2

julia> @code_warntype(f(b))
MethodInstance for f(::Vector{Int64})
  from f(x) in Main at REPL[4]:1
Arguments
  #self#::Core.Const(f)
  x::Vector{Int64}
Locals
  y::Vector{Int64}
Body::Int64
1 ─      (y = Main.copy(x))
│   %2 = Main.g(y)::Core.Const(2)
└──      return %2

lmiq · January 18, 2022, 12:16am

Yeah, the hard part for type stability is to make a decision based on the vector length, when the length is not part of the type (as for StaticArrays).
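One way to sketch this: lift the runtime length into the type domain with Val at a function barrier. The names SMALL_LIMIT, small_kernel, large_kernel, and process are hypothetical, a sum stands in for the real computation, and the input is restricted to Vector{Float64} to keep the return-type assertion simple.

```julia
const SMALL_LIMIT = 10  # hypothetical size threshold

# The length N is a type parameter here, so this method is specialized
# (and compiled) once per distinct small length.
small_kernel(x, ::Val{N}) where {N} =
    foldl(+, ntuple(i -> x[i], Val(N)); init = 0.0)

# Generic fallback for larger inputs.
large_kernel(x) = sum(x)

function process(x::Vector{Float64})
    n = length(x)
    if n < SMALL_LIMIT
        # `Val(n)` is built from a runtime value, so this call is a
        # dynamic dispatch; the assertion keeps the result type known.
        return small_kernel(x, Val(n))::Float64
    else
        return large_kernel(x)::Float64
    end
end

process([1.0, 2.0, 3.0])
process(ones(20))
```

The dynamic dispatch happens once per call to process, and at most SMALL_LIMIT specializations of small_kernel are ever compiled.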

anon56330260 · January 18, 2022, 5:00am

Some compilers do support such behavior. For example, to reduce application warm-up time, Android (Java) and many JS engines distinguish between “cold” code and “hot” code and may delay the compilation of “cold” code or only run it through a less optimizing interpreter/bytecode compiler. Once the “cold” code has executed enough times, the compiler promotes it to “hot” code and recompiles the function.

Unfortunately, some dedicated mechanisms are needed here to avoid performance loss. After the cold code has been compiled, the previously generated machine code needs to be “patched” to switch from the old, less optimized code to the newly compiled code. Otherwise a runtime check is needed to test whether the code has already been compiled.

I am not sure whether this is a good idea for numerical code because it might hurt performance if the code is invoked in a loop. Another problem is that this also has something to do with type inference. Even if we can defer compilation to runtime, type inference must function statically as a whole unless we add the necessary annotations (that is, there’s no partial type inference). Anyway, this could be an interesting demo of dynamic compilation.

jling · January 18, 2022, 5:27am

Indeed, Julia does not use tracing just-in-time compilation (see the Wikipedia article “Tracing just-in-time compilation”); notable examples are HotSpot JVM and V8, I guess.

however, there’s https://github.com/tisztamo/Catwalk.jl that pushes optimization even further “at runtime”.
