I have a question about how the compiler internals work. Does the compiler compile all functions that could be called in any branch of the code? For example, if I have functions like the following:
using StaticArrays

function fast(A)::Nothing
    for i in eachindex(A)
        A[i] += 1
    end
    return nothing
end

function slow(A)::Nothing
    A .+= 1
    return nothing
end

function outer(A; flag=false)
    if flag
        slow(A)::Nothing
    else
        fast(A)::Nothing
    end
end

A = SizedMatrix{50,50}(zeros(50,50))
@time fast(A);
@time outer(A);
which returns
0.007224 seconds (19.49 k allocations: 1.133 MiB, 99.70% compilation time)
2.679748 seconds (19.08 M allocations: 711.613 MiB, 12.69% gc time, 100.00% compilation time)
So it’s pretty obvious the slow method is getting compiled, even though it’s never called. Does this continue recursively until every reachable branch has been compiled? For example, if slow itself had more branches, would all of those get compiled as well?
Is the only way to avoid compiling the slow method here to make the choice at compile time, e.g. via multiple dispatch on the type of A or via @generated functions?
My use case is pretty similar to this: I want to use StaticArrays for small sizes, but fall back to methods that use normal arrays for larger sizes, because compilation time blows up for large StaticArrays. For some critical computation kernels I have two versions like the above and want to switch easily between them. I just want to verify that this HAS to be done as a compile-time decision. Ideally, I’d let the user decide whether it’s worth paying the compile-time cost for runtime performance (which actually isn’t much in this case, but in general you could see this being a logical tradeoff to expose to the end user). My previous understanding was that the Julia compiler only compiled functions as it encountered them, so I was a little surprised that slow had to be compiled even when outer already knows the output type.
The compiler tries to compile the maximal amount of code that is statically derivable from the entry point handed to it. There currently isn’t much fine-grained control over this behavior, but if you do something like Base.inferencebarrier(slow)(A), you’ll prevent inference from knowing what slow is, so it won’t get compiled eagerly.
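For instance, here is a minimal sketch of that approach (the name outer2 is just for illustration; fast and slow are the definitions from the question):

function outer2(A; flag=false)
    if flag
        # Inference can no longer tell which function is being called,
        # so `slow` is not compiled together with `outer2`; it gets
        # compiled the first time this branch actually runs.
        Base.inferencebarrier(slow)(A)::Nothing
    else
        fast(A)::Nothing
    end
    return nothing
end

The ::Nothing assertion keeps the caller type-stable even though the call itself is now dynamic.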
Yeah, the hard part for type stability is making a decision based on the vector length when the length is not part of the type (unlike StaticArrays, where it is).
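To illustrate the contrast, here is a sketch (the kernel! names and the cutoff are hypothetical, reusing fast/slow from above):

using StaticArrays

# Size in the type: dispatch decides at compile time, and only the
# selected method body is ever compiled.
kernel!(A::StaticArray) = fast(A)
kernel!(A::AbstractArray) = slow(A)

# Size only known at runtime: wrapping a plain Matrix in a SizedMatrix
# produces a type that depends on runtime data, so the call to `fast`
# below is dynamically dispatched (a classic type instability).
function kernel_by_length!(A::Matrix)
    if length(A) <= 2500    # hypothetical cutoff
        fast(SizedMatrix{size(A, 1), size(A, 2)}(A))
    else
        slow(A)
    end
    return nothing
end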
Some compilers do support such behavior. For example, to accelerate application warm-up, Android’s Java runtime and many JS engines distinguish between “cold” and “hot” code: they may delay compilation of “cold” code or run it through an interpreter or a less optimized bytecode compiler. Once the “cold” code has executed enough times, the runtime promotes it to “hot” and recompiles the function with more optimization.
Unfortunately, some dedicated mechanisms are needed here to avoid performance loss. After the cold code has been compiled, the previously generated machine code needs to be “patched” so that callers switch from the old, less optimized code to the newly compiled version. Otherwise, a runtime check is needed on every call to see whether the code has already been compiled.
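Julia doesn’t expose machine-code patching, but the second option (a per-call runtime check) can be roughly emulated today by routing the cold path through an untyped reference, so that ordinary dynamic dispatch plays the role of the check. A sketch, with made-up names COLD_KERNEL and outer_lazy:

# Because the Ref is typed Any, inference cannot see what it holds, so
# `slow` is only compiled the first time the cold branch runs; the cost
# is one dynamic dispatch per call into the cold path.
const COLD_KERNEL = Ref{Any}(slow)

function outer_lazy(A; flag=false)
    if flag
        COLD_KERNEL[](A)::Nothing   # dynamic call: the "runtime check"
    else
        fast(A)::Nothing            # hot path, compiled with outer_lazy
    end
    return nothing
end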
I am not sure whether this is a good idea for numerical code, because the check might hurt performance if the code is invoked in a tight loop. Another problem is that this interacts with type inference: even if we can defer compilation to runtime, type inference must run statically as a whole unless we add the necessary annotations (that is, there is no partial type inference). Anyway, this could be an interesting demo of dynamic compilation.