Questions about Julia's compilation model

I have a few questions about how Julia’s compiler works.

First, I sometimes see invalidated methodinstances with abstract arguments, e.g. nextind(s::AbstractString, i::Int64) from SnoopCompile output like this:

backedges: 1: superseding nextind(s::AbstractString, i::Int64) @ Base strings/basic.jl:556 with MethodInstance for nextind(::AbstractString, ::Int64) (8 children)

How is it possible to have methodinstances with abstract arguments? I was under the impression that Julia, when calling a function, first found the matching method, then monomorphized the function to a fully concrete methodinstance using the concrete values at runtime before compiling. Since abstract types do not exist at runtime, how is it possible to have abstract methodinstances?

Second, what is the distinction between

  • Methodinstances
  • Codeinstances
  • Specializations

?

2 Likes

This is a little off topic, but would you try Add native UTF-8 Validation using fast shift based DFA by ndinsmore · Pull Request #47880 · JuliaLang/julia · GitHub and see if you still get that error? That PR speeds up nextind by 3x and I think there was a code issue that I fixed while doing it.

This is possible. Although you can’t construct abstract types directly, some builtin functions can return abstract type during compilation time. For example, getindex(arr::Vector{AbstractString}, i::Int) returns AbstractString. Julia may choose to compile a function using this abstract type in the simple sceniao (Julia’s method matching algorithm can work on abstract type signature). Another possibility is @nonspecialize. Or you can call precompile manually on abstract types.

I believe that method instances with abstract argument types only express incomplete specializations which hold limited assumptions on the type of the corresponding arguments. It doesn’t mean that such arguments will be abstract at runtime (as you point out, there is no such thing), only that a given method was compiled with partial knowledge of the type of those arguments.

1 Like

Yes, if there’s a type instability and a function with a limited number of methods (four or less, I think) is called with that unstable argument, Julia will effectively do a union-split based on the existence of the methods at compile time. If the method table changes, then the union of possibilities the unstable arg could be changes and you get an invalidation. This ends up appearing as a backedge in the method table of the called function itself because if the method table changes you need to invalidate the function(s) that call into it with unknown/unstable args. I believe you can also get backedges with abstract arguments from invoke calls.

Here’s my understanding of the lay of the land:

  • Functions have a MethodTable
  • The MethodTable sorts out all the Methods that you’ve defined
  • A given Method has specializations — MethodInstances — describe the args that you’ve called it with
  • The MethodInstance has a bunch of metadata about that particular specialization and one or more versions of its compiled code — the CodeInstance.
  • Each CodeInstance is valid for a particular range of world ages. An invalidation leads to the compilation of a new CodeInstance for the new world.
5 Likes

Thanks for the explanation.

You mention that the compiler can generate code in the face of type instability using union splitting and world splitting - however, if I understand correctly, neither of these mechanisms actually compile abstract type signatures.

Consider:

f(x::Int) = 1
g(x) = f(first(x))

g(Any[1])

Here, g will use world splitting to compile to the equivalent of:

y = f(x)
y isa Int ? 1 : f(y)

In this code, the type of y is not known. However, this methodinstance does not need to compile any methods for abstract type signatures, right? f(y::Any) does appear, but this is statically known to throw and thus there is little benefit in not having a dynamic dispatch there.

1 Like

In your example, Julia is not compiling f(::Any), but the compiler is making assumptions about it for the sake of re-establishing type stability:

julia> @code_typed g(Any[1])
CodeInfo(
1 ─ %1 = Base.arrayref(true, x, 1)::Any
│   %2 = (isa)(%1, Int64)::Bool
└──      goto #3 if not %2
2 ─      goto #4
3 ─ %5 = Main.f(%1)::Int64
└──      goto #4
4 ┄ %7 = φ (#2 => 1, #3 => %5)::Int64
└──      return %7
) => Int64

It’s highly advantageous to be able to say that g always returns an Int. Adding more methods here (or changing the output type) will appear in SnoopCompile’s output as an invalidation of g(::Any), I think.

1 Like

But then I don’t understand how that relates to my question in the OP - in which situation are methodinstances with abstract arguments (as observed to be invalidated) compiled?

That situation appears in the SnoopCompile tutorial here: Snooping on and fixing invalidations: @snoopr · SnoopCompile

If you just execute the first part (before you cause the invalidations to occur), you can see that there exists a specialization on the abstract f(::AbstractFloat):

julia> f(::Real) = 1;

julia> callf(container) = f(container[1]);

julia> call2f(container) = callf(container);

julia> c64  = [1.0]; c32 = [1.0f0]; cabs = AbstractFloat[1.0];  # Vector{Float64}, Vector{Float32}, and Vector{AbstractFloat}, respectively

julia> call2f(c64);  call2f(c32); call2f(cabs);

julia> methods(f)
# 1 method for generic function "f" from Main:
 [1] f(::Real)
     @ REPL[1]:1

julia> methods(f)[1].specializations
svec(MethodInstance for f(::Float64), MethodInstance for f(::Float32), MethodInstance for f(::AbstractFloat), nothing, nothing, nothing, nothing, nothing)

julia> mi = methods(f)[1].specializations[3]
MethodInstance for f(::AbstractFloat)

julia> mi.cache
Core.CodeInstance(MethodInstance for f(::AbstractFloat), #undef, 0x0000000000000001, 0xffffffffffffffff, Int64, 1, nothing, 0x000004e0, 0x000004e0, nothing, false, false, 0x01, Ptr{Nothing} @0x0000000100dff510, Ptr{Nothing} @0x0000000000000000)

Now… did f(::AbstractFloat) itself really get compiled? I think it’s just there a shim for holding onto those backedges. In this case it just got inlined into the callf function directly as the constant 1, but even if it didn’t inline I think its function pointer would still be exactly the same as the other specializations.

2 Likes

Thanks for the example! I’ll investigate from here :slight_smile: