Precompiling non-inferrable calls

Let me start off by saying that precompile is great. It can move a lot of compilation time from package loading to the precompilation stage.

For people unfamiliar with precompile: calling it on a function compiles the function for the given argument types without actually running the code. For example, compare

julia> a(x) = x;

julia> warmup(x) = x; @time @eval warmup(1);
  0.004778 seconds (1.42 k allocations: 93.156 KiB, 86.03% compilation time)

julia> @time @eval a(1);
  0.002554 seconds (465 allocations: 32.016 KiB, 96.76% compilation time)

julia> @time @eval a(1);
  0.000072 seconds (41 allocations: 1.906 KiB)

to

julia> b(x) = x;

julia> warmup(x) = x; @time @eval warmup(1);
  0.005445 seconds (1.42 k allocations: 93.156 KiB, 85.68% compilation time)

julia> precompile(b, (Int,));

julia> @time @eval b(1);
  0.000072 seconds (41 allocations: 1.906 KiB)

julia> @time @eval b(1);
  0.000073 seconds (41 allocations: 1.906 KiB)

When method calls can be inferred, precompile will even recursively compile the methods that are called.
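
As an illustration (a minimal sketch with hypothetical names): because the call to inner is inferable from outer's argument type, precompiling outer should also compile inner:

@noinline inner(x) = x + 1   # @noinline so the call isn't simply folded into outer
outer(x::Int) = inner(x)     # inner(::Int) is fully inferable here
precompile(outer, (Int,))
@time @eval outer(1)         # after a warmup call (see above), expect no compilation time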

However, when method calls cannot be inferred, precompile will just give up. For example, running

ex = :(1 + 1)

warmup(x) = x
println("Warmup:")
@time @eval warmup(ex)
@time @eval warmup(ex)
println()

# @noinline prevents f from being inlined into g; if f were inlined, it wouldn't need to be compiled separately.
@noinline f(x) = x
g(ex::Expr) = f(ex.args[1])
precompile(g, (Expr,))
println("Time with 1 precompile:")
@time @eval g(ex)
@time @eval g(ex)
println()

@noinline u(x) = x
v(ex::Expr) = u(ex.args[1])
precompile(u, (Symbol,))
precompile(v, (Expr,))
println("Time with 2 precompiles:")
@time @eval v(ex)
@time @eval v(ex)
println()

gives the following output:

Warmup:
  0.003992 seconds (1.42 k allocations: 93.156 KiB, 119.82% compilation time)
  0.000103 seconds (41 allocations: 1.906 KiB)

Time with 1 precompile:
  0.001883 seconds (476 allocations: 32.344 KiB, 94.67% compilation time)
  0.000078 seconds (41 allocations: 1.906 KiB)

Time with 2 precompiles:
  0.000079 seconds (41 allocations: 1.906 KiB)
  0.000073 seconds (41 allocations: 1.906 KiB)

because the inferred type of ex.args[1] is not concrete, that is, isconcretetype returns false for it:

julia> @code_warntype g(ex)
MethodInstance for g(::Expr)
  from g(ex::Expr) in Main at /home/rik/Downloads/tmp/typeinfer.jl:4
Arguments
  #self#::Core.Const(g)
  ex::Expr
Body::Any
1 ─ %1 = Base.getproperty(ex, :args)::Vector{Any}
│   %2 = Base.getindex(%1, 1)::Any
│   %3 = Main.f(%2)::Any
└──      return %3
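
The same conclusion can be checked directly in the REPL (a quick sketch):

julia> only(Base.return_types(g, (Expr,)))
Any

julia> isconcretetype(ans)
false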

If I understand correctly, finding those precompile statements is the problem that SnoopCompile.jl solves: it recognizes runtime dispatches while running some workload and generates precompile statements for the called methods.

To my surprise, SnoopCompile does not detect any precompile statements after removing the precompile directives and running the following in a fresh session:

using Pkg
Pkg.activate(; temp=true)
Pkg.add(["SnoopCompile", "ProfileSVG"])
using SnoopCompile, ProfileSVG

ex = :(1 + 1)

f(x) = x
g(ex::Expr) = f(ex.args[1])

tinf = @snoopi_deep g(ex)
fg = flamegraph(tinf)
ProfileSVG.save("profile.svg", fg)

shows the following flame graph:

[flame graph: profile.svg]

where only the red block is the compilation of f and the tiny red line is the compilation of g. So most of the time is spent somewhere outside the compilation of f and g. Even more surprisingly, SnoopCompile.parcel doesn’t generate any statements:

julia> ttot, pcs = SnoopCompile.parcel(tinf);

julia> SnoopCompile.write("precompile.jl", pcs)
Base: no precompile statements out of 7.3838e-5

Based on this, I have 2 questions:

  1. Is there any technique that I’m overlooking for finding a precompile directive for the inner method, that is, precompile(f, (Symbol,))?
  2. Why doesn’t SnoopCompile generate any directives? I’m assuming that I’m doing something wrong here.

cc @tim.holy

EDIT: I’ve updated some of the examples because I wasn’t taking into account the warmup time for the first call to some function via @time @eval f(x). Also, I’ve had to add @noinline.

@snoopi_deep records time for type inference but not the rest of compilation (generation & compilation of LLVM IR). There’s @snoopl for that.

Is there any technique that I’m overlooking for finding a precompile directive for the inner method, that is, precompile(f, (Symbol,))?

This is the advantage of using a workload rather than explicit precompile directives: runtime dispatch results in a fresh entry into inference, which gets logged by @snoopi_deep. In contrast, explicit precompile directives stop at runtime dispatch. Of course, SnoopCompile.parcel can generate the directives for you as long as you’ve monitored the calls under @snoopi_deep, but if you can afford to actually run the workload during package precompilation (e.g., it doesn’t do anything you’d rather not do, like plot something on the display, delete files on your hard drive, or hit the network), that may be easier. The one exception is when the runtime-dispatched method belongs to a different package (or Base), in which case you’ll need the explicit precompile to connect it to your package.
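
As a minimal sketch of that workload approach (MyPkg is a hypothetical package name), the workload is just ordinary code executed at the top level of the module, so it runs while the package is being precompiled:

module MyPkg

@noinline f(x) = x
g(ex::Expr) = f(ex.args[1])

# A representative call executed here runs during package precompilation,
# so the inference results for the methods it exercises are cached without
# any explicit precompile directives.
g(:(1 + 1))

end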

But in this case Julia will call the generic f(::Any) because it can be implemented without specialization and does not require a runtime dispatch. If you need to teach g that ex.args[1] is always a Symbol, write its implementation as

g(ex::Expr) = f(ex.args[1]::Symbol)

(But be very sure it is always a Symbol.)
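
One quick way to verify the effect of the annotation (a sketch reusing the definitions from the question):

g(ex::Expr) = f(ex.args[1]::Symbol)
@code_warntype g(:(1 + 1))   # Body is now inferred as Symbol rather than Any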

Thanks Tim for your response. That clarifies things.

I just discovered that the problem of precompile not reaching the inner function is resolved in Julia 1.8. When I run the following:

macro time_twice(ex::Expr)
    println("$ex:")
    @time eval(ex)
    @time eval(ex)
    println()
end

warmup(x) = x
@time_twice warmup(1)

@noinline _inferencebarrier(@nospecialize(x)) = Ref{Any}(x)[]
precompile(_inferencebarrier, (Int,))
@time_twice _inferencebarrier(1)

@noinline _another_inferencebarrier(@nospecialize(x)) = Ref{Any}(x)[]
function f()
    x = _another_inferencebarrier(1)
    return x + 1
end
precompile(f, ())
@time_twice f()

inferable(x) = x
g() = inferable(1)
precompile(g, ())
@time_twice g()

This is what I get in Julia 1.7.2:

warmup(1):
  0.006106 seconds (1.42 k allocations: 93.062 KiB, 89.73% compilation time)
  0.000205 seconds (39 allocations: 1.812 KiB)

_inferencebarrier(1):
  0.000151 seconds (39 allocations: 1.812 KiB)
  0.000145 seconds (39 allocations: 1.812 KiB)

f():
  0.002549 seconds (1.06 k allocations: 62.466 KiB, 93.12% compilation time)
  0.000283 seconds (39 allocations: 1.812 KiB)

g():
  0.000125 seconds (39 allocations: 1.812 KiB)
  0.000098 seconds (39 allocations: 1.812 KiB)

and in Julia 1.8-beta3:

warmup(1):
  0.009057 seconds (2.52 k allocations: 173.615 KiB, 89.93% compilation time)
  0.000220 seconds (39 allocations: 1.812 KiB)

_inferencebarrier(1):
  0.000151 seconds (39 allocations: 1.812 KiB)
  0.000115 seconds (39 allocations: 1.812 KiB)

f():
  0.000251 seconds (39 allocations: 1.812 KiB)
  0.000112 seconds (39 allocations: 1.812 KiB)

g():
  0.000145 seconds (39 allocations: 1.812 KiB)
  0.000088 seconds (39 allocations: 1.812 KiB)

Big applause for the compiler engineers.
