Many people have wanted to know more about how to reduce latency and the internal details of precompilation. I’ve just added a variant of the discussion below to SnoopCompile’s docs (Snooping on inference: @snoopi · SnoopCompile), but since it seemed to address an unmet need I decided to cross-post it here and invite discussion. Note that this concerns `precompile` directives in your package rather than usage of PackageCompiler, but it may help explain why `precompile` directives sometimes “work” and sometimes do not, and provide strategies for more successful precompilation.
Suppose your package includes the following method:
"""
idx = index_midsum(a)
Return the index of the first item more than "halfway to the cumulative sum,"
meaning the smallest integer so that `sum(a[begin:idx]) >= sum(a)/2`.
"""
function index_midsum(a::AbstractVector)
ca = cumsum(vcat(0, a)) # cumulative sum of items in a, starting from 0
s = ca[end] # the sum of all elements
return findfirst(x->x >= s/2, ca) - 1 # compensate for inserting 0
end
Now, suppose that you’d like to reduce latency in using this method, and you know that an important use case is when `a` is a `Vector{Int}`. Therefore, you might precompile it:
```julia
julia> precompile(index_midsum, (Vector{Int},))
true
```
This will cause Julia to infer this method for the given argument types. If you add such statements to your package, it potentially saves your users from having to wait for it to be inferred each time they use your package.
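For concreteness, here is a minimal sketch of how such a directive typically sits in a package; the module name `MidSums` is made up for illustration. The `precompile` call executes while the package’s cache file is being built:

```julia
module MidSums

function index_midsum(a::AbstractVector)
    ca = cumsum(vcat(0, a))               # cumulative sum of items in a, starting from 0
    s = ca[end]                           # the sum of all elements
    return findfirst(x->x >= s/2, ca) - 1 # compensate for inserting 0
end

# Runs at package-precompilation time; whatever inference results Julia can
# cache are stored in this package's *.ji file.
precompile(index_midsum, (Vector{Int},))

end # module
```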
But if you execute these lines in the REPL, and then check how well it worked, you might see something like the following:
```julia
julia> using SnoopCompile

julia> tinf = @snoopi index_midsum([1,2,3,4,100])
3-element Vector{Tuple{Float64, Core.MethodInstance}}:
 (0.00048613548278808594, MethodInstance for cat_similar(::Int64, ::Type, ::Tuple{Int64}))
 (0.010090827941894531, MethodInstance for (::Base.var"#cat_t##kw")(::NamedTuple{(:dims,), Tuple{Val{1}}}, ::typeof(Base.cat_t), ::Type{Int64}, ::Int64, ::Vararg{Any, N} where N))
 (0.016659975051879883, MethodInstance for __cat(::Vector{Int64}, ::Tuple{Int64}, ::Tuple{Bool}, ::Int64, ::Vararg{Any, N} where N))
```
Even though we’d already said `precompile(index_midsum, (Vector{Int},))` in this session, somehow we needed more inference of various concatenation methods. Why does this happen? A detailed investigation (e.g., using Cthulhu or `@code_warntype`) would reveal that `vcat(0, a)` is not inferrable “all the way down,” and hence the `precompile` directive couldn’t predict everything that was going to be needed.
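For example, Cthulhu’s `@descend` lets you walk from the top-level call into its callees and watch where concrete types are lost; a minimal starting point (the session is interactive, so output is elided here):

```julia
julia> using Cthulhu

julia> @descend vcat(0, [1, 2, 3])  # descend into callees; look for non-concrete inferred types
```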
No problem, you say: let’s just precompile those methods too. The most expensive is the last one. You might not know where `__cat` is defined, but you can find out with
```julia
julia> mi = tinf[end][2]    # get the MethodInstance
MethodInstance for __cat(::Vector{Int64}, ::Tuple{Int64}, ::Tuple{Bool}, ::Int64, ::Vararg{Any, N} where N)

julia> mi.def               # get the Method
__cat(A, shape::Tuple{Vararg{Int64, M}}, catdims, X...) where M in Base at abstractarray.jl:1599

julia> mi.def.module        # which module was this method defined in?
Base
```
Armed with this knowledge, let’s start a fresh session (so that nothing is precompiled yet), and in addition to defining `index_midsum` and precompiling it, we add
```julia
julia> precompile(Base.__cat, (Vector{Int64}, Tuple{Int64}, Tuple{Bool}, Int, Vararg{Any, N} where N))
true
```
Now if you try that `tinf = @snoopi index_midsum([1,2,3,4,100])` line, you’ll see that the `__cat` call is omitted, suggesting success.
However, if you put all this into your package (with the `precompile` directives included) and then check with `@snoopi` again, you may be in for a rude surprise: the `__cat` precompile directive doesn’t “work.” That turns out to be because your package doesn’t “own” that `__cat` method (its module is `Base` rather than `YourPackage`), and therefore Julia doesn’t know where to store its precompiled form. (Successfully precompiled code is cached in the `*.ji` files in your `~/.julia/compiled` directory.)
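If you’re curious, you can list a package’s cache files from the REPL. Both names below are assumptions for illustration: `MidSums` is the hypothetical package from the sketch earlier, and `find_all_in_cache_path` is an internal Base helper that may change between Julia versions:

```julia
julia> using MidSums  # hypothetical package

julia> Base.find_all_in_cache_path(Base.PkgId(MidSums))  # paths to its *.ji cache files (output elided)
```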
How to fix this? Fundamentally, the problem is that `vcat` call: if we can write it in a way that inference succeeds, then all these problems go away. It turns out that `vcat` is fully inferrable if all the arguments have the same type, so just changing `vcat(0, a)` to `vcat([zero(eltype(a))], a)` fixes the problem. (Alternatively, you could make a copy and then use `pushfirst!`; a sketch of that variant appears below.) In a fresh Julia session:
```julia
function index_midsum(a::AbstractVector)
    ca = cumsum(vcat([zero(eltype(a))], a))  # cumulative sum of items in a, starting from 0
    s = ca[end]                              # the sum of all elements
    return findfirst(x->x >= s/2, ca) - 1    # compensate for inserting 0
end

julia> precompile(index_midsum, (Vector{Int},))
true

julia> using SnoopCompile

julia> tinf = @snoopi index_midsum([1,2,3,4,100])
Tuple{Float64, Core.MethodInstance}[]
```
Tada! No additional inference was needed, ensuring that your users will not suffer any latency due to type-inference of this particular method/argument combination.
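For reference, here is a sketch of the `pushfirst!` variant mentioned above; it expresses the same idea (prepend a zero using only same-type, inferrable operations) and is illustrative rather than a benchmarked recommendation:

```julia
function index_midsum(a::AbstractVector)
    b = pushfirst!(collect(a), zero(eltype(a)))  # copy `a` into a Vector, then prepend a zero
    ca = cumsum(b)                               # cumulative sum of items in a, starting from 0
    s = ca[end]                                  # the sum of all elements
    return findfirst(x->x >= s/2, ca) - 1        # compensate for inserting 0
end
```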
In other cases, manual inspection of the results from `@snoopi` may lead you in a different direction: you may discover that a huge number of specializations are being created for a method that doesn’t need them. Typical examples are methods that take types or functions as inputs: for example, there is no reason to recompile `methods(f)` for each separate `f`. In such cases, by far your best option is to add `@nospecialize` annotations to one or more of the arguments of that method. Such changes can have a dramatic impact on the latency of your package.
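As a hypothetical illustration (the function name is made up, not from the original post), consider a helper that only inspects a function’s method table; annotating its argument keeps Julia from compiling a fresh specialization per callee:

```julia
# With @nospecialize, one compiled instance of this method serves every `f`;
# without it, Julia would specialize (and recompile) it for each distinct
# function it is called with.
nmethods(@nospecialize(f)) = length(methods(f))
```

The call still works for any `f` (e.g., `nmethods(sin)`); only the compiler’s specialization behavior changes.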
The ability to make interventions like these, which can both reduce latency and improve runtime speed, is a major reason to consider `@snoopi` primarily as an analysis tool rather than just a utility to blindly generate lists of precompile directives.
EDIT: if you’re working to improve your package’s precompiles, I encourage you to do this work with Julia nightly (or a recent source build of `master`). Precompilation should work better if you’re not invalidating methods you depend on.