Many people have wanted to know more about how to reduce latency and the internal details of precompilation. I’ve just added a variant of the discussion below to SnoopCompile’s docs (https://timholy.github.io/SnoopCompile.jl/dev/snoopi/#Understanding-precompilation-and-its-limitations-1), but since it seemed to address an unmet need I decided to cross-post it here and invite discussion. Note that this concerns `precompile` directives in your package rather than usage of `PackageCompiler`, but it may help explain why `precompile` directives sometimes “work” and sometimes do not, and provide strategies for more successful precompilation.

Suppose your package includes the following method:

```
"""
    idx = index_midsum(a)

Return the index of the first item more than "halfway to the cumulative sum,"
meaning the smallest integer so that `sum(a[begin:idx]) >= sum(a)/2`.
"""
function index_midsum(a::AbstractVector)
    ca = cumsum(vcat(0, a))   # cumulative sum of items in a, starting from 0
    s = ca[end]               # the sum of all elements
    return findfirst(x->x >= s/2, ca) - 1   # compensate for inserting 0
end
```

Now, suppose that you’d like to reduce latency in using this method, and you know that an important use case is when `a` is a `Vector{Int}`. Therefore, you might precompile it:

```
julia> precompile(index_midsum, (Vector{Int},))
true
```

This will cause Julia to infer this method for the given argument types. If you add such statements to your package, it potentially saves your users from having to wait for it to be inferred each time they use your package.

But if you execute these lines in the REPL, and then check how well it worked, you might see something like the following:

```
julia> using SnoopCompile
julia> tinf = @snoopi index_midsum([1,2,3,4,100])
3-element Vector{Tuple{Float64, Core.MethodInstance}}:
(0.00048613548278808594, MethodInstance for cat_similar(::Int64, ::Type, ::Tuple{Int64}))
(0.010090827941894531, MethodInstance for (::Base.var"#cat_t##kw")(::NamedTuple{(:dims,), Tuple{Val{1}}}, ::typeof(Base.cat_t), ::Type{Int64}, ::Int64, ::Vararg{Any, N} where N))
(0.016659975051879883, MethodInstance for __cat(::Vector{Int64}, ::Tuple{Int64}, ::Tuple{Bool}, ::Int64, ::Vararg{Any, N} where N))
```

Even though we’d already said `precompile(index_midsum, (Vector{Int},))` in this session, somehow we needed *more* inference of various concatenation methods. Why does this happen? A detailed investigation (e.g., using Cthulhu or `@code_warntype`) would reveal that `vcat(0, a)` is not inferrable “all the way down,” and hence the `precompile` directive couldn’t predict everything that was going to be needed.
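As a rough first check of the kind of investigation mentioned above, one can compare the mixed scalar/vector call with a call whose arguments share a type. This is only a sketch: `@code_warntype` shows just the top-level method body, so inference failures deeper in the call chain (which is where they occur here) may require a tool like Cthulhu to find, and the details vary across Julia versions.

```julia
using InteractiveUtils  # provides @code_warntype (loaded automatically in the REPL)

a = [1, 2, 3, 4, 100]

# Mixed scalar/vector call: look for non-concrete types in the printed body.
@code_warntype vcat(0, a)

# For comparison, a call where both arguments are vectors of the same type.
@code_warntype vcat([0], a)

# Inferred return types. A concrete return type does not guarantee that every
# internal call was inferred, so treat this as a first check only.
rt_mixed = Base.return_types(vcat, (Int, Vector{Int}))
rt_same  = Base.return_types(vcat, (Vector{Int}, Vector{Int}))
```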

No problem, you say: let’s just precompile those methods too. The most expensive is the last one. You might not know where `__cat` is defined, but you can find out with

```
julia> mi = tinf[end][2] # get the MethodInstance
MethodInstance for __cat(::Vector{Int64}, ::Tuple{Int64}, ::Tuple{Bool}, ::Int64, ::Vararg{Any, N} where N)
julia> mi.def # get the Method
__cat(A, shape::Tuple{Vararg{Int64, M}}, catdims, X...) where M in Base at abstractarray.jl:1599
julia> mi.def.module # which module was this method defined in?
Base
```

Armed with this knowledge, let’s start a fresh session (so that nothing is precompiled yet), and in addition to defining `index_midsum` and precompiling it, we add

```
julia> precompile(Base.__cat, (Vector{Int64}, Tuple{Int64}, Tuple{Bool}, Int, Vararg{Any, N} where N))
true
```

Now if you try that `tinf = @snoopi index_midsum([1,2,3,4,100])` line, you’ll see that the `__cat` call is omitted, suggesting success.

However, if you put all this into your package, including these `precompile` directives, and then check with `@snoopi` again, you may be in for a rude surprise: the `__cat` precompile directive doesn’t “work.” That turns out to be because your package doesn’t “own” that `__cat` method (the module is `Base` rather than `YourPackage`), and therefore Julia doesn’t know where to store its precompiled form. (Successfully precompiled code is cached in the `*.ji` files in your `~/.julia/compiled` directory.)
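To make the ownership point concrete, here is a minimal sketch of how the directive might sit inside a package. The module name `YourPackage` is the placeholder used in the text, and the layout is an assumption about a typical package source file rather than a prescribed structure.

```julia
module YourPackage  # hypothetical package name, as in the text

export index_midsum

function index_midsum(a::AbstractVector)
    ca = cumsum(vcat(0, a))   # cumulative sum of items in a, starting from 0
    s = ca[end]               # the sum of all elements
    return findfirst(x -> x >= s/2, ca) - 1   # compensate for inserting 0
end

# `index_midsum` is defined in this module, so the inference results for it
# can be stored in this package's cache file.
precompile(index_midsum, (Vector{Int},))

# A directive aimed at a Base-owned method such as `Base.__cat` could be
# written here too, but per the discussion above its results would not be
# cached with the package.

end # module
```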

How to fix this? Fundamentally, the problem is that `vcat` call: if we can write it in a way so that inference succeeds, then all these problems go away. It turns out that `vcat` is fully inferrable if all the arguments have the same type, so just changing `vcat(0, a)` to `vcat([zero(eltype(a))], a)` fixes the problem. (Alternatively, you could make a copy and then use `pushfirst!`.) In a fresh Julia session:

```
function index_midsum(a::AbstractVector)
    ca = cumsum(vcat([zero(eltype(a))], a))   # cumulative sum of items in a, starting from 0
    s = ca[end]                               # the sum of all elements
    return findfirst(x->x >= s/2, ca) - 1     # compensate for inserting 0
end
julia> precompile(index_midsum, (Vector{Int},))
true
julia> using SnoopCompile
julia> tinf = @snoopi index_midsum([1,2,3,4,100])
Tuple{Float64, Core.MethodInstance}[]
```

Tada! No additional inference was needed, ensuring that your users will not suffer any latency due to type-inference of this particular method/argument combination.
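For completeness, the `pushfirst!` alternative mentioned above might look like the following sketch. The function name `index_midsum_pushfirst` is hypothetical, and using `collect` to make the copy is an assumption chosen so that the result is always a resizable `Vector`. I have not checked this variant with `@snoopi`, so verify it before relying on it.

```julia
# Hypothetical variant of index_midsum that avoids vcat entirely.
function index_midsum_pushfirst(a::AbstractVector)
    b = collect(a)                   # fresh Vector, so the caller's data is untouched
    pushfirst!(b, zero(eltype(a)))   # prepend a zero of the same element type
    ca = cumsum(b)                   # cumulative sum of items in a, starting from 0
    s = ca[end]                      # the sum of all elements
    return findfirst(x -> x >= s/2, ca) - 1   # compensate for inserting 0
end

a = [1, 2, 3, 4, 100]
idx = index_midsum_pushfirst(a)
```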

In other cases, manual inspection of the results from `@snoopi` may lead you in a different direction: you may discover that a huge number of specializations are being created for a method that doesn’t need them. Typical examples are methods that take types or functions as inputs: for example, there is no reason to recompile `methods(f)` for each separate `f`. In such cases, by far your best option is to add `@nospecialize` annotations to one or more of the arguments of that method. Such changes can have dramatic impact on the latency of your package.
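As an illustration of what such an annotation looks like, here is a small sketch. The helper `describe` is hypothetical and not from any package; it simply takes a function as input, a case where a separate specialization per argument would buy nothing.

```julia
# Hypothetical helper that accepts any function. Without the annotation,
# Julia could compile a separate specialization for every distinct `f`.
function describe(@nospecialize(f))
    return string(f, " has ", length(methods(f)), " method(s)")
end

msg_sin = describe(sin)
msg_cos = describe(cos)
```

Whether this reduces latency in a given package is best confirmed by re-running `@snoopi` before and after the change.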

The ability to make interventions like these, which can both reduce latency and improve runtime speed, is a major reason to consider `@snoopi` primarily as an analysis tool rather than just a utility to blindly generate lists of precompile directives.

EDIT: if you’re working to improve your package’s precompiles, I encourage you to do this work with Julia nightly (or a recent `master` source build). Precompilation should work better if you’re not invalidating methods you depend on.