In one of my packages I tried to reduce import times, so I added a couple of typical usage examples in a “precompile” function. Something like:
```julia
function _my_precompile()
    # my "typical use case" code
    ...
end

function _precompile_()
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    _my_precompile()
    # precompile directives from SnoopCompile
    ...
end

_precompile_()
```
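(For concreteness, a minimal self-contained version of this pattern could look like the sketch below; `_typical_workload` is a made-up stand-in, not the actual use-case code.)

```julia
# Minimal sketch of workload-based precompilation; `_typical_workload`
# is a hypothetical example, not the real package code.
function _typical_workload()
    v = rand(100)
    return sum(abs2, v)
end

function _precompile_()
    # Only do this work while a precompile image is being generated.
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    _typical_workload()
    # Explicit directives (e.g. emitted by SnoopCompile) would go here.
    precompile(_typical_workload, ())
    return nothing
end

_precompile_()
```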
However, in the presence of the `_my_precompile` call, benchmarks of my code show thousands of allocations in code that should not be allocating. On the other hand, after I remove `_my_precompile`, all these spurious allocations disappear, as they should.
I am deeply confused. How can precompilation functions that do not change the code lead to different allocation behavior during use of the library?
I do not have an MWE yet, but if you want to verify this behavior for yourself, here is the commit which removes the precompilation: before it the benchmarks show allocations, after it most allocations disappear. The benchmark itself is part of the commit message: Custom precompilation causes allocations!? · Krastanov/QuantumClifford.jl@27a5dad · GitHub
```
julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 7 1700 Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver1)
```
Salmon
December 28, 2021, 2:20pm
#2
I’m bumping this since I am also interested in the answer.
A while ago I posted a similar question:
Hi everyone,
I have recently tried with little success to add a precompilation step to my own package project and hope someone can help me clarify a few problems.
In this post it is explained that one can precompile methods by calling `precompile(method, (types...))`; however, making it work in practice seems quite cumbersome. In particular, it is pointed out that the compiler cannot really precompile functions called within `method` without further steps.
If I have a lot of functions is it rea…
I think it ultimately has to do with the fact that precompilation needs more than just calling a typical use-case function; apparently you also have to explicitly specify input types (?).
At least that was my takeaway so far; hopefully someone who knows more about this can enlighten us here.
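If that is right, the difference is between merely running a workload and also emitting explicit `precompile` directives with concrete argument types. A sketch of the explicit form, with made-up names:

```julia
# Hypothetical module; `process` stands in for a real package function.
module MyPkg

process(x::Vector{Float64}) = sum(abs2, x)

# Explicit directive with concrete argument types, evaluated only while
# the precompile image is being generated.
if ccall(:jl_generating_output, Cint, ()) == 1
    precompile(process, (Vector{Float64},))
end

end # module
```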
The discussion in Taking TTFX seriously: Can we make common packages faster to load and use - #9 by jlapeyre seems to answer some of my questions.
Julia bugs to track if you have such problems:
Issue opened 08 May 2020 (labels: bug, inference):
Using Julia master (newer than 301db971daaeeb627ba768375538e6e7ff36d215), the following does not infer the correct type (see also the discussion in #34048):
```julia
using Test, LinearAlgebra
# @inferred mapreduce(norm, +, [rand(1)]);
@inferred mapreduce(norm, +, [rand(1)]; init = 0.);
```
Note that the third line works if you comment out the second line (start from a fresh Julia session), which makes testing this issue difficult.
Issue opened 21 May 2020 (labels: performance, precompile):
If I make a package `M` with this content of `src/M.jl`:
```julia
module M
f(x) = x > 0 ? cbrt(x) : 0.0
precompile(f, (Float64,))
end
```
and run
```
using M, BenchmarkTools
x1 = fill(0.0, 1000000);
x2 = fill(1.0, 1000000);
@btime M.f.($x1);
@btime M.f.($x2);
```
the result is
```
842.301 μs (2 allocations: 7.63 MiB)
20.586 ms (2000002 allocations: 38.15 MiB)
```
Without `precompile(f, (Float64,))` the excessive allocations go away.
Also check out the bugs in other packages that link to the ones above.
A few more issues with useful comments on the topic:
Issue opened 09 Dec 2019 (label: precompile):
In https://github.com/JuliaGraphics/Colors.jl/pull/370 it was noticed that adding `precompile` statements seems to have performance implications, and not always in a good way.
Running `--track-allocation=all` with the statements
```julia
julia> using Colors, FixedPointNumbers
julia> cols = rand(RGB{N0f8}, 10^5);
julia> cstrs = ['#'*hex(c) for c in cols];
julia> parsec(cstr) = parse(Colorant, cstr)
parsec (generic function with 1 method)
julia> map(parsec, cstrs);
julia> using Profile
julia> Profile.clear_malloc_data()
julia> map(parsec, cstrs);
```
and then moving the `*.mem` files (in `base/` and all relevant packages) to `/tmp/withpc` (with precompilation) and `/tmp/nopc` (without precompilation) and running the following analysis script:
```julia
using Glob

jlfiles = Set{String}()
for dir in ["withpc", "nopc"]
    cd(dir) do
        for (root, dirs, files) in walkdir(".")
            for file in files
                push!(jlfiles, joinpath(root, splitext(splitext(file)[1])[1]))
            end
        end
    end
end

for file in jlfiles
    flw = glob(file*"*", "withpc")
    fln = glob(file*"*", "nopc")
    if length(flw) == length(fln) == 1
        fw, fn = normpath(flw[1]), normpath(fln[1])
        if !success(`cmp $fw $fn`)
            println(file, ':')
            run(ignorestatus(`diff -u --color $fw $fn`))
        end
    else
        println("file ", file, " is present only in ", isempty(flw) ? "nopc" : "withpc")
    end
end
```
yields a single relevant diff in [`Colors/parse.jl`](https://github.com/JuliaGraphics/Colors.jl/blob/master/src/parse.jl):
```diff
--- withpc/parse.jl.18420.mem 2019-12-09 02:37:18.800469670 -0600
+++ nopc/parse.jl.18505.mem 2019-12-09 02:39:45.171011466 -0600
@@ -59,7 +59,7 @@
0 if mat != nothing
0 prefix = mat.captures[1]
0 len = length(mat.captures[2])
- 1599856 digits = parse(UInt32, mat.captures[2], base=16)
+ 0 digits = parse(UInt32, mat.captures[2], base=16)
0 if len == 6
1600000 return convert(RGB{N0f8}, reinterpret(RGB24, digits))
0 elseif len == 3
```
It is not clear to me why this line should allocate memory in one case but not the other. One can also verify that commenting out the 3 precompile statements for `parse` from the `teh/precompile` branch eliminates the extra allocation.
```julia
julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933* (2019-11-26 06:09 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIAFUNCDIR = /home/tim/juliafunc
  JULIA_CPU_THREADS = 4
```
Issue opened 22 Jun 2021 (label: help wanted):
This is the follow-up to issue #425.
If precompilation is enabled (default), some functions will be boxed, causing significant performance degradation. For example:
```julia
julia> using Colors, BenchmarkTools # Colors.jl v0.12.8
julia> rgb_f64 = rand(RGB{Float64}, 1000, 1000);
julia> @btime convert.(XYZ, $rgb_f64);
140.676 ms (5758522 allocations: 110.76 MiB)
```
This is a bug in Julia, although it has not been clearly identified, and several related issues have been reported.
- https://github.com/JuliaLang/julia/issues/34055
- https://github.com/JuliaLang/julia/issues/35537
- https://github.com/JuliaLang/julia/issues/35800
- https://github.com/JuliaLang/julia/issues/35972
"#35800" is often used as a synonym of this problem because of the memorability of the issue number.
I am not familiar with Julia's internals, so this looks very strange to me. However, I know a few things empirically.
Perhaps the root cause is that inference is context-dependent. A typical example of context dependency is inlining: inlining encourages constant propagation and makes inference more likely to succeed.
We can't spend infinite time on inference, so interrupting inference in a deep method chain is inevitable. The problem is that once a method fails to be inferred, it is marked as an "inference failure" until it is recompiled.
Perhaps Julia v1.6.x, the next LTS candidate, will not fully solve this problem, so we will have to live with this bug.
Of course, `__precompile__(false)` is a valid workaround. However, it is a last resort. There are two major approaches to workarounds. One is to make it easier for the inference to succeed (regardless of the context). The other is to avoid using "marked" (and likely to be "marked") functions.
I have implemented several workarounds in Colors v0.13 series, but the problems still remain.
**Edit:**
A new package, JETTest.jl, may have fewer side effects than runtime-dispatch detection based on benchmarks (such as `@btime`).
https://discourse.julialang.org/t/ann-jettest-jl-advanced-testing-toolset-for-julia/63229
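As a toy illustration of the context dependence described above (not taken from any of the linked issues): a call that is not concretely inferrable on its own can become concretely typed inside a caller, because constant propagation through the inlined call pins down the result.

```julia
# Toy example with made-up functions, illustrating context-dependent inference:
# `pick` on its own infers to a small Union, but inside `caller` the constant
# argument propagates through the inlined call and the result is concrete.
pick(flag) = flag ? 1.0 : 1

caller() = pick(true)

# Expected results (sketch, not a verbatim REPL transcript):
# Base.return_types(pick, (Bool,))  # -> [Union{Float64, Int64}]
# Base.return_types(caller, ())     # -> [Float64], thanks to constant propagation
```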
t-bltg
March 9, 2022, 9:35am
#5
I also see pretty weird allocation reports when using precompile statements (`round` and `log10` allocating).
We hit this in Speed by t-bltg · Pull Request #136 · JuliaPlots/PlotUtils.jl · GitHub.
@tim.holy might be interested in this precompilation scenario (which affects Plots.jl).
Answer: just use nightly or the upcoming 1.8.0-beta2; it’s fixed there.