[ANN] SnoopPrecompile -> PrecompileTools

SnoopPrecompile is being deprecated in favor of PrecompileTools. If you maintain a General-registry package that uses SnoopPrecompile, you should soon receive a pull request that will migrate you to PrecompileTools. Other than merging it, no further action on your part should be needed. People who have private repositories that use SnoopPrecompile will have to migrate manually; see the script below.

PrecompileTools is nearly a drop-in replacement, except for changes in naming and in how developers locally disable precompilation (to make their development workflow more efficient). These changes are described in PrecompileTools’ enhanced documentation, which also includes instructions for users on how to set up custom “Startup” packages, guidance on handling precompilation tasks that are not amenable to workloads, and tips for troubleshooting.

Why the new package? It meets several goals:

  • The name “SnoopPrecompile” was easily confused with “SnoopCompile,” a package designed for analyzing rather than enacting precompilation.

  • SnoopPrecompile/PrecompileTools has become (directly or indirectly) a dependency for much of the Julia ecosystem, a trend that seems likely to grow with time. It makes sense to host it in a more central location than one developer’s personal account.

  • As Julia’s own stdlibs migrate to become independently updateable (true for DelimitedFiles in Julia 1.9, with others anticipated for Julia 1.10), several of them would like to use PrecompileTools for high-quality precompilation. That requires making PrecompileTools its own “upgradable stdlib.”

  • We wanted to change the use of Preferences to make packages more independent of one another. Since this would have been a breaking change, it seemed like a good opportunity to fix other issues, too.

If you need to migrate manually, this function may help:

# Migrate the package in `dir` from SnoopPrecompile to PrecompileTools.
# Returns `true` if any file was changed, `false` otherwise.
function convert2pct(dir::AbstractString)
    changed = false
    projfile = joinpath(dir, "Project.toml")
    str0 = read(projfile, String)
    # Swap the dependency name and its UUID in Project.toml
    str = replace(str0, "SnoopPrecompile" => "PrecompileTools")
    str = replace(str, "66db9d55-30c0-4569-8b51-7e840670fc0c" => "aea7be01-6a6a-4083-8856-8a6e6704d82a")
    if str != str0
        open(projfile, "w") do io
            write(io, str)
        end
        changed = true
    end
    # Walk src/ recursively and rename the module and macros in every .jl file
    dirs = [joinpath(dir, "src")]
    while !isempty(dirs)
        d = pop!(dirs)
        for f in readdir(d)
            f = joinpath(d, f)
            if isdir(f)
                push!(dirs, f)
            elseif endswith(f, ".jl")
                str0 = read(f, String)
                str = replace(str0, "SnoopPrecompile" => "PrecompileTools",
                                    "@precompile_all_calls" => "@compile_workload",
                                    "@precompile_setup" => "@setup_workload")
                if str != str0
                    open(f, "w") do io
                        write(io, str)
                    end
                    changed = true
                end
            end
        end
    end
    return changed
end
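
Before running this over a real repository, it may help to try the source-file substitutions on a throwaway string. This is only a sketch: the module name `MyPkg` and its contents are made up for illustration, and the multi-pair form of `replace` on strings needs Julia 1.7 or later.

```julia
# Hypothetical source file contents (not taken from any real package)
src = """
module MyPkg
using SnoopPrecompile
@precompile_setup begin
    data = rand(3)
    @precompile_all_calls begin
        sum(data)
    end
end
end
"""

# The same substitutions that convert2pct applies to each .jl file
migrated = replace(src, "SnoopPrecompile" => "PrecompileTools",
                        "@precompile_all_calls" => "@compile_workload",
                        "@precompile_setup" => "@setup_workload")

print(migrated)
```

The printed text should show `using PrecompileTools`, `@setup_workload`, and `@compile_workload` in place of the old names, with everything else untouched.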

This is very helpful!

I have a side question about using PrecompileTools. It works smoothly with some methods in my package, but for others, although it reduces the compile time, the compiled method also makes more allocations (2 more in my case; more details can be filled in later if needed). A quick observation is that the affected method contains a certain amount of type instability, which may cause invalidations. Should we expect all methods to perform identically with and without PrecompileTools?

Needed! :slight_smile:

I don’t know exactly what’s happening. I worry that there is something being omitted from the cache, similar to https://github.com/JuliaLang/julia/issues/35972 (which I thought we had fixed). This is the first such report I’ve seen, so more details would be greatly appreciated.

Here are some more details. I have a registered package Vlasiator.jl which provides file reading capabilities for a certain format. Since the types of stored variables are saved in metadata and are only known when actually reading the file, there are type instabilities in the method readvariable.

Currently on the master branch, if I don’t include readvariable in the precompilation workflow:

@setup_workload begin
   initfile = joinpath(@__DIR__, "../test/init.vlsv")
   @compile_workload begin
      meta = load(initfile)
   end
end

then the compiled methods make the same number of allocations as they do without PrecompileTools:

julia> using Vlasiator
[ Info: Precompiling Vlasiator [7d2ba682-ad6e-4e20-80d9-3f2d4a610bb4]

julia> file = "bulk.2d.vlsv";

julia> @time meta=load(file);
  0.004910 seconds (379 allocations: 377.492 KiB, 80.44% compilation time)

julia> @time meta=load(file);
  0.000743 seconds (360 allocations: 376.414 KiB)

julia> @time cid=readvariable(meta, "CellID");
  0.040152 seconds (28.59 k allocations: 1.978 MiB, 99.66% compilation time)

julia> @time cid=readvariable(meta, "CellID");
  0.000072 seconds (15 allocations: 98.844 KiB)

readvariable has 15 allocations. However, if I include it in the precompilation workflow:

@setup_workload begin
   initfile = joinpath(@__DIR__, "../test/init.vlsv")
   @compile_workload begin
      meta = load(initfile)
      cid = readvariable(meta, "CellID")
   end
end

then

julia> using Vlasiator
[ Info: Precompiling Vlasiator [7d2ba682-ad6e-4e20-80d9-3f2d4a610bb4]

julia> file = "test/data/bulk.2d.vlsv";

julia> @time meta=load(file);
  0.004590 seconds (379 allocations: 377.492 KiB, 80.49% compilation time)

julia> @time meta=load(file);
  0.000810 seconds (360 allocations: 376.414 KiB)

julia> @time cid=readvariable(meta, "CellID");
  0.000115 seconds (29 allocations: 99.641 KiB)

julia> @time cid=readvariable(meta, "CellID");
  0.000103 seconds (17 allocations: 98.984 KiB)

There are now 17 allocations.

Some test files can be found here.

Which Julia version is this? Looks like 1.8

This is 1.9.0-rc2.

I use Aqua.jl as a sanity check in some of my packages, and this PR caused it to fail. Just a heads-up for Aqua users:

     Testing Running tests...
/home/miguel/rcs/jdev/SinusoidalRegressions/Project.toml: Test Failed at /home/miguel/.julia/packages/Aqua/utObL/src/project_toml_formatting.jl:7
  Expression: result ⊜ true
   Evaluated: ⟪result: 😭 FAILED: /home/miguel/rcs/jdev/SinusoidalRegressions/Project.toml
    Running `Pkg.resolve` on `/home/miguel/rcs/jdev/SinusoidalRegressions/Project.toml` will change the content.

    --- Original Project.toml
    +++ Pkg's output
    @@ -6,14 +6,14 @@
     [deps]
     LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
     LsqFit = "2fda8390-95c7-5789-9bda-21331edee243"
    -RecipesBase = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
     PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
    +RecipesBase = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
     Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

     [compat]
     LsqFit = "0.13"
    -RecipesBase = "1"
     PrecompileTools = "1"
    +RecipesBase = "1"
     julia = "1.8"

     [extras]

⟫ ⊜ true

Apparently Aqua wants the deps and compat entries to be in alphabetical order… Just moving those lines around fixed the issue.
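
If you want to spot this before CI does, something like the following can check the ordering locally. It is a rough sketch, not Aqua's actual implementation: `section_keys` is a hypothetical helper that only handles simple `key = value` lines, and the sample text reuses the entries from the diff above. Plain `issorted` compares by code point, which puts lowercase `julia` after the capitalized names, as in the diff; that this matches Pkg's ordering in every case is an assumption.

```julia
# Hypothetical helper: collect the keys of one TOML section in file order.
# Only handles simple `key = value` lines, which is enough for [deps]/[compat].
function section_keys(toml_text::AbstractString, section::AbstractString)
    names = String[]
    in_section = false
    for line in eachline(IOBuffer(toml_text))
        s = strip(line)
        if startswith(s, "[")
            # A new section header starts; remember whether it is the one we want
            in_section = (s == "[$section]")
        elseif in_section && !startswith(s, "#") && occursin('=', s)
            push!(names, String(strip(first(split(s, '=')))))
        end
    end
    return names
end

# Entries in the order they appeared before Pkg rewrote the file
project = """
[deps]
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LsqFit = "2fda8390-95c7-5789-9bda-21331edee243"
RecipesBase = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
LsqFit = "0.13"
PrecompileTools = "1"
RecipesBase = "1"
julia = "1.8"
"""

deps_sorted = issorted(section_keys(project, "deps"))      # false: RecipesBase comes before PrecompileTools
compat_sorted = issorted(section_keys(project, "compat"))  # true
```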

Awesome update, thanks for the PRs @tim.holy!


Aside: the fact that you were able to automatically submit PRs to update the API in every dependent package at once, I think demonstrates how robust Julia’s package management infrastructure is. It reminds me of working within Google’s monorepo (see Why Google Stores Billions of Lines of Code in a Single Repository – Google Research) and how library maintainers could update all uses of their API across the organization, so library users never have to spend time updating deprecated calls themselves.

Aside 2: related, crazy idea, but it would be cool if there could be a way to have CompatHelper.jl do simple API updates automatically (e.g., changed function names), perhaps by a user committing a list of regexps each time they update their package in the central registry.

I thought it was still a “known issue” (albeit less frequent), so I haven’t bothered reporting examples. TriangularSolve.jl and VectorizedRNG.jl both reliably had this issue. I’ve removed their precompile statements for this reason.
VectorizedRNG was just this past week, but it wasn’t using SnoopCompile.

TriangularSolve exhibited this + other precompilation problems for a long time, but I was slow to finally disable it, in January of this year: Merge pull request #29 from JuliaSIMD/remove-precompile · JuliaSIMD/TriangularSolve.jl@251075c · GitHub

There’s no hurry to bump a patch release after merging this, right?

Can someone please point me to a doc explaining how @tim.holy managed to automatically submit PRs to update all dependent packages at once? Thank you.

One (naive?) way to proceed would be to explore & process Julia’s general registry (https://github.com/JuliaRegistries/General).

In it, you’ll find all the necessary information for each package: which one has SnoopPrecompile in its dependencies, and its repo’s URL.

That, plus fiddling a bit with git and you’re good to go.

Registry example for Plots.jl

There is GitHub.jl for interacting with GitHub’s API, e.g. to fork, clone, push, create a pull request and so on, so that could probably be used if you want to write a small script in Julia to do this. Together with some processing of the General registry (I think there may be some tools for that in e.g. Pkg.Registry?) to find dependents, and the convert2pct script that Tim provided in the OP, it should be possible to put something together to automate this.

It would be nice to have a function there so that package authors can create PRs to dependencies the way CompatHelper does, whenever they feel it is important (for instance, when non-breaking 1.0 releases appear).

Not really. In a month or so I will probably add a deprecation warning that will fire in an __init__ method for SnoopPrecompile, but I’m flexible about the timing with which I add this.
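
For anyone unfamiliar with the pattern, a load-time warning of this kind could look roughly like the sketch below. The module name `HypotheticalOldPkg` and the message text are placeholders for illustration, not the actual SnoopPrecompile code.

```julia
module HypotheticalOldPkg   # placeholder name, not the real package source

const DEPRECATION_MSG =
    "HypotheticalOldPkg is deprecated; please migrate to its replacement."

# Julia calls __init__ each time the module is loaded, so the warning
# appears in every session that still depends on the old package.
function __init__()
    @warn DEPRECATION_MSG
end

end # module
```

Because `__init__` runs at load time rather than at precompile time, the message shows up for end users as well as for the package's direct dependents.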

Thanks @mbaz. This was the only consistent form of failure with this PR, but it was reasonably common. Sorry I didn’t figure this out as a potential problem before submitting the PRs.

For those who wanted to know the details of how I generated the pull requests, here they are. It’s largely copy/pasted from MassInstallActions with as few changes as I had to make to get this to work. As the name MassInstallActions suggests, that package only handles pull requests dealing with GitHub Actions. Ideally we’d refactor that package to support more general kinds of changes, but for now I just hacked the following up:

using GitHub, HTTP, Pkg

include("convertpc.jl")

const default_body = read("commitmsg.md", String)

function with_temp_dir(f::Function)
    original_directory = pwd()
    tmp_dir = mktempdir()
    atexit(() -> rm(tmp_dir; force = true, recursive = true))
    cd(tmp_dir)
    result = f(tmp_dir)
    cd(original_directory)
    rm(tmp_dir; force = true, recursive = true)
    return result
end

function git(f)
    return f("git")
end

function migrate(repo::GitHub.Repo;
                 auth::GitHub.Authorization,
                 pr_branch_name::AbstractString = "teh/precompiletools",
                 pr_title::AbstractString = "Migrate from SnoopPrecompile to PrecompileTools",
                 pr_body::AbstractString = default_body,
                 commit_message::AbstractString = "Migrate from SnoopPrecompile to PrecompileTools",
                 pkg_url_type::Symbol = :html)
    fk = GitHub.create_fork(repo; auth)
    if pkg_url_type === :html
        pkg_url_with_auth = fk.html_url.uri
    elseif pkg_url_type === :ssh
        pkg_url_with_auth = fk.ssh_url.uri
    else
        throw(ArgumentError("`pkg_url_type = $(pkg_url_type)` not supported"))
    end
    sleep(5)
    with_temp_dir() do tmp_dir
        git() do git
            cd(tmp_dir) do
                run(`$(git) clone $(pkg_url_with_auth) REPO`)
                cd("REPO")
                run(`$(git) checkout -B $(pr_branch_name)`)
                if convert2pct(joinpath(tmp_dir, "REPO"))
                    run(`$(git) add -A`)
                    run(`$(git) commit -m $(commit_message)`)
                    # try
                        run(`$(git) push --force origin $(pr_branch_name)`)
                    # catch
                    #     # try again?
                    #     run(`$(git) push --force origin $(pr_branch_name)`)
                    # end
                    sleep(5)
                    params = Dict{String, String}()
                    params["title"] = pr_title
                    params["head"] = "timholy:" * pr_branch_name
                    params["base"] = repo.default_branch
                    params["body"] = pr_body
                    GitHub.create_pull_request(repo; params, auth)
                    @info "Pull request submitted for $(repo.name)"
                end
            end
        end
    end
    return nothing
end

regs = Pkg.Registry.reachable_registries()
reg = only(filter(r -> r.name == "General", regs))
regpath = splitext(reg.path)[1]
toml = Pkg.TOML.parsefile(joinpath(regpath, "Registry.toml"))
pkgurls = String[]
for (uuid, data) in toml["packages"]
    pkgpath = data["path"]
    depfile = joinpath(regpath, pkgpath, "Deps.toml")
    if isfile(depfile)
        deps = read(depfile, String)
        if occursin("SnoopPrecompile", deps)
            pkgfile = joinpath(regpath, pkgpath, "Package.toml")
            pkginfo = Pkg.TOML.parsefile(pkgfile)
            url = splitext(pkginfo["repo"])[1]
            push!(pkgurls, url)
        end
    end
end
sort!(pkgurls; by=name->splitpath(name)[end])
unique!(pkgurls)

# A chance to fix errors/edit the list. This was needed because the
# script didn't initially run to completion and I had to remove the ones already tackled
error("edit `pkgurls` to narrow the list of packages, then run the block below")

while !isempty(pkgurls)
    url = pkgurls[end]
    println(url)
    repo = GitHub.repo(joinpath(splitpath(url)[3:end]...); auth)
    migrate(repo; auth, pkg_url_type=:ssh)
    pop!(pkgurls)   # worked, delete from queue
    println("\n\n")
end

convertpc.jl is the script I posted in the OP, and commitmsg.md was the message many of you received with the pull requests. auth is a GitHub authentication token that you have to set up externally.

I guess the issue I reported here has nothing to do with PrecompileTools, but with Julia itself, right? Do I need to create an issue somewhere, and which is the right repo to report this?

Correct, it’s a Julia issue. Please report it to the Julia repo.

By the way, there is a method to increase type stability in the case of IO. If you provide the type you expect to read in, then you can stabilize your function.

This suggests dividing the read operation of an unknown sample into two parts. First, detect the type to read. Then, do the reading or parsing. Basically, pretend that Julia is a statically typed language.
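
A minimal sketch of that two-step idea, using an in-memory buffer instead of a real file. All names here (`detect_type`, `read_typed`, `read_values`, and the type tags) are hypothetical and are not Vlasiator.jl's API; the point is only that the type-unstable part is confined to one small function, while the actual reading happens in a kernel where the element type is known.

```julia
# Step 1: map a type tag (as might be stored in file metadata) to a Julia type.
# The tag strings are made up for this example.
function detect_type(tag::AbstractString)
    tag == "float64" && return Float64
    tag == "uint64" && return UInt64
    tag == "int32" && return Int32
    throw(ArgumentError("unsupported type tag: $tag"))
end

# Step 2: type-stable kernel. T is a compile-time parameter here,
# so the compiler can specialize the read for each element type.
function read_typed(io::IO, ::Type{T}, n::Integer) where {T}
    out = Vector{T}(undef, n)
    read!(io, out)
    return out
end

# Outer function: its return type cannot be inferred, but the only
# dynamic dispatch is the single call into read_typed.
function read_values(io::IO, tag::AbstractString, n::Integer)
    T = detect_type(tag)
    return read_typed(io, T, n)
end

# Round trip through an in-memory buffer
io = IOBuffer()
write(io, UInt64[1, 2, 3])
seekstart(io)
vals = read_values(io, "uint64", 3)

# A caller who already knows the type can skip detection entirely
seekstart(io)
vals2 = read_typed(io, UInt64, 3)
```

Calling `read_typed` directly with a known type gives a fully inferable result, which is what providing the expected type up front buys you.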
