10-15 minute TTFP with Plots.jl... Please help

Maybe you could just use another plotting package. PyPlot requires almost no compilation since it's in Python. Personally that's what I use, since I also thought the precompilation every time I changed environments was too long.
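
A minimal PyPlot session could look like this (a sketch, assuming PyPlot is already installed):

using PyPlot                      # thin wrapper around Python's matplotlib
x = range(0, 2π; length=100)
plot(x, sin.(x))                  # matplotlib-style plotting API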

3 Likes

PackageCompiler on 1.9 is likely slow right now for reasons that will be fixed. On 1.8, though:

julia> @time PackageCompiler.create_sysimage(["Plots"]; sysimage_path="sys_plots.dylib")
✔ [01m:50s] PackageCompiler: compiling incremental system image
112.827478 seconds (1.03 M allocations: 57.058 MiB, 0.01% gc time, 0.13% compilation time)
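
Once the image is built, you start Julia with it via the standard --sysimage (-J) flag, using the path from the example above:

$ julia --sysimage sys_plots.dylib
julia> using Plots  # already baked into the image, so this should load near-instantly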

2 Likes

I might :broken_heart:. But this solution doesn't generalize, because there are other packages with precompile times too long to wait through daily (or multiple times a day) that have no ready alternatives in other languages (e.g. Zygote/Flux).


I've added the following to my .julia/config/startup.jl file so that I can call _add("Plots"), wait for the precompilation, and then have a quick using Plots in any environment. Unlike adding Plots to my global environment, Plots will not reprecompile when it or any of its dependencies is updated unless I explicitly invoke _add("Plots") or _up("Plots"). When I switch Julia versions it will reprecompile, but only once per Julia version per call to _add or _up. Unlike the PackageCompiler route, with this approach the reprecompilation time on Julia updates is lower and requires no manual intervention whatsoever. Normal Pkg usage should be entirely unaffected by this patch.

The implementation is mostly untested and the UX is terrible:
# https://discourse.julialang.org/t/10-15-minute-ttfp-with-plots-jl-please-help/92636/23?u=lilith
if isinteractive()
    import Pkg

    # Install `pkg` into its own shared environment and record it as "known".
    function _add(pkg::AbstractString)
        project = dirname(Base.active_project())
        Pkg.activate(pkg, shared=true)
        Pkg.add(pkg)
        Pkg.activate(project)
        Symbol(pkg) ∈ KNOWN_PACKAGES && return  # already known; nothing more to record
        push!(KNOWN_PACKAGES, Symbol(pkg))
        open(joinpath(@__DIR__, "known_packages"), "a") do io
            println(io, pkg)
        end
    end

    # Update the shared environment that holds `pkg`.
    function _up(pkg::AbstractString)
        project = dirname(Base.active_project())
        Pkg.activate(pkg, shared=true)
        Pkg.update()
        Pkg.activate(project)
    end

    # Forget `pkg` (note: this does not delete its shared environment).
    function _rm(pkg::AbstractString)
        filter!(≠(Symbol(pkg)), KNOWN_PACKAGES)
        open(joinpath(@__DIR__, "known_packages"), "w") do io
            for pkg in KNOWN_PACKAGES
                println(io, pkg)
            end
        end
    end

    isfile(joinpath(@__DIR__, "known_packages")) || touch(joinpath(@__DIR__, "known_packages"))
    const KNOWN_PACKAGES = Symbol.(readlines(joinpath(@__DIR__, "known_packages")))

    # Intercept the REPL's "package not found, install?" prompt for known packages.
    pushfirst!(Pkg.REPL.install_packages_hooks, symbols -> begin
        isdisjoint(symbols, KNOWN_PACKAGES) && return false
        if symbols ⊆ KNOWN_PACKAGES
            project = dirname(Base.active_project())
            for s in symbols # load each known package from its shared environment
                Pkg.activate(string(s), shared=true)
                @eval using $s
            end
            Pkg.activate(project)
            @info "Successfully loaded $(join(symbols, ", ")). Please ignore the following error message:"
            true
        else
            @warn "$(intersect(symbols, KNOWN_PACKAGES)) are special packages that cannot be installed at the same time as ordinary packages $(setdiff(symbols, KNOWN_PACKAGES))."
            false
        end
    end)
end
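
Usage then looks like this (a sketch; Pkg output elided):

julia> _add("Plots")   # one-time: installs Plots into a shared environment named "Plots"

julia> using Plots     # from any environment: the REPL hook loads it from the shared env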

The tradeoff is that I may, in theory, be exposed to known package version incompatibilities between the plotting stack and whatever else I may be working with, since the shared environment is resolved independently of my active environment. OTOH, I think I run the same risk with PackageCompiler, and updating the sysimage is harder than a call to _up("Plots").

Lovely! That was the problem. On 1.8 I got a sysimage made in 385 seconds, which is in line with the ratio I've been seeing in this thread between my timings and other folks'. I'm still going to try the idea in this post first, though, for the aforementioned reasons.

1 Like

Sorry to hear this is a pain. If Julia’s ecosystem is mature enough that you can use Julia for several weeks without updating packages, then the two best options might be:

  • use Julia 1.9 with --preserve=all when you need to add or remove packages
  • use 1.8 with PackageCompiler
3 Likes

For the most part, the ecosystem is mature enough for that. I do want to explore new combinations of packages much more frequently than that, though. For your first option, I see a few problems, in decreasing order of severity:

  • IIUC, --preserve=all has no effect when it is possible to add a package without reprecompilation, and it turns cases that would have triggered reprecompilation into compatibility errors whose solution is rerunning without --preserve=all. Hopefully I'm missing something, but I fail to see the benefit here in any case whatsoever. (See the sketch after this list for what the flag corresponds to in the API.)

  • Today I want to plot the performance of a neural network. This is in a new project that has the latest versions of all its dependencies. Suppose that some of these dependencies are shared with Plots, and some are newer than the versions I had when I precompiled Plots yesterday. I see no way to avoid reprecompiling Plots today using the traditional Pkg.jl workflow.

  • Yesterday, I wanted to plot some performance data I got from running SortingAlgorithms.jl with Julia 1.9. I didn't ]add Plots to SortingAlgorithms.jl's Project.toml, though, because it is a lightweight registered package that other folks want to use, and I don't want to keep my local clone out of sync with the public version. Even with no interleaving Pkg operations, if I want to plot again in the same project today without reprecompiling, I'd have to have added Plots to the Project.toml.
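
For reference, a minimal sketch of what --preserve=all maps to in the Pkg API (Example is a placeholder package name):

import Pkg
Pkg.add("Example"; preserve=Pkg.PRESERVE_ALL)  # refuses to change versions of anything already installed
# Pkg REPL equivalent:  pkg> add --preserve=all Example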

3 Likes

My advice which I’ve learned the hard way:

Use separate environments for every new project. Having one global environment with every package you ever want to use is indeed slow and painful. Precompiling small, separate projects is much quicker.
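
For example:

(@v1.9) pkg> activate MyProject   # creates/switches to a project-local environment
(MyProject) pkg> add Plots        # recorded in MyProject/Project.toml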

2 Likes

@Jollywatt, usually this is a good idea, but it does not address the problem that Lilith has. If anything, it just makes it worse: every single environment will have its own Plots that needs to be precompiled. And in a workflow where someone experiments with dev'ed packages, installs and deletes dependencies, etc., the Plots package will also get precompiled many times.

To an extent, the problem is that Lilith is doing heavy development work (which inherently leads to a lot of precompilation), so the typical user solutions (which minimize load times and TTFX at the expense of precompilation time) do not work.

Throwing in another suggestion: would it make sense to have one single Plots environment, and whenever you want to plot something from another environment, just serialize it in the "work" project and deserialize it in the Plots project? It is not ergonomic at all, but I have the impression that there is simply no ergonomic way to do heavy dev work with Julia on weak machines (which is different from doing heavy science or homework).
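
A minimal sketch of that handoff, using the Serialization stdlib (the file name is arbitrary):

# in the "work" environment:
using Serialization
xs = 0:0.1:2π
serialize("results.jls", (xs = xs, ys = sin.(xs)))

# later, in a session with the Plots environment active:
using Serialization, Plots
data = deserialize("results.jls")
plot(data.xs, data.ys)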

2 Likes

Maybe I am wrong (and someone more knowledgeable will correct me), but compilation happens at the version level, not the environment level (at least in my .julia/compiled/ folder everything is grouped into 1.8 and 1.9 subfolders). This matches my experience that Pkg only recompiles things when some new context appears (a new version of a package or some compatibility constraint with other packages), not when I create a new env.
Still, having a small env with only the necessary packages limits the number of these context changes in a given time period.

Have a browse of the compiled packages folders and see the number of different .ji or .dll files in there (at the package level). The situation is potentially explosive with the precompiled caches of 1.9, where shared libs can go above 50 MB (or well above that) and are apparently multiplied by the number of environments. Telling people to always use environments (which I don't) without warning them about this does not look right to me.
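
E.g., from the REPL, assuming the default depot layout (compiled/v1.X/<Package>/):

julia> readdir(joinpath(DEPOT_PATH[1], "compiled", "v1.8", "Plots"))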

3 Likes

Right, it seems that it does keep a set of files per env.
My workflow is mostly doing the same thing for a long time, so I only have 2-3 copies, but I can imagine that one could generate a lot of these.
On the other hand, there is the conda/pip system, where compiled code for each version of a package is stored, which can also grow quickly when packages update frequently. Not sure if there is a way around it (I remember that my conda envs also tended to be ~1 GB after installing scipy/numpy/pandas and a couple more packages, so it seems a similar story here).

That is even worse. I don't use it much, but I have one for some experiments, and when I get distracted it grows to ~10 GB (once it was 18) without my knowing why or how. Absolutely ridiculous for something that calls itself miniconda. But I think we should try not to follow that example.

There is an (internal) setting to choose how many cache files each package can have before old ones are deleted:

You can lower that one to use less disk space, trading it for more time spent compiling.
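
The linked setting didn't survive here; assuming it refers to the JULIA_MAX_NUM_PRECOMPILE_FILES environment variable (default 10), you would set it before starting Julia:

$ JULIA_MAX_NUM_PRECOMPILE_FILES=5 julia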

3 Likes

Thanks, that will probably be useful for some people in this thread.

But these are different versions of a package, right?
E.g. I have Plots v1.37.1, v1.38.0, and v1.38.1, therefore three sets of precompiled code are in the Plots folder. It is not the case that v1.38.1 is cached three times, once for each of my envs that has Plots in it?

Each specific version should be precompiled only once, no matter how many environments it is used in. On 1.9 there might be different artifacts if you use a package twice with different Julia compiler options; I'm not sure about this, though…

One topic that has not been discussed here yet: why are your precompile times so long?
Or perhaps the download times are long because you have a slow internet connection?
Are you using Windows? On Windows, virus scanners and firewalls can slow things down.
Usually slow laptops work better with Linux than with Windows…

2 Likes

It looks like the original “10-15min” included time to precompile the environment. I experience similar waits — I think it might be typical! (For large projects.)

Another tip: You can safely abort environment precompilation with Ctrl+C in Pkg mode. Especially useful if just one stubborn package is left and you’re impatient.

Well, if you look at the provided benchmark code, you can see that an empty, temporary environment is activated and ONLY Plots.jl is precompiled. For this test case, 10-15 min is definitely NOT typical, not even on a dual-core CPU with 8 GB of RAM.

On my laptop, after renaming my .julia folder, I get:

Precompiling project...
  134 dependencies successfully precompiled in 69 seconds
Julia Version 1.8.4
Commit 00177ebc4fc (2022-12-23 21:32 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
113.112786 seconds (19.89 M allocations: 1.987 GiB, 0.79% gc time, 6.81% compilation time: 0% of which was recompilation)

69 s precompilation, 44 s download time.

@Lilith wrote:

I’m using a 2019 1.6 GHz Dual-Core Intel Core i5 processor with 8GB of ram and an Intel UHD Graphics 617 1536 MB embedded GPU.

According to CPUs launched during year 2019, the only dual-core CPU Intel launched in 2019 was an Intel Pentium Gold. The smallest i5 CPU has 6 cores.

1 Like

Laptops exist. Specifically, the i5-8210Y is a dual-core laptop CPU.

2 Likes

The performance of that CPU might be 1.5 times lower than mine (UserBenchmark: Intel Core i5-8210Y vs i7-10510U), but not 5-7 times lower. So there is something else going on here. Perhaps the laptop was in battery saver mode? Or VS Code was running and eating CPU cycles? Or a spinning hard drive was in use?

UserBenchmark is hilariously bad, but the bigger thing is probably that on a dual-core chip all the background nonsense matters a lot more. Slack eating two thirds of a core for no reason? There goes a third of your performance. Windows downloading an update? There goes another third. Also, a low-power chip like this is fairly likely to have only one channel of its RAM populated (saves power and cost), which can slow things down.

6 Likes