Very slow time to first plot, 2022

dlakelan · October 19, 2022, 4:38pm

Sure, so I recommend that a beginner eliminate the global environment, then get used to local environments, then decide whether they want to add stuff to the global environment once they understand how it all works.

But if a beginner has willy-nilly put hundreds of packages into the global environment, this is a good opportunity to start over, I think.

Mileage may vary. Given that we’ve explained the thinking here, I think it’s up to the OP to decide which way to go.

lmiq · October 19, 2022, 4:43pm

It moves me to another environment. What I’m typically talking about is, for instance, having a function that computes something, which is the goal of the current package/project. Then I want to do a quick plot to check whether the result is the expected one. I don’t want to have to add a plotting package to the current environment, nor to a temporary one. So I keep even Plots in my global shared environment, so that I can plot stuff from wherever I am; the same goes for benchmarking tools, etc.

The real issue is having the main environment bloated AND developing stuff in it. In that case, any new package you add can cause a cascade of dependency updates and trigger a lot of precompilation, which can be quite annoying. But if the bloated main environment is used only as an occasional repository of dirty packages, and the current project is kept clean, that does not happen often.
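The stacked setup described above works because of Julia's load path: the active project is searched first, but the shared default environment stays on the stack. A minimal sketch using only the Pkg standard library; the temporary environment is just an illustration choice, not something from the thread.

```julia
using Pkg

# LOAD_PATH lists the environments that `using` searches, in order. Unless
# JULIA_LOAD_PATH is set, it is ["@", "@v#.#", "@stdlib"]: the active project,
# the shared default environment (e.g. ~/.julia/environments/v1.8 on Julia 1.8),
# and the standard library.
println("LOAD_PATH = ", LOAD_PATH)

# Activate a throwaway project; nothing is installed into it.
Pkg.activate(; temp = true)
temp_project = Base.active_project()

# The expanded stack now starts with the temporary project, but the shared
# default environment stays in the list (if it exists on disk), so packages
# installed there, such as a plotting or benchmarking package, remain loadable.
for env in Base.load_path()
    println("  ", env)
end

# Go back to the home project (or the default environment if none is set).
Pkg.activate()
```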

rongcuid · October 19, 2022, 5:28pm

Projects are a different thing. Of course I can use them, but that doesn’t change the fact that Makie takes half a minute to import, without extraneous dependencies. I don’t know Julia internals well enough, but I don’t get why precompiled packages are still so slow to load. Aren’t they supposed to be ready to execute immediately?

_bernhard · October 19, 2022, 5:36pm

Unfortunately that’s not how it works (yet), hence PREcompiled. To actually get what you are looking for (ready to execute), you need to resort to sysimages (for the time being).

However, there is a lot of work being done on saving compiled code to disk, so sometime in the not-too-distant future things may get better.

Krastanov · October 19, 2022, 5:43pm

I think you are still mixing up two different “slowdown” issues. Importing CairoMakie on my system takes 13 seconds, which is indeed too long, but (a) there is a workaround, (b) it will be less in the next few releases of Julia, and (c) it is much faster than what you are reporting, so there is some other issue on your end that we should try to address first.

Could you try the following:

1. Make a new Project (venv in Python-speak) and ] add CairoMakie in it. That causes a download and compilation that can take more than a minute; on my computer it takes 3min30sec. Is that similar to your experience? This should happen very rarely, though: only when you start a project and when you update the libraries installed in it.
2. Open that project again and run @time @eval using CairoMakie. That took 14sec on my computer. It is much faster than the install step because a lot of code was indeed precompiled and cached. However, because the Julia compiler cannot know all the possible combinations of types with which code will be called, it cannot precompile and cache everything. In a previous post here I linked to GitHub issues that track how the compiler is being made smarter so that this caching gets better.
3. In the meantime, there is a solution: with a sysimage, using CairoMakie takes milliseconds and the first plot takes 2 seconds (it is very fast after that).
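The point in step 2, that the compiler cannot precompile every combination of argument types, can be seen with a toy function. This is a sketch: sumsq is a hypothetical function, and the timings printed by @time will differ by machine and Julia version.

```julia
# A generic function: Julia compiles a separate native specialization for
# each concrete argument type it is actually called with.
sumsq(xs) = sum(x -> x^2, xs)

# Request one specific signature ahead of time. Inside a package, directives
# like this let precompilation cache work for that signature (how much is
# cached depends on the Julia version).
ok = precompile(sumsq, (Vector{Float64},))

# Typically reports little compilation: this signature was requested above.
r_float = @time sumsq([1.0, 2.0, 3.0])

# Vector{Int} was never requested, so the first call compiles a new specialization...
r_int = @time sumsq([1, 2, 3])
# ...and the second call reuses it.
r_int_again = @time sumsq([1, 2, 3])
```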

I believe you were conflating the initial installation+compilation step, which should happen very rarely with a well-organized Project file, and the using ... step, which happens often but does not involve long package precompilation unless the Project environment has been modified. That latter step is indeed also frustratingly slow, but it is (a) nowhere near as slow as what you are experiencing and (b) being fixed, both upstream in future Julia releases and already today with custom sysimages.

Could you let me know which of the above steps do not match your experience? We can try to figure out where to go from there.

rongcuid · October 19, 2022, 5:50pm

Time to add CairoMakie:

julia> @time Pkg.add("CairoMakie")
# ... omit ...
  5 dependencies successfully precompiled in 143 seconds. 171 already precompiled.
147.583926 seconds (4.95 M allocations: 375.967 MiB, 0.14% gc time, 0.55% compilation time)

I know precompile happens only once (unless I touch packages).

Time for using CairoMakie in the same session:

julia> @time @eval using CairoMakie
 19.565392 seconds (48.45 M allocations: 2.985 GiB, 6.26% gc time, 6.46% compilation time: 79% of which was recompilation)

Time in a new session:

julia> @time @eval using CairoMakie
 19.041467 seconds (48.45 M allocations: 2.987 GiB, 6.31% gc time, 7.16% compilation time: 80% of which was recompilation)

Sorry, I did not make it clear. The 2-3 minute import happens when I import CairoMakie together with a bunch of other packages that I use for data analysis, IO, etc.

Krastanov · October 19, 2022, 6:01pm

OK, I believe these results match the experience of most people here (some things might be up to two times faster on beefier desktop computers, but that is no excuse). (Edit: the “2-3 minute” figure in your final paragraph is still odd, but I am focusing on the measurements explicitly reported in this post. If you can reproduce the “2-3 minute” delays with a new Project file, please share so we can investigate.)

With stock Julia 1.8 and a package as ridiculously dynamic as Makie, you cannot do better than the 10-to-20 seconds you just measured. However, since Julia 1.8 it is possible to cache much of this in a sysimage. This is the only way to get millisecond-fast Makie imports today. It was made possible by improvements to code caching in 1.8; it was not possible in 1.7. Tim Holy is among the heroes who made it possible.

In Julia 1.9, and more plausibly 1.10, sysimages might not be necessary, thanks to smarter compiled-code caching schemes being developed.

There are a couple of good ways to make a sysimage in 1.8. My preferred way is to just run the sysimage build command in VS Code with the default settings (no customizations); it usually makes good-enough sysimages. Other folks prefer AutoSysimage.jl. I completely agree that it is frustrating to have to add these extra steps, but on the other hand, compilation steps like this are normal for compiled languages like Rust and C. Julia is trying to hit a very difficult middle ground between dynamic and fast compiled code, so for the moment I am content with this extra step. As I mentioned, in the next few versions of Julia it might not be necessary.
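Outside VS Code, a sysimage can also be built directly with PackageCompiler.jl, which the thread does not name but which is the usual package for this. A minimal sketch, assuming CairoMakie is already installed in the active project; the warm-up script contents and the sys_plots file name are hypothetical. Expect the build to take several minutes.

```julia
using PackageCompiler
using Libdl

# Hypothetical warm-up script: the calls made here are traced while the
# sysimage is built, so their compiled code ends up in the image.
warmup_file = joinpath(mktempdir(), "warmup_plots.jl")
write(warmup_file, """
using CairoMakie
fig = Figure()
ax = Axis(fig[1, 1])
lines!(ax, 1:10, rand(10))
save(tempname() * ".png", fig)
""")

# Platform-specific shared-library extension: "so", "dylib" or "dll".
sysimage_path = "sys_plots." * Libdl.dlext

# Bake CairoMakie (from the active project) plus the warm-up calls into the image.
create_sysimage([:CairoMakie];
                sysimage_path = sysimage_path,
                precompile_execution_file = warmup_file)
```

Julia is then started with julia --sysimage sys_plots.so (the extension is .dylib on macOS and .dll on Windows). Packages baked into the image stay at the versions it was built with, so the image has to be rebuilt after updating them. Calls that the warm-up script does not exercise still compile on first use.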

A tangent: I believe it is worthwhile to discuss why this is such a phenomenally big problem in Julia. Julia has two very special features that other languages do not share: (1) multimethods (multiple dispatch) as the fundamental principle of the entire ecosystem and (2) compiled code. It is very difficult to know which code you need compiled, and to avoid discarding the vast majority of already compiled code when importing new libraries that add new methods to pre-existing functions. No one had to deal with this problem before Julia. It is slowly being dealt with. Sysimages basically carry the promise that no significant number of new methods will be defined, hence they can cache more compiled code (this is a very oversimplified, borderline misleading explanation).
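The effect described above can be reproduced in miniature with plain Julia. kindof and report are hypothetical names; this only shows a single compiled method being thrown away and recompiled for correctness, whereas in real packages the same effect (usually called method invalidation) can cascade through many compiled methods.

```julia
# Hypothetical generic function with a single method for all numbers.
kindof(x::Number) = "some number"

# Another function that calls it; calling report(1.5) compiles report(::Float64).
report(x) = "value: " * kindof(x)

before = report(1.5)

# Adding a more specific method to the existing function `kindof`...
kindof(x::Float64) = "a Float64"

# ...invalidates the compiled report(::Float64), so Julia recompiles it here.
after = report(1.5)

println(before)  # value: some number
println(after)   # value: a Float64
```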

Edit: I have an experimental simulator for a quantum physics project. It uses DiffEq, QuantumOptics, QuantumClifford, SimJulia, GLMakie, and a bunch of other libraries. Like you, I had more than a minute of first-time import even with a stable Project.toml. Before 1.8 I had to be very careful, relying on Revise, long-running Julia processes, and let blocks to keep my workspace clean of temporary variables. Now I use a sysimage in VS Code and import times are less than a second. I had to modify my workflow, but I am incredibly productive with this new workflow.

sdanisch · October 19, 2022, 6:09pm

Do you mean AFTER precompilation, or does it precompile because you freshly added those other packages?
“Importing” suggests the former, which would be a serious problem we should investigate.

rongcuid · October 19, 2022, 6:38pm

After precompile. Let me try to run my big analytic script somewhere else to see what I get…

Does Makie use a lot of metaprogramming or what else makes it so challenging?

Now, I was thinking about something like Java HotSpot, where you start by interpreting and transition to JIT-compiled code… of course, this would be a lot of work.

mkoculak · October 19, 2022, 6:45pm

This is a bit of a tangent, but can someone explain why so many allocations are reported when importing a library in a session?
CairoMakie shows ~50M allocations, with about 3 GiB allocated in memory.

I checked the small package I am working on now (~800 lines of code, 33 KB in size), and it still shows 800k allocations and 42 MiB. Is this the footprint of the inner workings of (pre-)compilation?

rongcuid · October 19, 2022, 6:47pm

I will break down some (non-confidential) parts.

The import part is actually not as long as I thought:

@time @eval begin
    using DataFrames
    using Query
    using CSV
    using JSON3
    using CodecZlib

    using CairoMakie
end
 26.104101 seconds (62.05 M allocations: 3.762 GiB, 5.58% gc time, 17.67% compilation time: 88% of which was recompilation)

I have this load function:

load(path) = JSON3.read(transcode(GzipDecompressor, read(path)), jsonlines=true) |> @map(omitted...) |> DataFrame

Which takes some time:

@time @eval DF = load("data/data.json.gz")
  6.174465 seconds (17.61 M allocations: 1.051 GiB, 6.85% gc time, 95.87% compilation time)

Second run:

@time @eval DF = load("data/data.json.gz")
  0.222467 seconds (489.21 k allocations: 96.621 MiB, 10.96% gc time)

And also:

@time @eval Tne = CSV.read("data/data.csv", DataFrame)
  8.775861 seconds (20.21 M allocations: 1.073 GiB, 2.38% gc time, 99.98% compilation time)

I didn’t touch the rest. So not only does the import take a few dozen seconds, but the first call to every function from those packages also takes a few seconds. In some of my more complicated analysis scripts, where I need to make sure they reproduce every result (and thus must run fresh), it takes ~10 minutes to produce all the plots. For data no bigger than 16MB!

rongcuid · October 19, 2022, 7:00pm

Let me give a more reproducible example:

julia> using PalmerPenguins
julia> @time @eval using DataFrames, CairoMakie, AlgebraOfGraphics
 22.308379 seconds (55.87 M allocations: 3.419 GiB, 6.10% gc time, 8.79% compilation time: 87% of which was recompilation)
julia> penguins = dropmissing(DataFrame(PalmerPenguins.load()))
julia> @time @eval set_aog_theme!()
  0.829849 seconds (1.46 M allocations: 78.105 MiB, 5.26% gc time, 99.54% compilation time)
julia> @time @eval axis = (width = 225, height = 225)
  0.006352 seconds (1.92 k allocations: 109.791 KiB, 50.45% compilation time)
(width = 225, height = 225)
julia> @time @eval penguin_frequency = data(penguins) * frequency() * mapping(:species)
  0.289407 seconds (745.40 k allocations: 38.585 MiB, 99.36% compilation time)
julia> @time @eval draw(penguin_frequency; axis)
 32.939791 seconds (81.10 M allocations: 4.402 GiB, 2.79% gc time, 98.47% compilation time: 15% of which was recompilation)

So… a lot of compilation happens on the first function call, not at using time.

Krastanov · October 19, 2022, 7:20pm

Yes, this seems quite a bit more realistic. Your options are:

- custom sysimages
- waiting for the code caching work in future Julia versions (probably 1.10+)
- not employing a workflow that relies on short scripts
- switching to Python, if you cannot use sysimages and script latency matters more than bulk performance or other Julia advantages

For all of these, using Project.toml/venv is a good idea.

rongcuid · October 19, 2022, 7:58pm

I compared asysimg (Julia started with my custom sysimage) against stock julia, using my research analysis script:

asysimg> @time @eval include("scripts/plot.jl")
 34.292645 seconds (96.27 M allocations: 5.281 GiB, 6.67% gc time, 93.24% compilation time)
CairoMakie.Screen{PDF}

julia> @time @eval include("scripts/plot.jl")
 85.077615 seconds (190.55 M allocations: 10.829 GiB, 3.77% gc time, 70.55% compilation time: 17% of which was recompilation)
CairoMakie.Screen{PDF}

Better, but not exactly fast. As you can see, even with the sysimage, 93% of the time is spent compiling.

fatteneder · October 19, 2022, 8:09pm

Apologies if this has already been answered and I missed it.

Could you elaborate a bit on why dead variables are a problem for you?
Is it that you need to load so much data that they eat up all your RAM?

Calling them “dead variables” gives me the impression that one could reclaim and reuse their memory (assuming memory is really the issue here).

rongcuid · October 19, 2022, 8:12pm

Dead variables are just a workflow problem. During research, one cannot possibly know what code structure one will end up with, and there is a lot of ad-hoc visualization via VS Code cells, etc., because everything is still unknown.

At some point, loose variables float around and become hazardous because of stale data, etc. “Cleaning up” would ideally remove all the globals without touching the imported packages, so that I don’t have to wait minutes for a new Julia session to compile all those functions.

The issue has nothing to do with dead variables taking memory.
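One way to get the “clean globals, keep packages” behaviour described above, sketched with plain Base functions: evaluate the analysis script inside a brand-new anonymous module on every run. The script contents and the fresh_run helper are hypothetical; a real analysis file would contain its own using DataFrames, CairoMakie lines.

```julia
# Hypothetical exploratory script written to a temporary file.
script = tempname() * ".jl"
write(script, """
using Statistics            # cheap after the first load: packages load once per process
data = [1.0, 2.0, 3.0, 4.0]
result = mean(data)
""")

# Each run gets a new, empty module, so stale globals from earlier runs
# cannot leak in, while already loaded packages and their compiled code are reused.
function fresh_run(path)
    m = Module(:Scratch)     # anonymous module that does `using Base`
    Base.include(m, path)    # evaluate the script's top level inside `m`
    return m
end

run1 = fresh_run(script)
run2 = fresh_run(script)
println(run1.result)         # 2.5
```

Globals from a run live in the returned module (run1.result), not in Main. Methods that a run adds to existing functions, such as new plotting recipes, are not undone when the module is discarded.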

fatteneder · October 19, 2022, 8:28pm

Ok, now I see what you mean.
I guess you already know, but this issue is not exclusive to Julia; it also happens, e.g., with Jupyter notebooks in Python. I have to admit this is the main reason I dislike notebooks, too.

Just my two cents on how I circumvent the above (I don’t use VSCode, just a vanilla REPL):

1. Start up (which includes loading Revise.jl).
2. Explore some data.
3. After a while, I notice data that I keep either recomputing or loading again. I extract that bit into a script and re-include it (or add a caching mechanism to reload it). If needed, I go back to 1.
4. Go to 2 and repeat.
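The caching mechanism in step 3 could look like this, sketched with the Serialization standard library. cached and expensive are hypothetical names, and disk caching is only one possible choice.

```julia
using Serialization

# Hypothetical helper: run `compute()` once, store the result on disk, and
# read it back on later calls (including from a fresh Julia session).
function cached(compute, path::AbstractString)
    isfile(path) && return deserialize(path)
    value = compute()
    serialize(path, value)
    return value
end

# Stand-in for a slow load or computation; counts how often it really runs.
const calls = Ref(0)
expensive() = (calls[] += 1; sum(abs2, 1:1_000_000))

cache_file = joinpath(mktempdir(), "expensive.jls")
a = cached(expensive, cache_file)   # computes and writes the file
b = cached(expensive, cache_file)   # reads the file; `expensive` is not called again
```

The Serialization format is not guaranteed to be readable by a different Julia version or system image, so such a cache file is best treated as disposable; deleting it forces a recompute.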

I am sure people have already told you about this kind of workflow. But since it hasn’t been mentioned here yet, I thought I’d bring it up.

I also remember trying, for some time, to do everything with scripts and a fresh Julia session, because I knew that is how it works with Python. Then I realized that not everything needs to behave like Python, and I gave the above workflow a try.

rongcuid · October 19, 2022, 8:32pm

Yes, I either use this workflow or I use Pluto. And this is why I opened this topic… unlike Python, Julia is very slow to reload.

Krastanov · October 19, 2022, 9:10pm

Could you share a self-contained example, maybe the way you did with PalmerPenguins? For instance, a 20-line gist on GitHub that loads the libraries and some fake data and makes a couple of simple plots? I am very surprised by how slow your sysimage is. I imagine a lot of folks here would take it as a point of pride to figure out why the load is so slow (I am not promising that I will have the time to debug this in the next couple of days).

aramirezreyes · October 19, 2022, 9:58pm

Something that helps in understanding this: in this context, precompilation does not mean “ahead-of-time compilation” but “a step before compilation”, as Simon explained in this old post.

So the compilation is still done on first execution. But I agree it is still frustrating. I think I got used to the ~1 minute before being able to do something useful; I go and make some coffee in the meantime. I am glad that storing compiled code is on the horizon!
