Very slow time to first plot, 2022

I’ll try to make a minimal example. Do you think I should open an issue on julia?

Interesting. I thought we had at least the IR once precompilation is done.

Yeah, but for some packages the LLVM time is significant.

You can’t generate typed IR unless you know the argument types, and in general you can’t know the types until the function is called. So type inference and IR generation must happen at call time.

What I think is being worked on is caching the machine code for cases that are handed to the precompiler. So if you expect your function to be called on an array of Float64 a lot, it will actually do the type inference, IR and machine code generation for that case and cache the result for later. Right now, to get that done, you need to make a sysimage, I think.
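
To make that concrete, here is a minimal sketch using Base’s precompile function. The function mysum is a made-up example, not something from this thread, and whether the compiled native code is actually kept between sessions depends on the Julia version, so treat this as an illustration of the idea only:

```julia
# Hypothetical function, used only for illustration.
mysum(xs) = sum(abs2, xs)

# Ask Julia to run type inference and compile `mysum` for
# Vector{Float64} arguments without actually calling it.
ok = precompile(mysum, (Vector{Float64},))

println("precompile succeeded: ", ok)
println("mysum([1.0, 2.0, 3.0]) = ", mysum([1.0, 2.0, 3.0]))
```

Inside a package, a statement like this runs while the package is being precompiled, which is one way a package author can hand specific cases to the precompiler.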

I have a minimal example here! In fact, it is exactly the Penguin example:

using PalmerPenguins
using DataFrames, CairoMakie, AlgebraOfGraphics
set_aog_theme!()
penguins = dropmissing(DataFrame(PalmerPenguins.load()))
axis = (width = 225, height = 225)
penguin_frequency = data(penguins) * frequency() * mapping(:species)
draw(penguin_frequency; axis)

Julia, no sysimage:

julia> @time @eval include("plot.jl")
 48.083621 seconds (112.62 M allocations: 6.077 GiB, 2.91% gc time, 91.97% compilation time: 11% of which was recompilation)

Julia, with asysimg (from AutoSysimages):

asysimg> @time @eval include("plot.jl")
 28.630338 seconds (85.72 M allocations: 4.579 GiB, 9.30% gc time, 98.35% compilation time)

Note that the first time (ever) the script is run, Julia needs to download the dataset.

Here are my attempts to work with this. On my computer I lowered TTFX from about 85 s to about 4 s.

version info
julia> versioninfo()
Julia Version 1.9.0-DEV.1566
Commit ea991745a99 (2022-10-10 13:10 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 1700 Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver1)
  Threads: 1 on 16 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 1

After precompiling and downloading datasets, I have the following:

julia> @time @eval include("plot.jl")
 85.086753 seconds (79.78 M allocations: 5.128 GiB, 3.66% gc time, 87.44% compilation time: 2% of which was recompilation)

gosh… This is ridiculously slow!

I used the sysimage builder from the VS Code Julia extension. This was the JuliaSysimage.toml file I used:

[sysimage]
exclude=[]   # Additional packages to be excluded in the system image
statements_files=[]  # Precompile statements files to be used, relative to the project folder
execution_files=["plot.jl"] # Precompile execution files to be used, relative to the project folder

I used your MWE as an execution file. Building the sysimage took 9 minutes.

With the sysimage I got:

julia> @time @eval include("plot.jl")
  3.846860 seconds (2.05 M allocations: 144.002 MiB, 1.33% gc time, 77.32% compilation time: 99% of which was recompilation)

The conclusion here is that I had to use an example workflow script in the generation of the sysimage, presumably because either CairoMakie or AlgebraOfGraphics is not using SnoopPrecompile. SnoopPrecompile is a tool that makes it easier for libraries to include example workflows, which makes precompilation (and sysimages) more effective. I would suggest filing a feature request with AlgebraOfGraphics. The ecosystem needs to use SnoopPrecompile more.

But then this means I have to precompile for every script, on every update. Would this not be even worse than just taking the first-call compilation time?

No, you don’t. It just turned out that, without any workflow-like script, not enough methods were precompiled. Try the procedure that @Krastanov outlined to generate a system image, then use that system image to run a different script. It should also be much(!) faster than without the system image.

On every update, yes. That is why I haven’t done it. Makie is still in very vigorous development and updates come often. I don’t want to do a sysimage every time.

This is way, way too much work. Just look at how many tools are required to perform something this basic. How would anyone just “know” about SnoopPrecompile, all the options to build sysimages, etc.?

While I agree in spirit (I also want things to be much faster), please consider that 1) you are using Makie, which is one of the worst packages (if not the worst) when it comes to TTFX, and 2) you don’t have to know about SnoopPrecompile, “just” system images.

No, not every script. That was the point of my “file a feature request issue with the devs” comment. You need one single “typical” script, and you should expect the library devs to actually do that internally so that you do not need to do it.

I came to use Makie because of AoG, and because it is quite powerful and fast (once the compilation is done). The other grammar-of-graphics library (Gadfly?) doesn’t look very active.

FYI, I filed an issue over at AoG: “SnoopPrecompile”, Issue #430 in MakieOrg/AlgebraOfGraphics.jl on GitHub.

There are three grammar of graphics libraries that I have been happy to use and if any of them seem inactive, I would say it is because they are “good enough” or “complete enough”.

I personally prefer AlgebraOfGraphics, not least because it is part of the Makie ecosystem. But Gadfly (independent) and StatsPlots (part of the Plots ecosystem) are both pleasant to use. Plots is also notoriously problematic with TTFX.

However, for all of these cases, the devs of the libraries would actually be grateful if you file TTFX bug reports, as long as there is something actionable in them. Our example above is pretty good. You, the user, at this stage of Julia development, might be expected to know how to make a sysimage, but you certainly should not be expected to know which precompilation commands are executed internally. In our test above, we found that adding a few precompilation tasks (the typical user script) sped up many other scripts, not just the script we used. This is a sign that the devs can easily make their package precompile better. All of this only recently became possible, so I would cut them some slack on not having done it yet :smiley:

Also, do bear in mind that these are tools that became possible only very recently, after an enormous amount of work by just a few volunteers. You are right to complain that things are not automated enough, but this automation is happening as we speak.

And a final reiteration: you need only one “typical” use-case script to speed up all scripts (and the devs of various libraries can do that behind the scenes with SnoopPrecompile so that you do not need to do it yourself).

“On every update, yes. That is why I haven’t done it. Makie is still in very vigorous development and updates come often. I don’t want to do a sysimage every time.”

Unless you are using master from Makie, updates usually come once or twice a month, depending on how many changes accumulate and how busy the maintainers are. Given that generating a sysimage takes ~10 min, I would say building a sysimage every two weeks (on every update) is an acceptable cost compared to how much TTFP it saves you when working with Makie every day for two weeks.

Of course, the situation is different if you update a project environment twice a day because of some update of another package …

Many agree with you, me included. As @Krastanov said, it is only going to get better from now on :wink:

Let me just drop minimal instructions to compile a sysimage with Makie alone (assuming you have PackageCompiler.jl already installed):

  1. Run julia --project precompile.jl (here I precompile both GLMakie and CairoMakie), where precompile.jl is:
# precompile.jl
using PackageCompiler

PackageCompiler.create_sysimage(
    [:CairoMakie, :GLMakie];
    sysimage_path = joinpath(@__DIR__, "MakieSys.so"),
)
  2. Enjoy with julia --project -J MakieSys.so

The precompile.jl script is minimal because the Makie devs added a good selection of precompile statements to cover a typical workflow. I don’t know about AlgebraOfGraphics, but maybe the issue @carstenbauer opened will lead to a PR that improves things on their end too.

Just to clarify a bit, because we have suggested far too many different options: there are three common ways to make a sysimage (yes, that is too many; this should be automated, and people are working on it).

  • manually using PackageCompiler
  • using the VS Code plugin (by far my favorite, I consider it simplest)
  • using AutoSysimages

Package developers need to use something like SnoopPrecompile to make these sysimages as performant as possible. If they do not, open an issue asking for it. While waiting for them to do it, you can use a “typical use script” to train your sysimage (easiest with the VS Code plugin). That makes all your scripts in the given project load faster, not just the “typical use script”.

And as always, be sure you have a good Project.toml file.
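
For the manual PackageCompiler route, training with a “typical use script” could look roughly like the sketch below. It reuses plot.jl and the package list from the penguin example earlier in the thread; the file name build_sysimage.jl, the function name build_plot_sysimage and the output name PlotSys.so are placeholders I made up, and I have not timed this particular build:

```julia
# build_sysimage.jl (hypothetical file name)
using PackageCompiler

# Build a sysimage containing the packages used by plot.jl and "train" it
# by executing plot.jl once during the build, so that the methods the
# script hits get compiled into the image.
function build_plot_sysimage(; script = joinpath(@__DIR__, "plot.jl"),
                               output = joinpath(@__DIR__, "PlotSys.so"))
    create_sysimage(
        [:PalmerPenguins, :DataFrames, :CairoMakie, :AlgebraOfGraphics];
        sysimage_path = output,
        precompile_execution_file = script,
    )
    return output
end

# Only build when this file is run as a script:
#   julia --project build_sysimage.jl
if abspath(PROGRAM_FILE) == @__FILE__
    build_plot_sysimage()
end
```

Afterwards, start Julia with julia --project -J PlotSys.so, the same way as with MakieSys.so in the Makie-only instructions.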

It’d be nice to tie sysimage generation and loading into Pkg operations. In most cases, there’ll be one sysimage per environment (loaded whenever that environment is activated), and most sysimages should be automatically regenerated upon package update. Ideally, there would be semantics to soft-pin sysimage’d packages to prevent them from changing versions during Pkg operations unless some additional confirmation is received from the user.
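
Nothing like this exists in Pkg as far as I know, but the “regenerate upon package update” part can be approximated by hand. Here is a minimal sketch that treats a sysimage as stale whenever the environment’s Manifest.toml has changed since the build. The function names and the sidecar “.manifest-sha256” file are hypothetical conventions, not anything Pkg or PackageCompiler provides:

```julia
using SHA  # standard library

# Hash the Manifest.toml, which records the exact package versions
# of an environment.
manifest_hash(project_dir) =
    bytes2hex(sha256(read(joinpath(project_dir, "Manifest.toml"))))

# Hypothetical sidecar file stored next to the sysimage at build time.
stamp_path(sysimage_path) = sysimage_path * ".manifest-sha256"

# Call this right after a successful sysimage build.
function record_build(project_dir, sysimage_path)
    write(stamp_path(sysimage_path), manifest_hash(project_dir))
    return nothing
end

# The sysimage counts as stale if it or its stamp is missing, or if the
# manifest has changed since the stamp was written.
function sysimage_stale(project_dir, sysimage_path)
    stamp = stamp_path(sysimage_path)
    (isfile(sysimage_path) && isfile(stamp)) || return true
    return read(stamp, String) != manifest_hash(project_dir)
end
```

A build script could call sysimage_stale(@__DIR__, "MakieSys.so") first and only rebuild when it returns true. This does not cover the soft-pinning part of the idea.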

It would be nice if, somehow, that could be handled by juliaup. Imagine that one creates the script to generate a sysimage, and juliaup tracks the Julia version and package dependencies of that image, and warns the user that a “new version” of the sysimage can be built, as it does with regular Julia updates.
