Taking TTFX seriously: Can we make common packages faster to load and use

Yes, it actually does! But only as a much smaller fraction than the 40-50% of the time that Parsers actually takes.

Honestly I also got frustrated with SnoopCompile.jl because nearly all of the precompilation suggestions had no effect, and just using Cthulhu.jl to improve type stability in the general area of the problem seemed to work better for local package issues.

It took me a while to understand that precompilation is mostly effective when you precompile a few packages deep, at the point where you hit solid type stability; and that these really low level packages are often where the problems are. I initially assumed it wasn’t an actual problem with Parsers.jl or JSON.jl, but some outcome of type instability higher up.
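To make "precompile at the point where you hit solid type stability" a bit more concrete, here is a minimal sketch using Base's precompile function. parse_number is a hypothetical stand-in for a low-level, type-stable function; it is not taken from Parsers.jl or JSON.jl, and how much of the compiled code is cached and reused depends on the Julia version.

```julia
# Hypothetical low-level function: concrete argument type in, concrete type out.
parse_number(s::String) = parse(Float64, s)

# Ask Julia to compile this exact signature ahead of the first call.
# Inside a package, a statement like this at module top level runs during precompilation.
precompile(parse_number, (String,))

parse_number("1.5")
```

A directive like this tends to pay off only when inference can resolve the calls inside the function, which is why it works best deep in the dependency stack.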

4 Likes

Isn’t this just the packages you precompiled in, and their dependencies?

So for data analysis, say you precompiled CSV, DataFrames, DataFramesMeta, StatsPlots

So that you can do common R like data analysis tasks, is this really going to cause a problem? Also assume you regenerate this sysimage every month or so as new updates to the packages come out.

1 Like

image

25 Likes

Yes… but in practice that doesn’t help much. Taking your example, I just created a project with CSV, DataFrames, DataFramesMeta, StatsPlots. The resulting Manifest describes 212 packages, but some of these are standard libraries so they can be ignored.

Now let’s say I want to use KernelDensity. Is it safe to install in this sysimage?

The first problem is that there’s no easy way to check if a package’s dependency tree overlaps with the sysimage. But let’s do the work. First we find the non-stdlib packages in the “sysimage”:

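using Pkg
Pkg.activate(temp=true)   # throwaway environment standing in for the sysimage project
Pkg.add(["CSV", "DataFrames", "DataFramesMeta", "StatsPlots"])
sysdeps = Pkg.dependencies();   # Dict of UUID => PackageInfo for the full dependency tree
@show length(sysdeps)

# Standard libraries are recognized by their source path inside the Julia installation
is_stdlib(x) = occursin("/julia/stdlib/", x[2].source)
sysdeps_nostd = filter(!is_stdlib, sysdeps)
@show length(sysdeps_nostd)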

# Output
length(sysdeps) = 212
length(sysdeps_nostd) = 167

A package can be safely installed if its dependencies are disjoint from those 167 packages. Here are the package names:

AbstractFFTs
Adapt
Arpack
Arpack_jll
AxisAlgorithms
Bzip2_jll
CSV
Cairo_jll
Chain
ChainRulesCore
ChangesOfVariables
Clustering
CodecZlib
ColorSchemes
ColorTypes
Colors
Compat
Contour
Crayons
DataAPI
DataFrames
DataFramesMeta
DataStructures
DataValueInterfaces
DataValues
DensityInterface
Distances
Distributions
DocStringExtensions
EarCut_jll
Expat_jll
FFMPEG
FFMPEG_jll
FFTW
FFTW_jll
FilePathsBase
FillArrays
FixedPointNumbers
Fontconfig_jll
Formatting
FreeType2_jll
FriBidi_jll
GLFW_jll
GR
GR_jll
GeometryBasics
Gettext_jll
Glib_jll
Graphite2_jll
Grisu
HTTP
HarfBuzz_jll
IniFile
InlineStrings
IntelOpenMP_jll
Interpolations
InverseFunctions
InvertedIndices
IrrationalConstants
IterTools
IteratorInterfaceExtensions
JLLWrappers
JSON
JpegTurbo_jll
KernelDensity
LAME_jll
LZO_jll
LaTeXStrings
Latexify
Libffi_jll
Libgcrypt_jll
Libglvnd_jll
Libgpg_error_jll
Libiconv_jll
Libmount_jll
Libtiff_jll
Libuuid_jll
LogExpFunctions
MKL_jll
MacroTools
MbedTLS
Measures
Missings
MultivariateStats
NaNMath
NearestNeighbors
Observables
OffsetArrays
Ogg_jll
OpenSSL_jll
OpenSpecFun_jll
Opus_jll
OrderedCollections
PCRE_jll
PDMats
Parsers
Pixman_jll
PlotThemes
PlotUtils
Plots
PooledArrays
Preferences
PrettyTables
Qt5Base_jll
QuadGK
Ratios
RecipesBase
RecipesPipeline
Reexport
RelocatableFolders
Requires
Rmath
Rmath_jll
Scratch
SentinelArrays
Showoff
SortingAlgorithms
SpecialFunctions
StaticArrays
StatsAPI
StatsBase
StatsFuns
StatsPlots
StructArrays
TableOperations
TableTraits
Tables
TranscodingStreams
URIs
UnicodeFun
Unzip
Wayland_jll
Wayland_protocols_jll
WeakRefStrings
Widgets
WoodburyMatrices
XML2_jll
XSLT_jll
Xorg_libX11_jll
Xorg_libXau_jll
Xorg_libXcursor_jll
Xorg_libXdmcp_jll
Xorg_libXext_jll
Xorg_libXfixes_jll
Xorg_libXi_jll
Xorg_libXinerama_jll
Xorg_libXrandr_jll
Xorg_libXrender_jll
Xorg_libpthread_stubs_jll
Xorg_libxcb_jll
Xorg_libxkbfile_jll
Xorg_xcb_util_image_jll
Xorg_xcb_util_jll
Xorg_xcb_util_keysyms_jll
Xorg_xcb_util_renderutil_jll
Xorg_xcb_util_wm_jll
Xorg_xkbcomp_jll
Xorg_xkeyboard_config_jll
Xorg_xtrans_jll
Zstd_jll
libass_jll
libfdk_aac_jll
libpng_jll
libvorbis_jll
x264_jll
x265_jll
xkbcommon_jll

We see the second problem: the sysimage includes many basic packages, so it’s unlikely that a new package will not share some dependency.

Let’s check the situation for KernelDensity:

Pkg.activate(temp=true)
Pkg.add("KernelDensity")
kddeps = Pkg.dependencies();
@show length(kddeps)

kddeps_nostd = filter(!is_stdlib, kddeps);
@show length(kddeps_nostd)

common_deps = intersect(keys(sysdeps_nostd), keys(kddeps_nostd));
@show length(common_deps)

# Output
length(kddeps) = 85
length(kddeps_nostd) = 41
length(common_deps) = 41

So KernelDensity has 41 non-standardlib dependencies and they are all in the custom sysimage? Yes, KernelDensity is actually already in the sysimage. But to use it (with using KernelDensity) I still need to add it, and there’s a good chance Pkg will “upgrade” it (or some dependencies) and put Julia in an inconsistent state.

What about a package that’s not already in the sysimage then? Let’s add TensorCast:

Pkg.activate(temp=true)
Pkg.add("TensorCast")
tcdeps = Pkg.dependencies();
tcdeps_nostd = filter(!is_stdlib, tcdeps);
common_sys_tc = intersect(keys(sysdeps_nostd), keys(tcdeps_nostd));
@show length(tcdeps)
@show length(tcdeps_nostd)
@show length(common_sys_tc)

# Output:
length(tcdeps) = 75
length(tcdeps_nostd) = 34
length(common_sys_tc) = 22

So it’s also unsafe to add TensorCast. Here the 22 shared dependencies are

AbstractFFTs
Adapt
ChainRulesCore
ChangesOfVariables
Compat
DataAPI
DataStructures
DocStringExtensions
InverseFunctions
IrrationalConstants
JLLWrappers
LogExpFunctions
MacroTools
Missings
OffsetArrays
OrderedCollections
Preferences
Requires
SortingAlgorithms
StaticArrays
StatsAPI
StatsBase

Conclusion: unless I misunderstood something, it seems unlikely that an additional package can be safely added to a custom sysimage…
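The manual steps in this post could be wrapped into a small helper, sketched below. nonstdlib_names and shared_names are hypothetical names, not part of Pkg, and the stdlib test is the same path-based heuristic as is_stdlib, so treat it as an assumption rather than a robust check.

```julia
using Pkg

# Names of the non-stdlib packages in a dependency table such as the one
# returned by Pkg.dependencies(). Assumption: stdlib sources sit in a
# directory called "stdlib", the same heuristic as is_stdlib above.
function nonstdlib_names(deps)
    names = Set{String}()
    for (_, info) in deps
        occursin(r"[/\\]stdlib[/\\]", info.source) || push!(names, info.name)
    end
    return names
end

# Sorted names that occur in both sets; an empty result means no overlap was found.
shared_names(a, b) = sort!(collect(intersect(a, b)))

# Usage, mirroring the steps above (needs network access, so left commented out):
# Pkg.activate(temp=true); Pkg.add(["CSV", "DataFrames", "DataFramesMeta", "StatsPlots"])
# sys = nonstdlib_names(Pkg.dependencies())
# Pkg.activate(temp=true); Pkg.add("TensorCast")
# shared_names(sys, nonstdlib_names(Pkg.dependencies()))
```

This compares names rather than UUIDs, which is easier to read but assumes no two packages in the trees share a name.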

2 Likes

If we do make threads requesting help with TTFX issues I think they should be prefixed “TTFXFTFY:”. :slight_smile:

1 Like

Indeed, I am sure it’s possible to get into an inconsistent state by adding additional packages. However, so far, this has had no noticeable effect on my analyses. It’s not entirely clear to me what happens if you add, for example, KernelDensity to your project. Yes, Pkg will try to upgrade it… but when you run using KernelDensity, my guess is that it’s the one in the sysimage that actually gets loaded. This makes your Manifest incorrect, so it can potentially cause problems for reproducibility, but it doesn’t actually cause, say, crashes or breakage.

My take on it is this:

  1. make the sysimage with as many packages you’re likely to use as possible.
  2. do all your throwaway analyses by default with it, for the most part it seems fine.
  3. Anything that isn’t a throwaway analysis is also relatively safe unless you have to add packages which are actually incompatible with the sysimage versions.
  4. Anything you want to be reproducible you should, before publishing it, run it with regular Julia and no sysimage after ] instantiate to get exactly what’s in the Manifest.
  5. Regularly update your sysimage to the latest stuff
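For the reproducibility concern, one could compare the versions that are actually loaded against what the Manifest records. The sketch below is only an illustration: manifest_versions and version_mismatches are hypothetical helpers, and it is an assumption (not verified here) that pkgversion reports the version of the copy baked into a sysimage.

```julia
using Pkg

# Versions recorded in the active Manifest, keyed by package name.
manifest_versions() =
    Dict(info.name => info.version for (_, info) in Pkg.dependencies())

# Names whose loaded version differs from the version the Manifest records.
function version_mismatches(loaded, manifest)
    return sort!([name for (name, v) in loaded
                  if haskey(manifest, name) && manifest[name] != v])
end

# Usage (pkgversion needs Julia 1.9 or later; left commented out):
# using KernelDensity
# loaded = Dict("KernelDensity" => pkgversion(KernelDensity))
# version_mismatches(loaded, manifest_versions())
```

An empty result would suggest the session matches the Manifest for the packages checked; it says nothing about packages left out of loaded.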

For some truly amazing TTFX analysis, check out the comments by @tim.holy for Makie.jl:
https://github.com/JuliaPlots/Makie.jl/discussions/1636

6 Likes

One of the things that’s not been mentioned is to check whether you’re over-specializing. Gains like those in “Understanding and optimizing compiler time (just a bit)” (post #13 by Tamas_Papp) are only possible in rare circumstances, but milder versions are fairly common. The fastest way I know of to gain insights about this is the “Profile-guided despecialization” page of the SnoopCompile documentation. A lower-tech approach is to run using MethodAnalysis; methodinstances(MyPkg) and just browse the list. You have to run workloads first for this to be useful.
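As a small illustration of what reducing specialization can look like in code, here is a sketch using Base's @nospecialize. The two functions are hypothetical, and whether despecializing helps in a real package depends on how the argument is used, so this shows the mechanism rather than a recommendation.

```julia
# Default: Julia may compile a separate specialization per concrete argument type.
describe(x) = string("got a ", typeof(x))

# @nospecialize hints that the compiler should not specialize on x.
describe_nospec(@nospecialize(x)) = string("got a ", typeof(x))

for arg in (1, 2.0, "three", [4])
    @assert describe(arg) == describe_nospec(arg)   # same results either way
end
```

The behavior is identical; what changes is how many compiled versions the first function can accumulate when it is called with many different argument types.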

The reason I emphasize this point is that reducing needless specialization is likely to be useful long-term, whereas understandable frustrations with things like forced-precompilation not making much difference may be more transitory as Julia itself improves.

10 Likes

Thanks, that’s super useful! There seems to be something wrong with the default xaxis/yaxis detection though, I was only able to get correct results by setting them manually.

Not quite sure what you mean, can you post a reproducer? For me the demo on that page works as expected. (Might be a PyPlot issue?)

With the code snippet in the link, I get

julia> PyPlot.xlim()
(0.0008912509381337459, 10.329824855824024)

julia> PyPlot.ylim()
(0.8912509381337456, 11.220184543019636)

and this

Weird. For me (PyPlot.jl v2.10.0, using matplotlib 3.4.2) it comes out square thanks to this line. It seems it’s not working for you.

OK, I’m on matplotlib 3.1.1, so it’s probably a PyPlot bug, your code indeed looks fine, sorry for the noise!

You may be able to update to a later version with

using Pkg
ENV["PYTHON"] = ""   # an empty value tells PyCall to use its own Conda-managed Python
Pkg.build("PyCall")

3 Likes

Use @time @eval thefunction(x) as is advised in the docstring for @time. Without it, some compilation occurs before @time is called causing confusing results.
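A minimal sketch of that pattern, with thefunction and x as hypothetical placeholders:

```julia
thefunction(x) = sum(abs2, x)   # hypothetical function to time
x = rand(1_000)

@time @eval thefunction(x)   # first call: compilation is included in the timing
@time @eval thefunction(x)   # second call: mostly run time
```

Comparing the two timings in a fresh session gives a rough picture of how much of the first call is compilation.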

1 Like

Another update! Precompilation of Parsers.jl ended up causing precompilation failures on Windows: CSV can't be precompiled in VSCode on Windows 10 · Issue #994 · JuliaData/CSV.jl · GitHub

So it was necessary to track down the real cause of compilation problems:
https://github.com/JuliaData/Parsers.jl/pull/114

My take on why these tiny changes (simply removing 4 @inline macros) had so much effect is that large constant vectors were being inlined wherever the function was used, and so duplicated everywhere by the compiler. Removing @inline from 4 functions improved Parsers.jl compile time by 80%, and probably the TTFX of a few hundred packages like CSV.jl, JSON.jl, Blink.jl, etc.

You can find these in Cthulhu.jl output. If you ever have to scroll through data rather than code in @descend, you may have a problem like this.

The takeaway is to be careful with @inline and large const variables. Precompiling them can break Base on Windows, and takes a long time and a lot of memory.
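To show the shape of the pattern, here is a hypothetical sketch; it is not the Parsers.jl code, and POW10, scale and scale_all are made-up names. Whether a given constant really gets duplicated depends on its type and the Julia version, so it is worth inspecting the result with @code_typed or Cthulhu's @descend rather than assuming.

```julia
# Hypothetical lookup table of 300 entries; not the actual Parsers.jl data.
const POW10 = [10.0^i for i in 0:299]

# No @inline here: the compiler's own heuristics decide whether to inline.
scale(x::Float64, e::Int) = x * POW10[e + 1]

# A caller that would receive a copy of scale's body if it were force-inlined.
scale_all(xs, e) = [scale(x, e) for x in xs]

scale_all([1.0, 2.0], 1)
```

Leaving the annotation off keeps the decision with the compiler, which is essentially the change described above.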

32 Likes

How small is safe and nonimpactful?

Hard to say really, besides “it depends”. I think the problem with Parsers.jl is there are multiple constant vectors with length around 300 that are inlined over and over again inside functions that are already 200-400 lines. They also use conditionals on unstable Union fields in structs, so nothing can be elided. So the total length of the lowered code (or somewhere, my understanding of this is shallow) balloons enormously.

If you only inline one long constant once, it probably makes no noticeable difference to anything.

1 Like

Not following the details of the thread super closely here, but as a new Julia user, I just want to say the TTFP problem is a real downer. I was super excited about Julia, but I’ve been spoiled by Python, I guess. It’s so frustrating to wait so long and fish through different packages and strategies to reduce that wait time. And pre-compiling on installation? That takes a while…is there no way to install precompiled packages? It makes me want to stick with Python for all but my high-performance needs.

2 Likes

As an R user, I was initially thrown by the TTFP compared to R’s graphics packages. The situation is slowly improving in Julia. The lag also applies only to the first plot: subsequent plots are quick, especially when plotting thousands of points to a PDF, which is where R would spin for every plot, not just the first.

There are other threads here on making a sysimage of your project to speed up subsequent TTFPs. Also, one can call R or python libraries from Julia and use their plotting packages. Last, some plotting frameworks in Julia load much faster than others – try out different ones.

The many advantages of using Julia for data analysis projects make the investment to learn the above worth the effort for me.

4 Likes