Yes… but in practice that doesn’t help much. Taking your example, I just created a project with CSV, DataFrames, DataFramesMeta, StatsPlots. The resulting Manifest describes 212 packages, but some of these are standard libraries so they can be ignored.
Now let’s say I want to use KernelDensity. Is it safe to install in this sysimage?
The first problem is that there’s no easy way to check if a package’s dependency tree overlaps with the sysimage. But let’s do the work. First we find the non-stdlib packages in the “sysimage”:
```julia
using Pkg
Pkg.activate(temp=true)
Pkg.add(["CSV", "DataFrames", "DataFramesMeta", "StatsPlots"])

sysdeps = Pkg.dependencies(); @show length(sysdeps)
# `Pkg.dependencies()` returns a Dict{UUID,PackageInfo}, so the predicate
# receives Pairs; `source` can be `nothing`, hence the `something` guard.
is_stdlib(p) = occursin("/julia/stdlib/", something(last(p).source, ""))
sysdeps_nostd = filter(!is_stdlib, sysdeps)
@show length(sysdeps_nostd)

# Output:
# length(sysdeps) = 212
# length(sysdeps_nostd) = 167
```
A package can be safely installed if its dependencies are disjoint from those 167 packages. Here are the package names:
AbstractFFTs Adapt Arpack Arpack_jll AxisAlgorithms Bzip2_jll CSV Cairo_jll Chain ChainRulesCore ChangesOfVariables Clustering CodecZlib ColorSchemes ColorTypes Colors Compat Contour Crayons DataAPI DataFrames DataFramesMeta DataStructures DataValueInterfaces DataValues DensityInterface Distances Distributions DocStringExtensions EarCut_jll Expat_jll FFMPEG FFMPEG_jll FFTW FFTW_jll FilePathsBase FillArrays FixedPointNumbers Fontconfig_jll Formatting FreeType2_jll FriBidi_jll GLFW_jll GR GR_jll GeometryBasics Gettext_jll Glib_jll Graphite2_jll Grisu HTTP HarfBuzz_jll IniFile InlineStrings IntelOpenMP_jll Interpolations InverseFunctions InvertedIndices IrrationalConstants IterTools IteratorInterfaceExtensions JLLWrappers JSON JpegTurbo_jll KernelDensity LAME_jll LZO_jll LaTeXStrings Latexify Libffi_jll Libgcrypt_jll Libglvnd_jll Libgpg_error_jll Libiconv_jll Libmount_jll Libtiff_jll Libuuid_jll LogExpFunctions MKL_jll MacroTools MbedTLS Measures Missings MultivariateStats NaNMath NearestNeighbors Observables OffsetArrays Ogg_jll OpenSSL_jll OpenSpecFun_jll Opus_jll OrderedCollections PCRE_jll PDMats Parsers Pixman_jll PlotThemes PlotUtils Plots PooledArrays Preferences PrettyTables Qt5Base_jll QuadGK Ratios RecipesBase RecipesPipeline Reexport RelocatableFolders Requires Rmath Rmath_jll Scratch SentinelArrays Showoff SortingAlgorithms SpecialFunctions StaticArrays StatsAPI StatsBase StatsFuns StatsPlots StructArrays TableOperations TableTraits Tables TranscodingStreams URIs UnicodeFun Unzip Wayland_jll Wayland_protocols_jll WeakRefStrings Widgets WoodburyMatrices XML2_jll XSLT_jll Xorg_libX11_jll Xorg_libXau_jll Xorg_libXcursor_jll Xorg_libXdmcp_jll Xorg_libXext_jll Xorg_libXfixes_jll Xorg_libXi_jll Xorg_libXinerama_jll Xorg_libXrandr_jll Xorg_libXrender_jll Xorg_libpthread_stubs_jll Xorg_libxcb_jll Xorg_libxkbfile_jll Xorg_xcb_util_image_jll Xorg_xcb_util_jll Xorg_xcb_util_keysyms_jll Xorg_xcb_util_renderutil_jll Xorg_xcb_util_wm_jll Xorg_xkbcomp_jll 
Xorg_xkeyboard_config_jll Xorg_xtrans_jll Zstd_jll libass_jll libfdk_aac_jll libpng_jll libvorbis_jll x264_jll x265_jll xkbcommon_jll
We see the second problem: the sysimage includes many basic, widely used packages, so a new package is almost certain to share some dependency with it.
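The overlap check can be wrapped in a small helper. This is only a sketch: it assumes the `Pkg.dependencies()` API used above, and `is_stdlib_path` / `shared_with_sysimage` are names I made up for illustration.

```julia
using Pkg

# A dependency counts as a stdlib if its source path lives under julia's
# stdlib directory (`source` can be `nothing`, hence the `something` guard).
is_stdlib_path(src) = occursin("/julia/stdlib/", something(src, ""))

# Sketch: the non-stdlib dependencies (by UUID) that `pkg` shares with the
# sysimage dependency set `sysdeps_nostd` computed above. Needs network access.
function shared_with_sysimage(pkg::AbstractString, sysdeps_nostd)
    Pkg.activate(temp=true)
    Pkg.add(pkg)
    deps_nostd = filter(p -> !is_stdlib_path(last(p).source), Pkg.dependencies())
    return intersect(keys(sysdeps_nostd), keys(deps_nostd))
end
```

An empty intersection would suggest the package is safe to add; a non-empty one means version conflicts with the sysimage are possible.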
Let’s check the situation for KernelDensity:
```julia
Pkg.activate(temp=true)
Pkg.add("KernelDensity")

kddeps = Pkg.dependencies(); @show length(kddeps)
kddeps_nostd = filter(!is_stdlib, kddeps); @show length(kddeps_nostd)
common_deps = intersect(keys(sysdeps_nostd), keys(kddeps_nostd)); @show length(common_deps)

# Output:
# length(kddeps) = 85
# length(kddeps_nostd) = 41
# length(common_deps) = 41
```
So KernelDensity has 41 non-stdlib dependencies and they are all in the custom sysimage? Yes, KernelDensity is actually already in the sysimage. But to use it (with `using KernelDensity`) I still need to add it, and there’s a good chance Pkg will “upgrade” it (or some of its dependencies) and put Julia in an inconsistent state.
What about a package that’s not already in the sysimage then? Let’s add TensorCast:
```julia
Pkg.activate(temp=true)
Pkg.add("TensorCast")

tcdeps = Pkg.dependencies()
tcdeps_nostd = filter(!is_stdlib, tcdeps)
common_sys_tc = intersect(keys(sysdeps_nostd), keys(tcdeps_nostd))
@show length(tcdeps)
@show length(tcdeps_nostd)
@show length(common_sys_tc)

# Output:
# length(tcdeps) = 75
# length(tcdeps_nostd) = 34
# length(common_sys_tc) = 22
```
So it’s also unsafe to add TensorCast. Here the 22 shared dependencies are
AbstractFFTs Adapt ChainRulesCore ChangesOfVariables Compat DataAPI DataStructures DocStringExtensions InverseFunctions IrrationalConstants JLLWrappers LogExpFunctions MacroTools Missings OffsetArrays OrderedCollections Preferences Requires SortingAlgorithms StaticArrays StatsAPI StatsBase
Conclusion: unless I misunderstood something, it seems unlikely that an additional package can be safely added to a custom sysimage…
If we do make threads requesting help with TTFX issues I think they should be prefixed “TTFXFTFY:”.
Indeed, I am sure it’s possible to get into an inconsistent state by adding additional packages. However, so far this has had no noticeable effect on my analyses. It’s not entirely clear to me what happens if you add, for example, KernelDensity to your project. Yes, Pkg will try to upgrade it… but when you run `using KernelDensity`, my guess is that it’s the copy in the sysimage that actually gets loaded. This makes your Manifest incorrect, so it can potentially cause reproducibility problems, but it doesn’t actually cause, say, crashes or brokenness.
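One quick (if crude) way to see which copy of a package actually got loaded is `pathof`, which reports the source path of the loaded module. In this sketch I use the stdlib Dates as a stand-in for KernelDensity, since the idea is the same for any loaded package:

```julia
# `pathof` (in Base) returns the path of the file a loaded package came from.
using Dates   # stand-in for KernelDensity in this sketch
@show pathof(Dates)
# If this doesn't point at the version your Manifest records (e.g. it points
# into the sysimage's build tree instead of ~/.julia/packages/...), the
# Manifest and the running session have diverged.
```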
My take on it is this:
- Make the sysimage with as many of the packages you’re likely to use as possible.
- Do all your throwaway analyses with it by default; for the most part it seems fine.
- Anything that isn’t a throwaway analysis is also relatively safe, unless you have to add packages that are actually incompatible with the sysimage versions.
- Anything you want to be reproducible: before publishing, run it with regular Julia and no sysimage, after `] instantiate` to get exactly what’s in the Manifest.
- Regularly update your sysimage to the latest versions.
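The reproducibility step I have in mind looks roughly like this (a sketch; here I activate a temp directory so the example is self-contained, but you would activate your real project):

```julia
# Run with a plain `julia` and no custom sysimage:
using Pkg
Pkg.activate(mktempdir())   # in practice: Pkg.activate("path/to/your/project")
Pkg.instantiate()           # install exactly the versions recorded in the Manifest
# ...then run the analysis itself, e.g. include("script.jl")
```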
For some truly amazing TTFX analysis, check out the comments by @tim.holy for Makie.jl:
One of the things that’s not been mentioned is to check whether you’re over-specializing. Gains like those in Understanding and optimizing compiler time (just a bit) - #13 by Tamas_Papp are only possible in rare circumstances, but milder versions are fairly common. The fastest way I know of to gain insights about this is Profile-guided despecialization · SnoopCompile. A lower-tech approach is to run `using MethodAnalysis; methodinstances(MyPkg)` and just browse the list. You have to run workloads first for this to be useful.
The reason I emphasize this point is that reducing needless specialization is likely to be useful long-term, whereas understandable frustrations with things like forced-precompilation not making much difference may be more transitory as Julia itself improves.
Thanks, that’s super useful! There seems to be something wrong with the default xaxis/yaxis detection though, I was only able to get correct results by setting them manually.
Not quite sure what you mean, can you post a reproducer? For me the demo on that page works as expected. (Might be a PyPlot issue?)
With the code snippet in the link, I get

```julia
julia> PyPlot.xlim()
(0.0008912509381337459, 10.329824855824024)

julia> PyPlot.ylim()
(0.8912509381337456, 11.220184543019636)
```
Weird. For me (PyPlot.jl v2.10.0, using matplotlib 3.4.2) it comes out square thanks to this line. It seems it’s not working for you.
OK, I’m on matplotlib 3.1.1, so it’s probably a PyPlot bug, your code indeed looks fine, sorry for the noise!
You may be able to update to a later version with
Use `@time @eval thefunction(x)`, as advised in the docstring for `@time`. Without it, some compilation occurs before `@time` is called, causing confusing results.
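A minimal illustration of the tip (the function name and workload are arbitrary):

```julia
# Define a fresh function; its first call triggers compilation.
f(x) = sum(abs2, x)

# Wrapping the call in @eval ensures the expression is compiled *inside* the
# @time measurement, so compilation time shows up in the report instead of
# partly being spent before @time even runs.
@time @eval f(rand(1000))
```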
Another update! Precompilation of Parsers.jl ended up causing precompilation failures on Windows: CSV can't be precompiled in VSCode on Windows 10 · Issue #994 · JuliaData/CSV.jl · GitHub
So it was necessary to track down the real cause of compilation problems:
My take on why these tiny changes (simply removing 4 `@inline` macros) had so much effect is that large constant vectors were being inlined wherever the function was used, duplicated everywhere by the compiler. Removing `@inline` from 4 functions improved Parsers.jl compile time by 80%, and probably the TTFX of a few hundred downstream packages like CSV.jl/JSON.jl/Blink.jl, etc.
You can find these in Cthulhu.jl output. If you ever have to scroll through data rather than code in `@descend`, you may have a problem like this.
The takeaway is to be careful with `@inline` and large `const` variables. Precompiling them can break Base on Windows, and takes a long time and a lot of memory.
How small is safe and nonimpactful?
Hard to say really, besides “it depends”. I think the problem with Parsers.jl is that there are multiple constant vectors with length around 300 that are inlined over and over again inside functions that are already 200-400 lines long. They also use conditionals on type-unstable `Union` fields in structs, so nothing can be elided. So the total length of the lowered code (or somewhere; my understanding of this is shallow) balloons enormously.
If you only inline one long constant once, it probably makes no noticeable difference to anything.
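A toy version of the pattern being described (names and sizes are made up; this just shows the shape of the problem, not Parsers.jl’s actual code):

```julia
# A large constant vector, similar in size to the ones in Parsers.jl.
const TABLE = collect(1:300)

# With @inline, the compiler is urged to paste this body (and its use of
# TABLE) into every call site, multiplying the lowered/compiled code size
# when the callers are already hundreds of lines long.
@inline lookup_forced(i) = TABLE[mod1(i, length(TABLE))]

# Without the annotation, the compiler applies its own size-aware heuristics.
lookup(i) = TABLE[mod1(i, length(TABLE))]
```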
Not following the details of the thread super closely here, but as a new Julia user, I just want to say the TTFP problem is a real downer. I was super excited about Julia, but I’ve been spoiled by Python, I guess. It’s so frustrating to wait so long and fish through different packages and strategies to reduce that wait time. And pre-compiling on installation? That takes a while…is there no way to install precompiled packages? It makes me want to stick with Python for all but my high-performance needs.
As an R user, I was initially thrown by the TTFP compared to R’s graphics packages. The situation is slowly improving in Julia. The lag is also time to first plot. Subsequent plots are quick, especially when plotting thousands of points to a pdf: something where R would spin for all plots not just the first plot.
There are other threads here on making a sysimage of your project to speed up subsequent TTFPs. Also, one can call R or python libraries from Julia and use their plotting packages. Last, some plotting frameworks in Julia load much faster than others – try out different ones.
The many advantages of using Julia for data analysis projects make the investment to learn the above worth the effort for me.
This is certainly something that takes new Julia users time to get used to. Fortunately, there are strategies to improve the experience. Compiling a system image is one of them. Here’s a video I made for my students that explains the issue and how to compile a system image. This is not the most concise explanation, but it might be of interest. Speed - YouTube
I mean, the thread is explicitly about solving your problem! It’s also a pain to everyone else.
So when giving this kind of feedback, remember Julia package devs are just people like you. Positive motivation is more effective than saying you will stick to Python if we don’t fix things, without even reading the thread.
Fixing these problems is essentially work people need to do, and mostly for free. I personally put a few weekends into it so far and I know others are as well. But like you, we all have other more interesting things to do with our time than fix TTFX.
To help get what you want: if you have any specific packages with slow load times, check whether their GitHub repo already has an issue for this and add your concerns, and if not, make one with a minimal working example of how long it takes. It’s invaluable feedback, and it shows that (a) people want to use our tools, and (b) they care enough to time them and file an issue. So maybe it’s worth us putting in a few hours to fix.