Yes, it actually does! But it's a much smaller fraction than the 40-50% of the time Parsers actually takes.
Honestly I also got frustrated with SnoopCompile.jl because nearly all of the precompilation suggestions had no effect, and just using Cthulhu.jl to improve type stability in the general area of the problem seemed to work better for local package issues.
It took me a while to understand that precompilation is mostly effective when you precompile a few packages deep, at the point where you hit solid type stability; and that these really low-level packages are often where the problems are. I initially assumed it wasn't an actual problem with Parsers.jl or JSON.jl, but some outcome of type instability higher up.
Isn't this just the packages you precompiled in, and their dependencies?
So for data analysis, say you precompiled CSV, DataFrames, DataFramesMeta, and StatsPlots so that you can do common R-like data analysis tasks. Is this really going to cause a problem? Also assume you regenerate this sysimage every month or so as new updates to the packages come out.
Yes… but in practice that doesn't help much. Taking your example, I just created a project with CSV, DataFrames, DataFramesMeta, and StatsPlots. The resulting Manifest describes 212 packages, but some of these are standard libraries, so they can be ignored.
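For reference, a minimal sketch of reproducing that count (the Pkg calls are standard API; the package set is the one from your question):

```julia
using Pkg

Pkg.activate(; temp = true)                  # throwaway environment
Pkg.add(["CSV", "DataFrames", "DataFramesMeta", "StatsPlots"])
Pkg.status(; mode = Pkg.PKGMODE_MANIFEST)    # prints every package the Manifest resolves to
```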
Now let's say I want to use KernelDensity. Is it safe to install in this sysimage?
The first problem is that there's no easy way to check whether a package's dependency tree overlaps with the sysimage. But let's do the work. First, we find the non-stdlib packages in the "sysimage":
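Something along these lines should do it (Pkg.dependencies and Sys.STDLIB are real APIs; the name-based filter is just my heuristic):

```julia
using Pkg

stdlibs = readdir(Sys.STDLIB)                 # names of the standard libraries
deps    = Pkg.dependencies()                  # UUID => PackageInfo for the whole Manifest
nonstd  = sort!([info.name for info in values(deps) if info.name ∉ stdlibs])
println(length(nonstd), " non-stdlib packages in the \"sysimage\" project")
```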
So KernelDensity has 41 non-stdlib dependencies, and they are all in the custom sysimage? Yes, KernelDensity is actually already in the sysimage. But to use it (with using KernelDensity) I still need to add it, and there's a good chance Pkg will "upgrade" it (or some of its dependencies) and put Julia in an inconsistent state.
What about a package that's not already in the sysimage then? Let's add TensorCast:
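A hedged sketch of checking the overlap, by diffing the Manifest before and after the add:

```julia
using Pkg

before = Set(info.name for info in values(Pkg.dependencies()))
Pkg.add("TensorCast")
after  = Set(info.name for info in values(Pkg.dependencies()))
println("packages not already in the sysimage: ", sort!(collect(setdiff(after, before))))
```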
Indeed, I am sure it's possible to get into an inconsistent state by adding additional packages. However, so far this has had no noticeable effect on my analyses. It's not entirely clear to me what happens if you add, for example, KernelDensity to your project. Yes, Pkg will try to upgrade it… but when you run using KernelDensity, my guess is that the version in the sysimage is the one that actually gets loaded. This makes your Manifest incorrect, so it can potentially cause reproducibility problems, but it doesn't actually cause, say, crashes or breakage.
My take on it is this:
- Make the sysimage with as many of the packages you're likely to use as possible.
- Do all your throwaway analyses with it by default; for the most part it seems fine.
- Anything that isn't a throwaway analysis is also relatively safe, unless you have to add packages that are actually incompatible with the sysimage versions.
- Anything you want to be reproducible should, before publishing, be run with regular Julia and no sysimage, after ] instantiate, to get exactly what's in the Manifest.
- Regularly update your sysimage to the latest versions of everything (see the sketch after this list).
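For that last step, a minimal PackageCompiler.jl sketch (create_sysimage is its actual entry point; the package list and the output name are just this thread's example):

```julia
using PackageCompiler

create_sysimage(
    ["CSV", "DataFrames", "DataFramesMeta", "StatsPlots"];
    sysimage_path = "analysis_sysimage.so",  # then start Julia with: julia -Janalysis_sysimage.so
)
```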
One of the things that's not been mentioned is to check whether you're over-specializing. Gains like those in Understanding and optimizing compiler time (just a bit) - #13 by Tamas_Papp are only possible in rare circumstances, but milder versions are fairly common. The fastest way I know of to gain insights about this is Profile-guided despecialization · SnoopCompile. A lower-tech approach is to use using MethodAnalysis; methodinstances(MyPkg) and just browse the list. You have to run workloads first for this to be useful.
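A sketch of that lower-tech approach (methodinstances is MethodAnalysis.jl's real API; MyPkg and the workload call are placeholders for your own code):

```julia
using MethodAnalysis
using MyPkg                       # placeholder: the package you're investigating

# Run a representative workload first, otherwise nothing has been compiled yet:
MyPkg.some_typical_workload()     # hypothetical

# Browse every MethodInstance compiled for the package; long runs of
# near-identical signatures are a hint of needless specialization:
foreach(println, methodinstances(MyPkg))
```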
The reason I emphasize this point is that reducing needless specialization is likely to be useful long-term, whereas understandable frustrations with things like forced-precompilation not making much difference may be more transitory as Julia itself improves.
Thanks, that's super useful! There seems to be something wrong with the default xaxis/yaxis detection though; I was only able to get correct results by setting them manually.
Use @time @eval thefunction(x), as advised in the docstring for @time. Without @eval, some compilation occurs before @time starts measuring, causing confusing results.
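A quick illustration of the difference (any freshly defined function will do):

```julia
f(x) = sum(abs2, x)    # a function that hasn't been compiled yet
x = rand(1000)

@time f(x)         # some of f's compilation may happen before the timer starts
@time @eval f(x)   # @eval defers everything to runtime, so compilation is fully measured
```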
My take on why these tiny changes (simply removing 4 @inline macros) had so much effect is that large constant vectors were being inlined wherever the function was used, duplicated everywhere by the compiler. Removing @inline from four functions improved Parsers.jl compile time by 80%, and probably the TTFX of a few hundred packages like CSV.jl/JSON.jl/Blink.jl as well.
You can find these in Cthulhu.jl output. If you ever have to scroll through data rather than code in @descend, you may have a problem like this.
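For example (Cthulhu's @descend is real; the table and functions here are hypothetical stand-ins):

```julia
using Cthulhu

const TABLE = Tuple(collect(UInt8, 1:255))   # stand-in for a large constant table
@inline lookup(b::UInt8) = TABLE[b]          # force-inlined helper, like the Parsers.jl case
parsebyte(s::String) = Int(lookup(codeunit(s, 1)))

@descend parsebyte("A")   # in the typed code, screens of literal data
                          # instead of code are the warning sign
```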
The takeaway is to be careful combining @inline and large const variables. Precompiling them can break Base on Windows, and it takes a long time and a lot of memory.
Hard to say really, besides "it depends". I think the problem with Parsers.jl is that there are multiple constant vectors of length around 300 that are inlined over and over again inside functions that are already 200-400 lines long. They also use conditionals on unstable Union fields in structs, so nothing can be elided. So the total length of the lowered code (or somewhere; my understanding of this is shallow) balloons enormously.
If you only inline one long constant once, it probably makes no noticeable difference to anything.
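To make the failure mode concrete, a hypothetical miniature (not Parsers.jl's actual code):

```julia
const BIG = Tuple(rand(Int, 300))   # large constant referenced on the hot path

@inline function step(state::Int, b::UInt8)   # imagine this body is 300 lines long
    state + BIG[Int(b) % 300 + 1]
end

# Every caller gets its own inlined copy of the whole body (and its constant
# references), so the IR the compiler must chew through grows with each call site:
run1(bytes::Vector{UInt8}) = foldl(step, bytes; init = 0)
run2(bytes::Vector{UInt8}) = sum(b -> step(0, b), bytes)

# Deleting @inline turns each call site back into a plain call, which is
# essentially what the Parsers.jl fix did.
```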
Not following the details of the thread super closely here, but as a new Julia user, I just want to say the TTFP problem is a real downer. I was super excited about Julia, but I've been spoiled by Python, I guess. It's so frustrating to wait so long and fish through different packages and strategies to reduce that wait time. And pre-compiling on installation? That takes a while… is there no way to install precompiled packages? It makes me want to stick with Python for all but my high-performance needs.
As an R user, I was initially thrown by the TTFP compared to R's graphics packages. The situation is slowly improving in Julia. The lag really is just the time to first plot: subsequent plots are quick, especially when plotting thousands of points to a PDF, something where R would spin for every plot, not just the first.
There are other threads here on making a sysimage of your project to speed up subsequent TTFPs. Also, one can call R or Python libraries from Julia and use their plotting packages. Last, some plotting frameworks in Julia load much faster than others, so try out different ones.
The many advantages of using Julia for data analysis projects make the investment to learn the above worth the effort for me.