Sure. For a beginner, I recommend eliminating the global environment first, then getting used to local environments, and then deciding whether you want to add anything back to the global environment once you understand how it all works.
But if a beginner has willy-nilly put hundreds of packages into the global environment, I think this is a good opportunity to start over.
Mileage may vary; given that we’ve explained the thinking here, I think it’s up to the OP to decide which way to go.
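For a beginner starting from scratch, the local-environment workflow is only a few commands (a sketch; the project name is made up):

```julia
using Pkg

Pkg.activate("MyAnalysis")   # creates/uses MyAnalysis/Project.toml instead of the global env
Pkg.add("DataFrames")        # recorded only in this project's Project.toml
Pkg.status()                 # lists just this project's dependencies

# Equivalently, in the REPL: press `]` for Pkg mode and run
#   activate MyAnalysis
#   add DataFrames
```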
It moves me to another environment. Most typically I’m talking about, for instance, having a function that computes something, and this is the goal of the current package/project. Then I want to do a quick plot to see if the result is the expected one. I don’t want to have to add a plotting package to the current environment, nor to a temporary one. Thus, I keep even Plots in my global shared environment, such that I can plot stuff from wherever I am, and the same goes for benchmarking tools, etc.
The real issue is having the main environment bloated AND developing stuff in it. In that case any new package added can cause a cascade of dependency updates and trigger a lot of precompilation, which can be quite annoying. But if the bloated main environment is just used as an occasional dirty-package repository, and the current project is kept clean, that does not happen frequently.
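One way to get this “plot from anywhere” setup without bloating the default environment itself is a shared environment plus Julia’s environment stacking (a sketch; the environment name is arbitrary):

```julia
using Pkg

Pkg.activate("tools"; shared=true)        # creates ~/.julia/environments/tools
Pkg.add(["Plots", "BenchmarkTools"])      # heavy convenience packages live here

# Then, from any active project, push the shared environment onto the
# load path (e.g. in startup.jl), so `using Plots` works without adding
# Plots to the current project's dependencies:
push!(LOAD_PATH, "@tools")
```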
Projects are a different thing. Of course I can use them, but that doesn’t change the fact that Makie takes half a minute to import, without extraneous dependencies. I don’t know Julia internals well enough, but I don’t get why precompiled packages are still so slow to load. Aren’t they supposed to be ready to execute immediately?
I think you are still mixing up two different “slowdown” issues. Importing CairoMakie on my system takes 13 seconds, which is indeed too long but (a) there is a workaround, (b) it will be less in the next few releases of Julia and (c) it is much faster than what you are reporting so there is some other issue on your end that we should try to address first.
Could you try the following:
Make a new Project (a venv in Python-speak) and `] add CairoMakie` in it. That will cause a download and compilation that can take more than a minute; on my computer it takes 3min30sec. Is that similar to your experience? This, however, should happen very rarely: when you start a project and when you update the libraries installed in it.
Open that project again and `@time @eval using CairoMakie`. That was 14sec on my computer. It is much faster than the install step because a lot of code was indeed precompiled and cached. However, because the julia compiler cannot know all the possible combinations of types on which code will be called, it cannot precompile and cache everything. In a previous post here I linked to GitHub issues tracking how the compiler is being made smarter so that this caching gets better.
In the meantime, there is a solution: with a sysimage, `using CairoMakie` takes milliseconds and the first plot takes 2 seconds (it is very fast after that).
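For completeness, the two measurements above can be reproduced from a shell roughly like this (the directory name is arbitrary, and the install step needs network access and several minutes):

```shell
mkdir makie-test && cd makie-test

# Step 1: one-time install + precompilation (minutes)
julia --project=. -e 'using Pkg; Pkg.add("CairoMakie")'

# Step 2: measure the load time in a fresh session (seconds)
julia --project=. -e '@time @eval using CairoMakie'
```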
I believe you were conflating the initial installation+compilation step, which should happen very rarely with a well-organized Project file, with the `using ...` step, which happens often but does not involve long package precompilation unless the Project environment has been modified. The latter step is indeed also frustratingly slow, but (a) it is nowhere near as slow as what you are experiencing and (b) it is being fixed, both upstream in future julia releases and already today with custom sysimages.
Could you let me know which of the above steps do not match your experience? We can try to figure out where to go from there.
Ok, these results of yours would, I believe, match the experience of most people here (maybe some things would be up to two times faster on beefier desktop computers, but that is no excuse). (Edit: the “2-3 minute” quote in your last paragraph is still weird, but I am focusing on the measurements explicitly reported in this post. If you can reproduce the “2-3 minute” delays with a new Project file, please share so we can investigate.)
With stock julia 1.8 and a package as ridiculously dynamic as Makie, you cannot do better than the 10-to-20sec you just measured. However, since julia 1.8 it is possible to cache much of this in a sysimage. This is the only way to get millisecond-fast Makie imports today. It was made possible by improvements to the caching of code in 1.8 and was not possible in 1.7. Tim Holy is among the heroes who made it happen.
In julia 1.9, and more plausibly 1.10, sysimages might not be necessary, thanks to smarter compiled-code caching schemes being developed.
There are a couple of good ways to make a sysimage in 1.8. My preferred way is the default (no customizations) sysimage build command in VS Code. It usually makes good-enough sysimages. Other folks prefer AutoSysimages.jl. I completely agree that it is frustrating to have these extra steps, but on the other hand, compilation steps like that are normal for compiled languages like Rust and C. Julia is trying to hit a very difficult middle ground between dynamic and compiled-fast, so for the moment I am content with this extra step. As I mentioned, in the next few versions of julia it might not be necessary.
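Both of those routes build on PackageCompiler.jl; doing it by hand looks roughly like this (the output file name is arbitrary, and the build itself takes several minutes):

```julia
using PackageCompiler

# Bake CairoMakie's compiled code into a custom system image.
create_sysimage(["CairoMakie"]; sysimage_path="makie_sys.so")

# Then start julia with it, and `using CairoMakie` returns almost instantly:
#   julia --sysimage makie_sys.so
```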
A tangent: I believe it is worthwhile to discuss why this is such a phenomenally big problem in julia. Julia combines two special features that other languages do not share: (1) multimethods as the fundamental principle of the entire ecosystem and (2) compiled code. It is very difficult to know which code you need compiled, and to avoid discarding the vast majority of already-compiled code when importing new libraries that add new methods to pre-existing functions. No one had to deal with this problem before julia, and it is being slowly dealt with. Sysimages basically carry the promise that no significant number of new methods will be defined, hence they can cache more compiled code (this is a very oversimplified, borderline misleading explanation).
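A toy illustration of how (1) and (2) collide: compiled code can be invalidated when a later import adds a more specific method to a function the compiled code calls (everything here is made up for illustration):

```julia
# "Library A" defines a generic fallback and a caller.
describe(x) = "something"
report(x) = "value is " * describe(x)

report(1)  # compiles report(::Int) against the methods known right now

# "Library B" later adds a more specific method to the same function...
describe(x::Int) = "an integer"

# ...which invalidates the cached report(::Int) and forces recompilation,
# so the call now dispatches to the new method:
report(1)  # "value is an integer"
```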
Edit: I have an experimental simulator for a quantum physics project. It uses DiffEq, QuantumOptics, QuantumClifford, SimJulia, GLMakie, and a bunch of other libraries. Similarly to you, I saw more than a minute of first-time import even with a stable Project.toml. Before 1.8 I had to be very careful with using Revise, long-running julia processes, and `let` blocks to keep my workspace clean of temporary variables. Now I use a sysimage in VS Code and import times are under a second. I had to modify my workflow, but I am incredibly productive with the new one.
The import part is actually not as long as I thought:
```julia
@time @eval begin
    # (package imports elided)
end
# 26.104101 seconds (62.05 M allocations: 3.762 GiB, 5.58% gc time, 17.67% compilation time: 88% of which was recompilation)

@time @eval Tne = CSV.read("data/data.csv", DataFrame)
# 8.775861 seconds (20.21 M allocations: 1.073 GiB, 2.38% gc time, 99.98% compilation time)
```
I didn’t touch the rest. Now you see that not only does the import take a few dozen seconds, but the first call to every function from those imports takes a few seconds as well. In some of my more complicated analysis scripts, when I ensure they reproduce every result (and thus must run fresh), it takes ~10 minutes to produce all the plots. For data no bigger than 16MB!
Dead variables are just a workflow problem. During research, one cannot possibly know what kind of code structure they will end up with, and there is a lot of ad-hoc visualization via VS Code cells, etc., because everything is still unknown.
At some point, loose variables float around and become hazardous due to stale data, etc. “Cleaning up” would ideally remove all the globals without touching the imported packages, so I don’t have to pay minutes for a new julia session to compile all those functions again.
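There is no supported way to clear Main’s globals in place, but one workaround in the same spirit is to keep scratch state inside a throwaway module: re-evaluating the module replaces all its globals at once, while the loaded packages stay loaded (a sketch; all names are made up):

```julia
using Statistics  # loaded once; survives any number of workspace resets

module Workspace
    using Statistics
    data = rand(100)
    stats = (mean = mean(data), std = std(data))
end

Workspace.stats  # scratch results live here, not in Main

# Re-running the `module Workspace ... end` block discards every stale
# global inside it, with no new julia session and no recompilation of
# the imported packages.
```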
The issue has nothing to do with dead variables taking memory.
Ok, now I see what you mean.
I guess you already know this, but that issue is not exclusive to Julia; it also happens e.g. with Jupyter notebooks in python. I have to admit this is the main reason I dislike notebooks too.
Just my two cents on how I circumvent the above (I don’t use VSCode, just a vanilla REPL):
1. Startup (which includes loading Revise.jl).
2. Explore some data.
3. After a while I see a bunch of data I often either recompute or load again. I extract that bit into a script and re-include it (or add a caching mechanism to reload it). If needed, I go back to 1.
4. Go to 2. and repeat.
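The “caching mechanism” in step 3 can be as simple as guarding the slow part with `@isdefined` in the included script, so repeated `include` calls skip it (a sketch; file and variable names are made up):

```julia
# load_data.jl -- safe to `include` repeatedly from the REPL.
if !(@isdefined raw_data)
    @info "loading data (slow path, runs once per session)"
    raw_data = [x^2 for x in 1:1_000_000]  # stand-in for e.g. CSV.read
end

# Cheap derived computations can always re-run:
total = sum(raw_data)
```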
People may well have told you about that kind of workflow already, but since it hasn’t been mentioned in this thread yet, I thought I’d bring it up.
I remember also trying to do everything with scripts and a fresh Julia session for a while, because I knew that is how it works with python. Then I realized that not everything needs to behave like python, and I gave the above workflow a try.
Could you share a self-contained example, maybe the way you did with the PalmerPenguins? For instance, a 20-line gist on GitHub that imports the libraries and the fake data and makes a couple of simple plots? I am very surprised by the slowness of the sysimage you have. I imagine a lot of folks here would take it as a point of pride to figure out why the load is so slow (though I am not promising I will have time to debug this in the next couple of days).
Something that helps understanding is that in this context “precompilation” does not mean “ahead-of-time compilation” but “a step before compilation”, as Simon explained in this old post.
So the compilation is still done on first execution. I agree it is still frustrating; I think I got used to the ~1 minute before being able to do something useful, and I go prepare some coffee in the meantime. I am glad that storing compiled code is on the horizon!
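The point is easy to see even without packages: each function/argument-type combination compiles on its first call (timings are machine-dependent, so I am only showing the pattern):

```julia
f(x) = 2x + 1

@time f(10)   # first call for Int: time dominated by JIT compilation
@time f(11)   # already compiled for Int: microseconds
@time f(1.5)  # new argument type (Float64): compiles once more
```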