The issue is not specific to Pluto in any way, it’s just a single instance. Everyone who uses environments for many projects/analyses/scripts is affected, because even very similar environments created on different dates differ a lot, and require recompilation — see my quantification of this above.
It’s not “cache”, it’s just a regular environment. Different projects/analyses need different envs anyway so that they could be reproduced later. Otherwise, with shared env, updating/adding a package for AnalysisA can silently break AnalysisB.
Great, thanks, that is indeed a big part of it. So more or less what I’m proposing is to default to an offline add and, if that fails, do an online add, effectively this:
(jl_89Sowl) [offline] pkg> add Example
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package Example [7876af07]:
Example [7876af07] log:
├─Example [7876af07] has no known versions!
└─restricted to versions * by an explicit requirement — no versions left
(jl_89Sowl) pkg> add Example
Resolving package versions...
Installed Example ─ v0.5.3
[7876af07] + Example v0.5.3
[7876af07] + Example v0.5.3
1 dependency successfully precompiled in 1 seconds. 33 already precompiled.
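The offline-first fallback shown in the REPL session above could be scripted roughly as follows. This is only a sketch: `add_prefer_offline` is a hypothetical helper, not part of Pkg, and the `add` keyword exists only so the function can be exercised without network access. `Pkg.offline` and `Pkg.add` are the real Pkg functions.

```julia
using Pkg

# Hypothetical helper (not part of Pkg): try to resolve from already-installed
# versions first, and only go online if the offline resolve fails.
function add_prefer_offline(name::AbstractString; add = Pkg.add)
    Pkg.offline(true)                 # restrict the resolver to installed versions
    try
        add(name)
        return :offline
    catch err
        @info "Offline add failed, retrying online" err
        Pkg.offline(false)            # allow downloads again
        add(name)
        return :online
    finally
        Pkg.offline(false)            # note: always leaves offline mode switched off
    end
end
```

Calling `add_prefer_offline("Example")` returns `:offline` when a compatible version is already installed and `:online` otherwise. An offline resolve can pick older versions than an online one would, which is the point of the proposal: identical versions mean the existing compiled cache can be reused.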
The Project.toml and Manifest.toml stuff works exactly as many like. I tell students to instantiate the manifest once to reproduce an environment, with the expectation set that the instantiation process might be slow, as with other package systems such as conda, or even a little slower because it needs to do more high-performance compilation. But it is a one-time thing.
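For reference, the one-time instantiate step might look like this. It is a minimal sketch: `reproduce_environment` is a hypothetical helper name, and it assumes the manifest file is called `Manifest.toml` and sits in the project directory.

```julia
using Pkg

# Hypothetical helper: activate the project in `dir` and install exactly the
# package versions recorded in its Manifest.toml.
function reproduce_environment(dir::AbstractString)
    manifest = joinpath(dir, "Manifest.toml")
    isfile(manifest) || throw(ArgumentError("no Manifest.toml found in $dir"))
    Pkg.activate(dir)
    Pkg.instantiate()   # downloads and precompiles the recorded versions (slow, but once)
    return nothing
end
```

After `reproduce_environment("path/to/lecture-notes")` has run once, later sessions that activate the same project do no package operations unless someone triggers them by hand.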
But that is a fixed cost, and afterwards it is blindingly fast these days. Students are told not to add in packages to the shared notebook environment haphazardly, but they rarely would anyways. No package operations ever occur unless manually triggered.
Are you sure this isn’t Pluto workflow specific? I haven’t encountered it in Jupyter, VSCode, or anything else for a long time. If anything reinstalls or rebuilds, it is because of a decision I made.
A key feature of Project/Manifest setup is to avoid that sort of thing. The speed of package evolution should be irrelevant since you want a reproducible snapshot. If you aren’t using shared manifests for each project (or set of lecture notes) then I can understand, but that is missing out on an amazing feature. Or maybe the feature already exists in Pluto but for whatever reason people aren’t using it in that way?
Sure, if you have a single environment that you never modify — no recompilation happens. That doesn’t depend on repl/vscode/pluto at all.
It’s exactly the same in any context, including Pluto.
Nothing recompiles if you don’t do any package operations.
But consider the following simplified scenario (unrelated to teaching/learning). Every week or so, you want to perform some kind of a new analysis that needs some packages and writing some code.
Naturally, you want to:
be able to reproduce these analyses in the future after a few years,
and modify one of them independently in the future without breaking others.
So, following totally sensible Julia recommendations, you create new environments for these projects. Package sets are often similar with some differences here and there. But still, all of these envs require a long precompilation:
when you start a new analysis, because latest package versions changed since last week,
when you update Julia, run them on a different machine, or the compiled cache gets cleaned up (it cannot store tens of GB for all the different versions forever).
This makes 1.9 slower in the (arguably common) scenario when lots of small projects/analyses only see a couple of executions. Like, I play with some dataset, make some plots — turns out results aren’t useful for now. In a year, it becomes relevant, so I load the same env + code, everything reproduces exactly as before, I change something and produce a few plots. The end. I paid two recompilations for two code executions (more exactly, two executions in different julia sessions – those without julia restarts don’t count anyway).
The example sounds arguably not so common because (1) most people focus on one or a few projects at a time, (2) some people will have large projects with long running times where compilation time doesn’t matter, and (3) if you can wait a week before starting the notebook again, then compilation time doesn’t matter so much.
For power users and package developers, Julia 1.9 makes work a lot faster again.
Yes. And I would argue even more so for entry-level users. When people are learning the language for a particular application, they don’t add packages all that often. And if students are learning from a set of lecture notes, then those lecture notes have been tested against a particular manifest, so compilation is a one-time thing and I don’t even want them messing with versions.
I think you should consider whether your workflow needs to be modified to avoid doing package operations on every restart. For any project with non-ephemeral code you should always work with a manifest, and if so, I am not sure why package operations would happen often enough to keep triggering reinstallation.
For tinkering around, the best workflow I am aware of for Julia is creating either a new “project” to add packages/etc. or a persistent “sandbox” project you can mess around with. This is the same with Python… I tell people to always use conda virtual environments and expect things to be slow when they add packages.
Compiler development resources are the scarcest resource in the community, so time spent making Julia compile faster could instead be spent on things like better support for AD/etc., fixing performance regressions, further lowering the TTFX, etc.
Having some sort of compile-in-the-background solution, as at least one person suggested above, would be really helpful here.
Ideally, the TTFX should be at most the time it takes to compile everything needed to do X immediately after Julia startup; and after all background compilation is done, TTFX is just the execution time.
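To make the two quantities concrete, here is a rough way to compare the first call (which includes compilation) with a later call (execution only) in a fresh session. `workload` is a made-up stand-in for "X", and the timings are only indicative.

```julia
# Hypothetical stand-in for "X": any function whose first call triggers compilation.
workload(x) = sum(abs2, x)

x = rand(1_000)

first_call  = @elapsed workload(x)   # compilation + execution
second_call = @elapsed workload(x)   # execution only, code is already compiled

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The gap between the two numbers is the part that caching or background compilation could remove. With everything compiled ahead of time, the first call would cost about as much as the second.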
I would say 1.9 is faster for developing top-level packages because the dependencies stay intact between restarts. It’s slower for testing lower-level packages by running higher-level ones that depend on them. Part of the issue is that many packages added comprehensive workloads for precompilation, which run again and again without you as a developer profiting. There are some options to disable at least SnoopPrecompile.jl blocks via local preferences, but it’s not the most convenient solution because there might be many such packages and you have to disable them all manually.
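As a sketch of the local-preferences route: as far as I recall from the SnoopPrecompile documentation, it reads a `skip_precompile` preference listing the packages whose workloads should be skipped. Treat that key name as an assumption and check the docs for your version. The helper below is hypothetical and just writes the entry into `LocalPreferences.toml` using the TOML standard library. I believe the preference only takes effect if SnoopPrecompile is listed in the project’s `[deps]` or `[extras]`.

```julia
using TOML

# Hypothetical helper: record which packages should skip their SnoopPrecompile
# workloads by writing to LocalPreferences.toml in `project_dir`.
# Assumption: the preference key is "skip_precompile".
function skip_precompile_workloads(project_dir::AbstractString, packages::Vector{String})
    path = joinpath(project_dir, "LocalPreferences.toml")
    prefs = isfile(path) ? TOML.parsefile(path) : Dict{String,Any}()
    section = get!(prefs, "SnoopPrecompile", Dict{String,Any}())
    section["skip_precompile"] = packages
    open(path, "w") do io
        TOML.print(io, prefs)      # keeps any other preferences already in the file
    end
    return path
end
```

Usage would be something like `skip_precompile_workloads(".", ["PackageA", "PackageB"])`, where the package names are placeholders. This still has the inconvenience mentioned above: every package has to be listed by hand.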
@tim.holy is there any native code caching between different package versions, to avoid recompiling duplicated code blocks even if there are some changes between versions? I doubt that low-level stuff changes very often, even when the package version is bumped.
You can just disable pkgimages with a command line flag if that is your workflow and you get back to 1.8 style precompilation. You can even alias your default Julia to do that and you haven’t lost anything.
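In Julia 1.9 the flag in question is `--pkgimages=no`. A small sketch of launching a session that way from Julia itself (the `--project=.` part is just an assumption about where the environment lives):

```julia
# Path to the Julia executable running the current session.
julia_exe = joinpath(Sys.BINDIR, Base.julia_exename())

# Build a command that starts Julia without package images, which gives
# 1.8-style precompilation (no cached native code).
cmd = `$julia_exe --pkgimages=no --project=.`

# run(cmd)   # uncomment to actually start the session
```

The same flag can go into a shell alias for everyday use. Caches built with and without pkgimages are separate, so switching modes means precompiling again the first time.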
Totally, that’s probably a good idea. But my comment wasn’t complaining about the workflow; I was just pointing out that the idea that 1.9 is faster for development is not broadly true.
@jlperla that’s an ideal separation of workflows. But if you maintain tens of the packages in the toolchain you are using for some project, you are likely to be using recent versions because you bumped them for things you need yourself, and you are also likely to hit bugs and require fixes in these new versions regularly.
You can often solve any problems by fixing the package directly - much faster than doing some workaround in your scripts. Then you get your work done, and PRs pushed as a bonus for everyone else. But 50 packages recompile on restart.
I’m not gonna argue about how relatively frequent different scenarios are, as we are unlikely to have any real data on it. But IME this is a pretty common scenario in sciences.
For long-running tasks, all this TTFX discussion and its improvements are irrelevant anyway.
As others already said, that’s also far from the case for many usage scenarios. I consider myself a power user, and find it likely that 1.9 will make many of my workflows slower. Luckily, there seems to be a switch that enables older, faster precompilation.
I believe it’s important to focus on true TTFX as well, considering recompilation that is likely to happen regularly in common recommended workflows.
TTFX is exactly what “compile faster” is about. And (arguably?) many more users benefit from TTFX improvements than from better AD.
That’s totally true! I seem to notice that some common packages take significantly longer to precompile now than a few months ago, but don’t have hard data on that.
To me that is package installation and resolution, not TTFX. Compile everything, install packages, and take whatever time you need to make things as fast as possible afterwards, and then amortize that fixed cost by using that snapshot of packages and changing them infrequently. Personally, I want even slower compilation times because I know they come with more caching and testing/removing invalidations! Generate as much as possible during compilation at installation time, so that on the off-chance I use it down the road, it is super-fast.
So as an outsider (who tried Pluto and got confused for this exact reason) it sounds like Pluto (and related workflows) need to find better ways to amortize that fixed cost. To me, that is what Project/Manifest files already do (not to mention making things more reproducible). I take @Raf’s point that with people maintaining 50 repos maybe there isn’t a better option though, so it is nice to turn off the caching for them. But that is a small segment of the community.
Consider a different language for a second. Let’s say a workflow required me to install PyTorch (which has a bunch of binaries and takes forever) and supporting libraries for my Python notebooks every time I opened a notebook. And if I had a dozen notebooks, it potentially did that every time, or whenever any of the dependencies had a minor update. I wouldn’t blame conda or pip for that; I would rethink how I am using them. Even containerized setups like BinderHub build an image from the pip requirements or Project.toml file, taking as much time as they need, and make things blindingly fast afterwards.