First Pluto notebook launches are slower on Julia 1.9 beta 3

Last year I did basically this, and shipped the class notebooks with system images that were specified in an Artifacts file. This worked quite well (for all the students that followed the instructions and downloaded the correct version of Julia :slight_smile:): the students didn't need to know anything about environments and everything was pretty fast from the start.

That's not an excuse not to document :slight_smile: I mean, that's even more reason to document, no?

Also, Pluto is not used as a library, so the things to document are mainly how-tos and tips; look at Configuration · Revise.jl

Oh, that is not meant as an excuse, but as an explanation: there may simply not be enough people contributing to such documentation. After all, such a package is also a community effort.

IMHO, Pluto is known to be the opposite of that, I mean it neutrally. People like Pluto because its authors have their minds on designs and don't want Pluto to be anything other than it. This is evident for example:

Anyway, this is off-topic, but my point is: the first step to making something a community effort is to have docs, at least developer docs. We shouldn't expect people to find out how the package works by trial and error AND write docs for a package that's not even theirs and where they don't have a say in most decisions.

I think there is a difference between the main design and the "small steps" like contributing to the documentation (maybe just the docstrings would already be an improvement here). So my opinion here seems to be different from yours, which is fine of course :slight_smile: I think docstrings and how to use parts of a package can also be documented by the community. And sure, we got a little off-topic here; sorry to the rest for that.

Should (does?) Pluto.jl have a way to save an optional Manifest file (I'm not sure if it's always saved with Project or neither)? I mean, to you it seems like overkill to have the file, and to others it seems like a very good feature to have by default. I see no downside to including it as an optional feature anyway, and I do see value (your view) in having all the latest versions of packages (e.g. for speed), or at least versions guaranteed to be the same as or later than those in the Manifest file. Then if things seem off you could click on some "reproducibility" button to enable the Manifest file (or some users could have it set by default for themselves).

Last year I did basically this, and shipped the class notebooks with system images that were specified in an Artifacts file

Then it's for, e.g., x86_64 and not ARM (or WebAssembly; that, together with Shinylive from Shiny for Python, is what we will compete with now, and it's fast). Or do Artifacts support "fat binaries"/multi-arch sysimages? Or could you have different sysimages for different architectures to save on download, and if none is available for your platform, does Pluto default to no sysimage?

As of now, if you use packages and don't disable the package manager, a manifest is embedded in the notebook together with the project, and it's always used (it's not optional).

Yes, I created sysimages for the three most common architectures that students use. For an unknown architecture (less than 5% of students), it doesn't download a sysimage. These students can use the Pluto notebooks of the course without a sysimage or create one for their architecture by running MLCourse.create_sysimage().
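
A minimal sketch of that kind of platform lookup, for illustration only. The table contents, file names and the `prebuilt_sysimage` helper are hypothetical; the thread does not say which three platforms MLCourse covers or how its artifacts are named. Only `Sys.KERNEL` and `Sys.ARCH` are real Base values.

```julia
# Hypothetical table mapping (OS kernel, CPU architecture) to a prebuilt
# sysimage file name. The real course's platforms and names are not known.
const PREBUILT_SYSIMAGES = Dict(
    (:Linux,  :x86_64)  => "sysimage-linux-x86_64.so",
    (:Darwin, :aarch64) => "sysimage-macos-aarch64.dylib",
    (:NT,     :x86_64)  => "sysimage-windows-x86_64.dll",
)

# Return the prebuilt sysimage name for a platform, or `nothing` if there is none.
prebuilt_sysimage(os::Symbol = Sys.KERNEL, arch::Symbol = Sys.ARCH) =
    get(PREBUILT_SYSIMAGES, (os, arch), nothing)

image = prebuilt_sysimage()
if image === nothing
    # Unknown platform: fall back to running the notebooks without a sysimage.
    @info "No prebuilt sysimage for this platform" os = Sys.KERNEL arch = Sys.ARCH
else
    @info "A prebuilt sysimage exists for this platform" image
end
```

The point of returning `nothing` rather than throwing is that an unsupported platform is an expected case: those users simply run without a sysimage or build their own.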

I think the scenario where you share something with a one-time user who prefers to avoid full precompilation, because only a simple script needs to be run, should also be on our radar.

An idea I had in this PR comment is that maybe we could have an ENV entry that would allow packages to get a signal whether precompilation is desired or not. Would something like this make sense?
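
To make the idea concrete, here is a rough sketch of how a package might read such a signal. The variable name `JULIA_SKIP_PRECOMPILE_WORKLOAD` and the helper `workload_enabled` are invented for this example; neither Julia nor Pkg defines such a switch.

```julia
# Hypothetical variable name, made up for this sketch.
const SKIP_VAR = "JULIA_SKIP_PRECOMPILE_WORKLOAD"

# The workload stays enabled unless the variable is set to a truthy value.
# `env` is a parameter so the check can be exercised without touching ENV.
function workload_enabled(env = ENV)
    flag = lowercase(strip(get(env, SKIP_VAR, "")))
    return !(flag in ("1", "true", "yes"))
end

# Inside a package module, the expensive warm-up calls would be guarded:
if workload_enabled()
    # run representative calls here so that they get cached at precompile time
end
```

One caveat with any ENV-based switch: the value is read while the package precompiles, so changing the variable later has no effect until the package is precompiled again.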

In the long run, the way to handle this will probably be to run things in an interpreter while code compiles in the background. That way you get the benefit of low latency from the interpreter, but still get the benefit of compilation for longer-running tasks.

But I think this is straying from the topic of Pluto notebooks. Personally I would like the option to run all Pluto notebooks within a given environment, rather than having an environment per notebook, so that I can instantiate a single environment for a project with several notebooks, and have the option to run an existing notebook within a specified environment rather than the one cached in the notebook.

Since the Pluto issue is just an instance of the more general issue of working with separate environments combined with high release frequencies of packages (both good things), I wonder what others think about this proposal to add already installed packages when possible:

The issue is not specific to Pluto in any way; it's just a single instance. Everyone who uses environments for many projects/analyses/scripts is affected, because even very similar environments created on different dates differ a lot and require recompilation (see my quantification of this above).

But… That's already supported? 🎁 Package management · fonsp/Pluto.jl Wiki · GitHub

It's not a "cache", it's just a regular environment. Different projects/analyses need different envs anyway so that they can be reproduced later. Otherwise, with a shared env, updating/adding a package for AnalysisA can silently break AnalysisB.

That sounds like Pkg.offline.

Great, thanks, that is indeed a big part of it. So more or less what I'm proposing is to default to an offline add and, if that fails, do an online add. Effectively this:

julia> Pkg.offline(true)

(jl_89Sowl) [offline] pkg> add Example
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package Example [7876af07]:
 Example [7876af07] log:
 ├─Example [7876af07] has no known versions!
 └─restricted to versions * by an explicit requirement — no versions left

julia> Pkg.offline(false)

(jl_89Sowl) pkg> add Example
   Resolving package versions...
   Installed Example ─ v0.5.3
    Updating `C:\Users\visser_mn\AppData\Local\Temp\jl_89Sowl\Project.toml`
  [7876af07] + Example v0.5.3
    Updating `C:\Users\visser_mn\AppData\Local\Temp\jl_89Sowl\Manifest.toml`
  [7876af07] + Example v0.5.3
Precompiling environment...
  1 dependency successfully precompiled in 1 seconds. 33 already precompiled.
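
As a rough sketch, the offline-first behaviour shown in the transcript could be wrapped in a small helper. `add_prefer_offline` and its `add` keyword are made-up names for illustration and not part of Pkg; only `Pkg.offline` and `Pkg.add` are real API.

```julia
import Pkg

# Hypothetical helper: first try to resolve `name` against package versions
# that are already installed locally; if that fails, retry with network access.
# The `add` keyword exists only so the logic can be exercised without Pkg.add.
function add_prefer_offline(name::AbstractString; add = Pkg.add)
    Pkg.offline(true)            # resolve using locally available versions only
    try
        add(name)
        return :offline
    catch err
        @info "Offline add failed, retrying online" name exception = err
        Pkg.offline(false)
        add(name)
        return :online
    finally
        # Restore online mode. Note that this also overrides an offline mode
        # the user may have enabled on purpose before calling the helper.
        Pkg.offline(false)
    end
end

# Usage, mirroring the transcript: add_prefer_offline("Example")
```

The return value only reports which path succeeded; a real implementation inside Pkg would presumably do this during resolution rather than by retrying the whole operation.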

The Project.toml and Manifest.toml stuff works exactly as many like. I tell students to instantiate the manifest once to reproduce an environment - with the expectation set with them that the instantiation process might be slow, as with other package systems such as conda, or even a little longer because it needs to do more high-performance compilation - but it is a one-time thing.

But that is a fixed cost, and afterwards it is blindingly fast these days. Students are told not to add in packages to the shared notebook environment haphazardly, but they rarely would anyways. No package operations ever occur unless manually triggered.

Are you sure this isn't specific to the Pluto workflow? I haven't encountered it in Jupyter, VSCode, or anything else for a long time. If anything reinstalls or rebuilds, it is because of a decision I made.

A key feature of Project/Manifest setup is to avoid that sort of thing. The speed of package evolution should be irrelevant since you want a reproducible snapshot. If you aren't using shared manifests for each project (or set of lecture notes) then I can understand, but that is missing out on an amazing feature. Or maybe the feature already exists in Pluto but for whatever reason people aren't using it in that way?

Perhaps the quick add via using CSV (and answering 'y') should do this, if possible? (And Pluto could follow that.)

While ] add CSV is more deliberate, and could behave as now.

The recommended approach is described in the thread "PSA for SnoopPrecompile: turning off extra workload for specific packages"; using ENV for this is discouraged.

That's possible via Pkg.activate(@__DIR__).

If you also want to run all notebooks in the same process, then, although not recommended, you can set use_distributed=false.
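
For illustration, a first notebook cell along these lines would switch the notebook to the environment in its own folder. This is a sketch that assumes a Project.toml (and ideally a Manifest.toml) sits next to the notebook file; as far as I understand, Pluto turns off its built-in package manager for notebooks that call Pkg.activate.

```julia
begin
    import Pkg

    # Use the project in the folder containing this notebook (or script)
    # instead of an environment embedded in the notebook itself.
    Pkg.activate(@__DIR__)

    # If a Manifest.toml is present, install exactly the versions it records.
    # This is the slow one-time step; later runs find everything in place.
    isfile(joinpath(@__DIR__, "Manifest.toml")) && Pkg.instantiate()
end
```

Several notebooks stored in the same folder then share one environment, with the trade-off mentioned elsewhere in the thread: updating a package for one notebook affects all of them.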

Sure, if you have a single environment that you never modify, no recompilation happens. That doesn't depend on REPL/VSCode/Pluto at all.

It's exactly the same in any context, including Pluto.
Nothing recompiles if you don't do any package operations.
But consider the following simplified scenario (unrelated to teaching/learning). Every week or so, you want to perform some kind of a new analysis that needs some packages and writing some code.
Naturally, you want to:

  • be able to reproduce these analyses in the future after a few years,
  • and modify one of them independently in the future without breaking others.

So, following totally sensible Julia recommendations, you create new environments for these projects. Package sets are often similar with some differences here and there. But still, all of these envs require a long precompilation:

  • when you start a new analysis, because latest package versions changed since last week,
  • when you update Julia, run them on a different machine, or the compiled cache gets cleaned up (it cannot store tens of GB for all different versions forever).

This makes 1.9 slower in the (arguably common) scenario where lots of small projects/analyses only see a couple of executions. Say I play with some dataset and make some plots, and it turns out the results aren't useful for now. A year later they become relevant, so I load the same env + code, everything reproduces exactly as before, I change something and produce a few plots. The end. I paid for two recompilations for two code executions (more exactly, two executions in different Julia sessions; those without Julia restarts don't count anyway).
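
To put a number on how far two environments have drifted apart, one could compare their manifests. A minimal sketch, assuming manifest format 2.0 (a top-level `deps` table) and using only the TOML standard library; the function names and the file paths in the usage comment are placeholders.

```julia
import TOML

# Map package name => version string for a parsed Manifest.toml.
# Assumes manifest format 2.0; entries without a "version" key
# (typically standard libraries) are skipped.
function manifest_versions(manifest::AbstractDict)
    versions = Dict{String,String}()
    for (name, entries) in get(manifest, "deps", Dict{String,Any}())
        for entry in entries
            haskey(entry, "version") && (versions[name] = entry["version"])
        end
    end
    return versions
end

# Names of packages present in both manifests but at different versions.
function changed_packages(a::AbstractDict, b::AbstractDict)
    va, vb = manifest_versions(a), manifest_versions(b)
    return sort!([n for n in keys(va) if haskey(vb, n) && va[n] != vb[n]])
end

# Usage with placeholder paths:
#   old = TOML.parsefile("analysis_old/Manifest.toml")
#   new = TOML.parsefile("analysis_new/Manifest.toml")
#   changed_packages(old, new)
```

Every package in that list is one whose compiled cache from the older environment is unlikely to be reusable as-is in the newer one, which gives a rough lower bound on what has to be precompiled again.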
