Building a PC optimized for "time to first plot"

It would really improve my experience with Julia if loading a few big package ecosystems didn't cause a considerable delay while a bunch of stuff gets recompiled. Revise.jl is great, but it sometimes doesn't help, for example when redefining a struct. Obviously a faster PC is going to be faster at doing all the stuff that happens when you load a package.

The question is: if you were to build a PC, what would you actually optimize for? Is single-core performance really the only relevant metric? Are generic CPU benchmarks representative of this use?

AIUI precompilation can be multithreaded but when you actually load a package, that’s single threaded. I would think things like memory size and GPU are completely irrelevant. Is stuff like memory bandwidth, disk speed, cache size, and parameters like that at all relevant to consider beyond generic CPU benchmark scores?

I kinda like Apple silicon, and AMD, but last I heard Intel is king of single-core performance, so maybe that's the way to go. A laptop is of course also convenient, but if a desktop simply performs better, I'd go with a desktop.

7 Likes

I am not really answering your question, but I wanted to provide an alternative that is free and works well.

If you use vscode with the Julia plugin it is now very easy to make a custom sysimage. A sysimage is roughly the product of starting Julia, running all your imports and simple initial tasks, and then saving the state of the Julia session to your hard drive. Then, instead of starting Julia from scratch, you resume that saved state.

On my computer, it took 20 minutes to compile a sysimage from scratch. "Time to first task" for my use case was 1 minute without the sysimage. Now it is 2 or 3 seconds.

To use a sysimage it really helps if you already organize your work with Project.toml files (you should be doing that anyway).
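
If you're not doing that yet, here is a minimal sketch of setting up a per-project environment (the folder and package names below are just placeholders):

julia> ]                          # enter Pkg mode
(@v1.8) pkg> activate .           # use the environment in the current folder
(MyProject) pkg> add Plots        # dependencies get recorded in Project.toml / Manifest.toml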

Your JuliaSysimage.toml configuration file can be as simple as

[sysimage]
exclude=[]
statement_files=[]
execution_files=[]

However, it does help to put a couple of small demo tasks in a file listed under execution_files, as that ensures that whatever you care about actually gets compiled.
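
For example, an execution file can be as small as the following (the filename and the plotting calls are just an illustration of "whatever you care about"; point execution_files at it):

# precompile_tasks.jl, listed in execution_files
using Plots
plot(rand(20))           # exercise the code paths you actually use
heatmap(rand(10, 10))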

5 Likes

You don’t need a beefy PC. Just follow the steps of How Julia ODE Solve Compile Time Was Reduced From 30 Seconds to 0.1 and then whatever package you’re using will run in almost no time.

If you have a specific workflow, you can generate a MyWorkflow package, slap a SnoopPrecompile statement over the code you want to run, then sysimage that, and it'll all be compiled at startup. The 0.1-second first solves in that blog post were on a laptop with an Atom-type chip, not even an i3, so really any use case can have a pretty quick startup if you just SnoopPrecompile it. You just have to spend about 5 minutes setting it up, which apparently no one does; :person_shrugging: I don't know why.

9 Likes

This is a bit unfair. Setting this up is not that trivial compared to other languages, especially when working with new students. Sure, it is 3 or 4 steps that are trivial for me, but they require that you have learned the Julia idioms in order to understand the steps (Project.toml, sysimage, module, precompile, SnoopPrecompile, etc.). You can blindly follow the steps without knowing the idioms, but if that is the approach a student takes, at the first innocuous error message or typo they will crash and burn.

It is not the process that is difficult, it is wrapping your head around why the process is necessary that is the largest stumbling block for people new to julia. We do need better tooling before we can be so dismissive (and I know that you and many other volunteers are making this happen and I immensely appreciate it – the progress over the last couple of years has been monumental).

23 Likes

That’s a bit unfair. With the new SnoopPrecompile mechanism, it’s fairly trivial.

Step 1: Generate a package with PkgTemplates.
Step 2: Change the module file to:

module MyLocalModule

import SnoopPrecompile
# import other packages you want compiled here

SnoopPrecompile.@precompile_all_calls begin
  # put code here
end

end

Step 3: ]add packages.

Step 4:

using PackageCompiler
PackageCompiler.create_sysimage(["MyLocalModule"]; sysimage_path="GiveItAName.so")

You’re done. It takes less than 5 minutes (of “you time”; the sysimage build itself can take longer). I do this daily. Training undergrads is handled well enough just by sending a YouTube video (https://www.youtube.com/watch?v=QVmU29rCjaA). I mean, there are things to improve, but let’s not act like this piece is hard.

It could be made into one click in VS Code (and that would be nice), but for now, someone who’s an experienced Julia developer (like the OP) can definitely do those few steps I mentioned.

14 Likes

I work with talented undergrads at the same institution as you. This statement is just not true. I agreed with you from the start that this is trivial for developers with a couple of weeks of experience in Julia, but in my first post I was talking about students. Fair point if that is our point of disconnect, as that was my mindset when I complained about your “which apparently no one does” comment.

And let me reiterate: the steps are trivial, but if the student is supposed to not crash and burn at the first typo, they need to understand the reasons for the steps. That requires learning quite a few new idioms (which is certainly worthwhile and they should do it, but it is not trivial).

17 Likes

Taking “time to first plot” literally, has anyone done snooping for Plots, or would that not work?

Or, taking the question as being about hardware, an M1 Mac gives these times (Julia nightly, native, Plots v1.35.3, no system image or any other special tricks). It might be interesting to see other systems; my impression is that Apple silicon is pretty good for this:

% julia --startup-file=no -e '@time using Plots; @time display(plot(rand(20))); exit()'
  3.658917 seconds (6.30 M allocations: 435.703 MiB, 5.13% gc time, 19.33% compilation time: 54% of which was recompilation)
  3.703820 seconds (403.54 k allocations: 20.986 MiB, 99.42% compilation time: <1% of which was recompilation)
1 Like

Well, I’ve found that by homework 1 the undergrads are fine turning in problems as a tested package, so :person_shrugging:. But anyway, the OP is an experienced Julia developer who was with Julia Computing for quite some time, so that’s a bit of a shift in the topic. I think someone who has been developing Julia packages at Julia Computing should, with 2022 tooling, be able to comfortably build a system image.

Well I think in 2022, every package should be properly SnoopPrecompiled and everyone should be building a system image which includes calls that they use.

Indeed, in earlier times system images didn’t do very much, because packages were not set up to work well with precompilation (nothing was snooped, plus the pre-v1.8 issues that are detailed in the blog post). But if packages are set up properly, then you don’t even need to supply a custom script and it will precompile what you need. With that in place, people should really start using system images more often.

For example, if you don’t develop StaticArrays.jl, slap it into the system image. If you don’t develop the ODE solvers, slap OrdinaryDiffEq.jl into the system image. For most developers, a large part of their dependency stack can be relatively static.
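
Concretely, “slap it into the system image” is just the create_sysimage call from earlier in the thread with those packages listed (the package selection and image name here are only an example):

using PackageCompiler
PackageCompiler.create_sysimage(["StaticArrays", "OrdinaryDiffEq", "Plots"];
                                sysimage_path="StaticDeps.so")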

2 Likes

I am not sure whether I am interpreting your question correctly, but I think the answer is yes. Here is an example with Makie:

With a sysimage that took just a couple of clicks to prepare in vscode:

julia> @time @eval using CairoMakie
  0.008680 seconds (14.54 k allocations: 927.412 KiB)

julia> @time plot([1,2,3])
  2.082096 seconds (1.94 M allocations: 130.274 MiB, 2.15% gc time, 37.41% compilation time)

Without a custom sysimage:

julia> @time @eval using CairoMakie
 13.082514 seconds (17.49 M allocations: 1.247 GiB, 4.57% gc time, 3.01% compilation time: <1% of which was recompilation)

# don't even bother actually making the first plot, it takes more than 30 seconds
2 Likes

Plots has a lot of dynamism, so with current precompilation a lot is still dropped (though IIUC there is work going on to address that), but it is already snooped:

https://github.com/JuliaPlots/Plots.jl/blob/master/src/precompilation.jl

Because it’s snooped, if you do:

using PackageCompiler
PackageCompiler.create_sysimage(["Plots"]; sysimage_path="FastPlot.so")

then that system image is sufficient for first-time GR plots to be quite fast (<0.1 seconds IIRC). Creating a sysimage for fast plotting with Plots.jl · PackageCompiler showcases a custom precompile file and all of that, but after the snooping was added it’s not really necessary.
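
To check the effect yourself, the timing one-liner from earlier in the thread can be pointed at the image (the .so path is whatever you passed to sysimage_path):

% julia --sysimage FastPlot.so -e '@time using Plots; @time display(plot(rand(20)))'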

1 Like

Ah great, I didn’t know.

FWIW, I personally gave up trying to make custom images, as I seem to seldom do the same thing enough times in a row for it to help more than it hurt. It’s one more kind of state to keep track of: which system images I’ve got, and which packages are in which. Maybe if I used vscode it would be more automatic.

7 Likes

For the same reasons as you, I was finding the custom images not worthwhile, but as alluded to in this thread:

  • julia 1.8 makes way better custom images to begin with
  • today most packages do have useful precompile statements
  • vscode does make it much easier, if you have a well-maintained Project.toml and Manifest.toml (and it warns you about outdated states)

Now I love custom sysimages.

And still more work is being done for 1.9 and 1.10 to make compiled-code caching better even without custom sysimages (look on Julia’s GitHub for pkgimages and the work by Tim Holy and Valentin Churavy).

5 Likes

I gave up for years because you needed a custom script and the image would only be good for the things in that script. But indeed, my point is that I’ve come back to it since August. With the way packages are set up now, it’s pretty sufficient for any decently maintained package to be snooped (not all of them are yet, but it keeps getting better), and getting a pretty good system image with a snooped package only requires mentioning it by name.

So in 2019, custom system images were not a solution. But in 2023, I think we should really be revisiting system images as something as simple as just taking the default compilation on StaticArrays + LoopVectorization + Plots can cut out >90% of the compile and load time for some of the workflows I have. Since those packages are essentially static (pun intended), I basically keep alive a “set it and forget it” system image nowadays.

VS Code’s extra piece makes this pretty trivial. Honestly, the only thing I think we’re missing is some functionality that tells you when to update the system image (because one of the packages in the image updated), with a little popup like the one for updating the extension. That, and the fact that system image building could be way faster if Faster incremental sysimg rebuilds by Keno · Pull Request #40414 · JuliaLang/julia · GitHub gets finished. Once that is finished and merged, it wouldn’t be unreasonable for a Plots-only system image update to take around 1 minute, in which case I would really be questioning why it isn’t used.

And this might make system images a thing of the past, but :person_shrugging: that’s only better.

4 Likes

When a bunch of stuff gets recompiled, e.g. when you install a package and additional packages frequently get upgraded, installed, or downgraded (which you can prevent with add --preserve=all), you can stop it with CTRL-C.
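
In Pkg REPL mode that looks like this (NewPkg is a placeholder name):

(@v1.8) pkg> add --preserve=all NewPkg   # add NewPkg without up/downgrading what's already installed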

Yes, you are just deferring compilation, but often you get away with it indefinitely; often it is recompiling packages I might never use anyway. The problem with Julia compilation, as I see it, is that it’s done too eagerly and it optimizes too much.

I proposed making -O0 the default for Julia (it wasn’t wanted; I wanted to opt into more compilation selectively, or, as another option, recompile to optimize more at runtime). Julia blurs the line between development/prototyping and production, and I believe most are using the default options, which are (only) better for production, where you want the fastest runtime speed. For development you want fast compile times and no inlining (--inline=no, likely implied by -O0, though I’m not sure).

If your problem is “time to first plot” (not “time to first X”), then various packages have very good startup, and now this one claims “Fastest time-to-first-plot in Julia!”:

4 dependencies successfully precompiled in 3 seconds. 352 already precompiled. 2 skipped during auto due to previous errors.

julia> @time using PlotlyLight
  0.511248 seconds (491.38 k allocations: 27.544 MiB, 9.57% gc time, 79.77% compilation time)

No, you have it backwards: last time I checked, Apple’s ARM is king of single-threaded performance (and somewhat recently IBM’s mainframe claimed the fastest single thread, before Apple did), and AMD of multithreaded.

You will have more problems in the short run with Apple’s ARM (people still run into trouble with code not compiled for it), since it is a tier 2 platform (though with Rosetta it’s tier 1, which might be good enough, but then not as fast).

GPUs are irrelevant to compilation, but precompiling can eat up memory with many Julia processes forked. Each one is single-threaded, but with many running at once, many “threads” are in use, also multiplying memory by N. People have run out of memory in (2 GB) containers; the issue was supposed to be solved, but it seems it isn’t. There’s a workaround: you can force less (or no) parallel precompilation.
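
One way to do that (this environment variable is, if I remember the name right, the knob Pkg uses for parallel precompilation; the value 1 is just an example) is:

% JULIA_NUM_PRECOMPILE_TASKS=1 julia    # limit precompilation to a single worker process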

Ideally you would only precompile the packages (or only some of their methods, such as exported functions) you are likely to use in the (near) future, e.g. just the package you were installing (and maybe some of its dependencies); often something unrelated gets changed and precompiled too.

Thank you for this! I did not know :slight_smile:

For Plots.jl or Makie.jl or other (non-plot) code this might help:

to enable PyPlot showing properly after it was baked into the [system] image add

I was looking for another project which already has some custom system images pre-built for you (e.g. for Makie.jl, if I recall, and JuliaSyntax.jl). It hadn’t been announced yet last I knew; just be aware that there’s a third project, but that top one seems similar.

I consider myself reasonably experienced with Julia, but I have never used system images. Using them doesn’t look as simple as sometimes portrayed, and I just don’t see any obvious low-maintenance way to incorporate them. For example, I do most of my interactive work in Pluto notebooks, and would be glad to cut notebook startup time with some compilation. Are there any tutorials for this presumably common use case?

6 Likes

Not about Pluto, but the way to do it should be mostly the same: GitHub - ufechner7/Plotting: Demo for using InspectDR

Each Pluto notebook has its own Project.toml and Manifest.toml file. I am not sure a sysimage would be a particularly pleasant experience in such a case. You really want the sysimage you are running to be matched to the Manifest of the project.

1 Like