Startup time of 1000 packages – 53% slower in Julia 1.12 vs 1.10

Perhaps each snippet could focus on a particular task, like:

  • computing a gradient (I have a simple piece of code which shows that time-to-first-gradient with ForwardDiff is 60% worse in 1.12 compared to 1.11);
  • plotting a simple function like Plots.plot(range(-5,5,100), sin);
  • minimizing a simple function using plain-Julia minimization algorithms like those in Optim.jl;
  • solving a system of linear equations using plain-Julia algorithms in 3rd-party packages like LinearSolve.jl;
  • fitting a simple normal distribution using MCMC with Turing.jl.

The goal with “plain-Julia” is to avoid calling into precompiled C/C++/FORTRAN code.
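
To make the first item concrete, a minimal time-to-first-gradient sketch could look like the following (the objective function and input size are made up, and @time is just one way of taking the measurement):

# Minimal time-to-first-gradient sketch (made-up objective and input size).
@time using ForwardDiff                     # package load time
f(x) = sum(abs2, x)                         # simple plain-Julia objective
@time ForwardDiff.gradient(f, rand(10))     # first call, dominated by compilation
@time ForwardDiff.gradient(f, rand(10))     # warm call, for comparison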

2 Likes

Hmmm, that’s a good point. I was just hoping that, in the spirit of keeping things as simple as possible, we’d be able to get by with something like:

# --- snippet ---
# julia: 1.7
# mypkg: 0.3
# author: @name

MyPkg.dothing()

# --- snippet ---
# julia: 1.9
# mypkg: 0.4
# author: @someone

MyPkg.anotherthing()

# ...

This would make it possible to simply have a single M/MyPkg.jl file for each package.

I guess we could do M/MyPkg/<task>/{Project.toml,task.jl} instead.

I appreciate the concept, but it seems to me that Project.toml is already capable of storing package names and versions, Julia versions, and authors. From there, it seems almost more complicated to invent a new format?

1 Like

+1, exactly. Most packages whose TTFX people care about already load so many dependencies upon using that I am not sure there is a practical reason to focus on the package instead of the task, provided the task is central to a package and is done with that package.

2 Likes

Pah, that’s not a format, it’s just a few special comments :face_with_tongue:

That said, I’m taking from this discussion that orienting around tasks with a “main package” (not necessarily the only package) is a better approach.

2 Likes

Update: I’ve got something coming, I’ll probably have something to share in the next few days.

18 Likes

I think a Manifest is pretty much required here if you want this information to be useful.

New versions of packages can do things like add precompilation workflows, or add large amounts of code which could substantially change latency / ttfx in a non-breaking way.

If the dependencies of a package are not fixed, you risk a huge amount of noise / systematic error being folded into your measurements, and you make your conclusions less reproducible.
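
A committed Manifest is also cheap to consume; a minimal sketch, assuming the per-task layout floated above:

# Reproduce the exact dependency tree recorded for a task (hypothetical path).
using Pkg
Pkg.activate("M/MyPkg/task")   # environment with Project.toml and a committed Manifest.toml
Pkg.instantiate()              # installs exactly the versions recorded in the Manifest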

3 Likes

Depends on what “this information” is. With GitHub CI being on shared machines, I don’t think it’s feasible to get good quality benchmark results from it.

So, this set of trial workloads will have to be run by people wanting to do benchmarks. Depending on what you’re trying to measure though, different resolution strategies make sense:

  • If you want to compare what people experienced at the time points of each Julia version, you’d want to resolve with registries of the same year as each Julia release.
  • If you want to compare Julia itself, then you might want to try to use the same code to the greatest extent possible, and construct the manifest from the lowest supported Julia version and just re-resolve (not upgrade) with newer versions.
  • When trying to compare Julia versions, you might not care about Julia versions before a certain point (say 1.6), in which case you’d want to initially resolve with version max(1.6, min-ver).

So, I don’t think there’s a single clear answer to what the Manifest should be, particularly if the repo itself isn’t benchmarking.

Despite that, simply having a set of tasks + minimum Julia versions like this is still hugely valuable as it allows us to start looking at these questions.
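
For the second strategy, what I have in mind is roughly: build the Manifest once on the lowest supported Julia, commit it, and then on newer versions re-resolve rather than upgrade. A sketch, with a made-up task path:

# On a newer Julia version: re-resolve the committed Manifest rather than upgrading it.
using Pkg
Pkg.activate("M/MyPkg/task")   # Manifest originally built on the lowest supported Julia
Pkg.resolve()                  # re-resolve for this Julia version (Pkg.update() would upgrade instead)
Pkg.instantiate()              # install the resolved versions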

1 Like

Download speed can vary dramatically depending on a user’s internet service. In my experience this can completely dwarf precompile/load times on some internet connections.

I would find it useful if the code/notebook calculated the size of these packages (including their full dependency tree). A reasonably simple knob to add to the notebook would be a few options for your internet speed (e.g. 1 Mbit/s, 10 Mbit/s, 100 Mbit/s, 1000 Mbit/s). Then the plot could calculate [package (+deps) size] / [download speed] and use that as its “Installation time” measure.
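
The arithmetic behind that knob is simple enough; a sketch with a made-up total size:

# Estimated installation (download) time = size of package + dependency tree / link speed.
link_speeds_mbit = (1, 10, 100, 1000)   # the proposed speed options, in Mbit/s
total_size_MB = 250.0                   # made-up size of the package plus its full dependency tree
download_time_s = [total_size_MB * 8 / speed for speed in link_speeds_mbit]   # MB -> Mbit, then seconds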

2 Likes

It’s not everything I want it to be, but it’s a good starting point I hope!

Do check it out and let me know what you think :smiley:

17 Likes

Just to elaborate on how this works, I’ve tried to make something easy enough that you can do it on your phone.

The prominent link on the README will just ask you to provide a code snippet for a package, e.g.
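
To give a made-up illustration (not the actual form contents), the pasted snippet could be as simple as:

# Hypothetical submission: first plot with Plots.jl.
using Plots
plot(range(-5, 5, 100), sin)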

That’s it! No need to clone the repo, etc.

The minimum Julia version will be automatically determined and a PR created. If you’re a maintainer of the package in question (author or member of the parent org), it will also be merged, e.g. New task: HiddenMarkovModels, Estimate HMM with Baum-Welch by github-actions[bot] · Pull Request #103 · tecosaur/Julia-TTFX-Snippets · GitHub

Note that tasks are run without network access and should not create any files outside of temp/cache directories.

22 Likes

Any sign that this problem might be alleviated during the 1.12 release cycle?

The point is, startup time is not well defined, and we have no good, reproducible test cases for the pre-compile time.

The package load time of 1.12 is pretty good. The pre-compile time is not. I would open an issue if I had a good test case.
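
For what it’s worth, the crude way I’d try to separate the two costs is something along these lines, run with a fresh depot so nothing is cached (a sketch; Plots is just a stand-in for whichever package you care about):

# Run with a fresh depot and without auto-precompilation, e.g.
#   JULIA_DEPOT_PATH=$(mktemp -d) JULIA_PKG_PRECOMPILE_AUTO=0 julia script.jl
using Pkg
Pkg.activate(; temp = true)
Pkg.add("Plots")          # stand-in package; auto-precompile disabled via the env var above
@time Pkg.precompile()    # pre-compile time
@time using Plots         # package load time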

We’re getting them:

@Krastanov has been good enough to get the runner going, and has put together some plots:


Funnily enough, just a few days ago Stefan and I spoke about making a more user-friendly way of tracking shifts across Julia versions and, for a single package, across the different performance metrics.

7 Likes

Thanks for sharing these plots!

I find it a bit hard, though, to find the version I am looking for because some of the colors are very similar. And is the specification (OS, CPU, RAM, HD) of the runner known?

1 Like

That’s why I mentioned that a more user-friendly way of presenting the data is on the todo list (please send help, I think my todo list is collapsing under the weight of my ambition) :slight_smile:

7 Likes

Agreed - it feels like this would make more sense as a box-and-whisker plot with release numbers as x-axis tick labels.

1 Like

Is the data available as a CSV file or in a similar format? Then all of us could try to make improved plots.

2 Likes

I’m not familiar with what’s actually run: Stefan set it up, not me.

However, all of the files involved in collecting data, saving it, and plotting it are in the root directory of GitHub - JuliaEcosystemBenchmarks/julia-ecosystem-benchmarks.

For instance, see ttfx_snippets_gather_data.jl and make_and_commit_plots.sh. My basic understanding of the structure is that there’s a minimal container built (see: Dockerfile, basically Debian 12 + juliaup), and then the .sh scripts are run, which in turn call the .jl scripts.

What I mentioned to Stefan a few days ago is the idea of committing a structured set of CSVs, making them accessible via GET requests to a raw.githubusercontent.com/... URL, and replacing the pre-generated plots with a basic interactive page where you can select what you’re comparing (and <insert javascript framework> will fetch and plot the data).

The data seems to be written to a different branch.
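
A sketch of the fetch-and-plot half in Julia terms (the URL path and column names are made up; the real ones would follow whatever CSV layout gets committed to that branch):

# Hypothetical fetch-and-plot of one metric (made-up data-branch path and column names).
using Downloads, CSV, DataFrames, Plots
url = "https://raw.githubusercontent.com/JuliaEcosystemBenchmarks/julia-ecosystem-benchmarks/<data-branch>/ttfx.csv"
df = CSV.read(Downloads.download(url), DataFrame)
scatter(df.julia_version, df.ttfx_seconds; xlabel = "Julia version", ylabel = "TTFX (s)", legend = false)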

2 Likes