Startup time of 1000 packages – 53% slower in Julia 1.12 vs 1.10

Perhaps each snippet could focus on a particular task, like:

  • computing a gradient (I have a simple piece of code which shows that time-to-first-gradient with ForwardDiff is 60% worse in 1.12 compared to 1.11);
  • plotting a simple function like Plots.plot(range(-5,5,100), sin);
  • minimizing a simple function using plain-Julia minimization algorithms like those in Optim.jl;
  • solving a system of linear equations using plain-Julia algorithms in 3rd-party packages like LinearSolve.jl;
  • fitting a simple normal distribution using MCMC with Turing.jl.

The goal with “plain-Julia” is to avoid calling into precompiled C/C++/FORTRAN code.
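As a rough illustration of what one such task-focused snippet might measure, here is a minimal sketch that uses only Base. The finite-difference gradient and the Rosenbrock test function are hypothetical stand-ins for a real package workload (they are not ForwardDiff, and not the code mentioned above). The idea is just to time the first call, which typically includes compilation, against a second call.

```julia
# Hypothetical plain-Julia workload standing in for a real package task.
function central_diff_gradient(f, x::Vector{Float64}; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h   # step forward along coordinate i
        xm = copy(x); xm[i] -= h   # step backward along coordinate i
        g[i] = (f(xp) - f(xm)) / (2h)
    end
    return g
end

rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

x0 = [0.5, 0.5]

# First call: typically includes JIT compilation of the methods involved.
t_first = @elapsed central_diff_gradient(rosenbrock, x0)
# Second call: compiled code is reused, so this is mostly run time.
t_second = @elapsed central_diff_gradient(rosenbrock, x0)

g = central_diff_gradient(rosenbrock, x0)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

For a real TTFX measurement, each task would presumably need to run in a fresh Julia process, since compiled code is cached within a session.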

2 Likes

Hmmm, that’s a good point. I was just hoping that in the spirit of keeping things as simple as they can be that we’d be able to get by with something like:

# --- snippet ---
# julia: 1.7
# mypkg: 0.3
# author: @name

MyPkg.dothing()

# --- snippet ---
# julia: 1.9
# mypkg: 0.4
# author: @someone

MyPkg.anotherthing()

# ...

This would make it possible to simply have a single M/MyPkg.jl file for each package.
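To show that the special-comment layout above is easy to consume, here is a minimal sketch of a reader for it. The `Snippet` struct and `parse_snippets` function are hypothetical names made up for this illustration. It assumes each snippet starts with a `# --- snippet` line, followed by `# key: value` comment lines, then the code.

```julia
# Hypothetical container for one snippet: header metadata plus code body.
struct Snippet
    meta::Dict{String,String}   # e.g. "julia" => "1.7"
    code::String
end

function parse_snippets(text::AbstractString)
    snippets = Snippet[]
    meta = Dict{String,String}()
    code = String[]
    started = false
    for line in eachline(IOBuffer(text))
        if startswith(line, "# --- snippet")
            # A new header closes the previous snippet, if there was one.
            started && push!(snippets, Snippet(meta, String(strip(join(code, '\n')))))
            meta, code, started = Dict{String,String}(), String[], true
        elseif started
            # Metadata lines are only recognised before any body line is seen.
            m = isempty(code) ? match(r"^#\s*(\w+):\s*(.+)$", line) : nothing
            if m !== nothing
                meta[String(m[1])] = String(strip(m[2]))
            else
                push!(code, line)
            end
        end
    end
    started && push!(snippets, Snippet(meta, String(strip(join(code, '\n')))))
    return snippets
end

text = """
# --- snippet ---
# julia: 1.7
# mypkg: 0.3
# author: @name

MyPkg.dothing()

# --- snippet ---
# julia: 1.9
# mypkg: 0.4
# author: @someone

MyPkg.anotherthing()
"""

snippets = parse_snippets(text)
for s in snippets
    println(s.meta["julia"], " => ", s.code)
end
```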

I guess we could do M/MyPkg/<task>/{Project.toml,task.jl} instead.

I appreciate the concept, but it seems to me that Project.toml is already capable of storing package names and versions, Julia versions, and authors. From there, it seems almost more complicated to invent a new format?

1 Like

+1, exactly. Most packages whose TTFX people care about already load so many dependencies upon using that I am not sure there is a practical reason to focus on the package instead of the task, provided the task is central to the package and is done with that package.

2 Likes

Pah, that’s not a format, it’s just a few special comments :face_with_tongue:

That said, I’m taking from this discussion that orienting around tasks with a “main package” (not necessarily the only package) is a better approach.

2 Likes

Update: I’ve got something coming, I’ll probably have something to share in the next few days.

18 Likes

I think a Manifest is pretty much required here if you want this information to be useful.

New versions of packages can do things like add precompilation workloads, or add large amounts of code, which could substantially change latency / TTFX in a non-breaking way.

If the dependencies of a package are not fixed, you risk a huge amount of noise / systematic error being folded into your measurements, and you make your conclusions less reproducible.

3 Likes

Depends on what “this information” is. With GitHub CI being on shared machines, I don’t think it’s feasible to get good quality benchmark results from it.

So, this set of trial workloads will have to be run by people wanting to do benchmarks. Depending on what you’re trying to measure though, different resolution strategies make sense:

  • If you want to compare what people experienced at the time points of each Julia version, you’d want to resolve with registries of the same year as each Julia release.
  • If you want to compare Julia itself, then you might want to try to use the same code to the greatest extent possible, and construct the manifest from the lowest supported Julia version and just re-resolve (not upgrade) with newer versions.
  • When trying to compare Julia versions, you might not care about Julia versions before a certain point (say 1.6), in which case you’d want to initially resolve with version max(1.6, min-ver).
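The last strategy boils down to a one-liner. A minimal sketch, where `initial_resolve_version`, `versions_to_benchmark`, the `oldest` keyword, and the release list are all hypothetical names and values chosen for illustration:

```julia
# Hypothetical helper: the Julia version to resolve the manifest with first,
# given a task's minimum supported version and the oldest version we care about.
initial_resolve_version(minver::VersionNumber; oldest::VersionNumber = v"1.6") =
    max(oldest, minver)

# Illustrative list of releases to compare; not an authoritative set.
releases = [v"1.6", v"1.9", v"1.10", v"1.11", v"1.12"]

# Releases a task would be benchmarked on under this strategy.
versions_to_benchmark(minver; oldest = v"1.6") =
    filter(>=(initial_resolve_version(minver; oldest = oldest)), releases)

println(initial_resolve_version(v"1.3"))
println(versions_to_benchmark(v"1.9"))
```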

So, I don’t think there’s a single clear answer to what the Manifest should be, particularly if the repo itself isn’t benchmarking.

Despite that, simply having a set of tasks + minimum Julia versions like this is still hugely valuable as it allows us to start looking at these questions.

1 Like

Download speed can vary dramatically depending on a user’s internet service. In my experience this can completely dwarf precompile/load times on some internet connections.

I would find it useful if the code/notebook calculated the size of these packages (including their full dependency tree). A potentially reasonably simple knob to add to the notebook is a few options for what your internet speed is [e.g. 1Mbit/s, 10Mbit/s, 100Mbit/s, 1000Mbit/s]. Then the plot could calculate [package (+deps) size] / [download speed] and use that as its Installation time measure.
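The calculation I have in mind is nothing more than size divided by speed. A minimal sketch, where `download_seconds` is a hypothetical helper and the 250 MB figure is a made-up example size rather than a measurement:

```julia
# Connection speeds in Mbit/s, matching the options suggested above.
const SPEEDS_MBIT = [1, 10, 100, 1000]

# size_bytes: total size of a package plus its full dependency tree.
# Convert bytes to bits, then divide by the link speed in bits per second.
download_seconds(size_bytes::Real, mbit_per_s::Real) =
    size_bytes * 8 / (mbit_per_s * 1e6)

example_size = 250 * 10^6  # hypothetical 250 MB install, for illustration only

for s in SPEEDS_MBIT
    t = round(download_seconds(example_size, s); digits = 1)
    println(rpad("$s Mbit/s", 12), t, " s")
end
```

This ignores latency, server-side throttling, and decompression, so it would only be a rough lower bound on installation time.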

2 Likes

It’s not everything I want it to be, but it’s a good starting point I hope!

Do check it out and let me know what you think :smiley:

15 Likes

Just to elaborate on how this works, I’ve tried to make something easy enough that you can do it on your phone.

The prominent link on the README will just ask you to provide a code snippet for a package, e.g.

That’s it! No need to clone the repo, etc.

The minimum Julia version will be automatically determined and a PR created. If you’re a maintainer of the package in question (author or member of parent org), it will also be merged, e.g. New task: HiddenMarkovModels, Estimate HMM with Baum-Welch by github-actions[bot] · Pull Request #103 · tecosaur/Julia-TTFX-Snippets · GitHub

Note that tasks are run without network access and should not create any files outside of temp/cache directories.

18 Likes