At the Julia Ecosystem Benchmarks Explorer you can find detailed loading-time benchmarks for many packages across many different versions of Julia, all interactively explorable.
No. For each historic date I check out the General registry to its state on that date and set it as the source of truth in the active Julia depot, so all version resolution happens as if it were that date. I manually set bounds on which dates correspond to which Julia versions, so a Julia version is never tested against a General registry state that did not exist during that version's lifetime.
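In case it helps to picture it, here is a minimal sketch (not my actual implementation) of pinning a registry checkout to a historic date. It assumes General is a plain git clone under the depot rather than a pkg-server tarball, that its default branch is `master`, and that `JULIA_PKG_SERVER` is unset so Pkg resolves against this checkout; the function name is made up.

```julia
using Dates

function pin_registry_to_date(date::Date;
        registry_path = joinpath(first(DEPOT_PATH), "registries", "General"))
    datestr = Dates.format(date, "yyyy-mm-dd")
    # Last commit on the registry's default branch on or before `date`.
    rev = readchomp(`git -C $registry_path rev-list -1 --before=$datestr master`)
    run(`git -C $registry_path checkout --quiet $rev`)
    return rev
end

# e.g. resolve versions as they stood on 2021-06-01:
# pin_registry_to_date(Date(2021, 6, 1))
```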
I use juliaup, so I never had to manually bisect or compile Julia. There are many failures for many packages throughout this dataset, but they are simply not visualized. I did not investigate exactly why the failures happen.
All the benchmarks run on a lab server in the back of my office. The historical data (about 1000 General registry snapshots) took a week or two to collect. Now another benchmark run is executed every night, for the lts, nightly, alpha, and release channels as defined by juliaup.
That particular server is used for other tasks as well, most of them not computationally intensive. But the benchmarks run at high process priority with reserved resources at night, so hopefully the noise is not too bad.
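Conceptually, the nightly driver boils down to something like the sketch below, using juliaup's `julia +<channel>` launcher; the `nice`/`taskset` wrapping, the core range, and the script name `benchmark_packages.jl` are illustrative assumptions, not my exact setup.

```julia
# Run the benchmark script once per juliaup channel, pinned to a fixed set of
# cores and at raised priority (negative niceness needs elevated privileges).
for channel in ["lts", "release", "alpha", "nightly"]
    run(`nice -n -10 taskset -c 0-15 julia +$channel --project=. benchmark_packages.jl`)
end
```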
It's an interesting question how such data can be interpreted when the packages themselves get updated over time. You'd assume that new features add loading time, while authors also try to battle TTFX. The nightly measurements starting now should be useful for tracking Julia's behavior over time, but in the historical data it seems difficult to make clear assessments. I'll have to play around a little.
Here’s an alternative visualization. I thought it might make sense to normalize because the packages and workloads are so different. So I took the median of a given metric within each package/workload/Julia group, then rescaled those medians within each package group by dividing by the maximum value. That way each package has values going up to 1, with the relative proportions between the values staying intact. The thick dots are then means over all those values for a given Julia version; the small dots are the values making up the means.
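In code, the normalization is roughly the following sketch, assuming a DataFrame `df` with columns `:package`, `:workload`, `:julia`, and `:value` (column names made up for illustration).

```julia
using DataFrames, Statistics

# 1. Median of the metric within each package/workload/Julia group.
med = combine(groupby(df, [:package, :workload, :julia]), :value => median => :med)

# 2. Rescale within each package so the largest median becomes 1.
med = transform(groupby(med, :package), :med => (x -> x ./ maximum(x)) => :rel)

# 3. Thick dots: mean of the relative values per Julia version
#    (the individual :rel values are the small dots).
per_julia = combine(groupby(med, :julia), :rel => mean => :mean_rel)
```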
There’s still considerable variation and I’m not sure whether another normalization would be better, but at least it does reduce the impact of the different orders of magnitude of the workloads. Maybe scaling by a reference version like the latest release would also work.
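That variant could look roughly like this, continuing from the sketch above and assuming the reference version is labeled "release" in the hypothetical `:julia` column:

```julia
# Divide each package/workload's medians by its median on the reference
# Julia version instead of by the per-package maximum.
ref = select(filter(:julia => ==("release"), med), :package, :workload, :med => :ref_med)
med = leftjoin(med, ref, on = [:package, :workload])
med.rel_to_release = med.med ./ med.ref_med
```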