[ANN] AirspeedVelocity.jl - easily benchmark Julia packages over their lifetime

AirspeedVelocity.jl

AirspeedVelocity.jl tries to make it easier to benchmark Julia packages over their lifetime.
It is inspired by, and takes its name from, asv (and aspires to one day have as nice a UI).

Basically, think of it as PkgBenchmark.jl, but higher-level. There are more built-in features, but it is also more rigid. AirspeedVelocity.jl started as a simple script I made to visualize the long-term performance evolution of my own packages, but I thought it might be useful to others as well.

This package allows you to:

  • Generate benchmarks directly from the terminal with an easy-to-use CLI (extremely handy when working side-by-side with git).
  • Compare several commits/tags/branches at once.
  • Plot generated benchmarks over time, with automatic flattening of a hierarchical BenchmarkGroup suite into a list of plots with sensible subtitles.
  • Use the included example GitHub Action to generate benchmark comparisons for every submitted PR in a bot comment (table + plot).

This package also allows you to freeze the benchmark script at a particular revision, so there is no worry about older revisions having a different (or missing) benchmark script. It also makes a PACKAGE_VERSION variable available inside the benchmark, so you can switch to an older API within your script as needed.
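
For illustration, a benchmark script could branch on PACKAGE_VERSION like this. This is a hypothetical sketch: MyPackage, old_api, new_api, and the v0.5.0 cutoff are all placeholder names, and I am assuming PACKAGE_VERSION compares like a VersionNumber.

    # benchmark/benchmarks.jl (hypothetical sketch)
    using BenchmarkTools
    using MyPackage  # placeholder for the package being benchmarked

    const SUITE = BenchmarkGroup()

    # PACKAGE_VERSION is made available by AirspeedVelocity.jl, so one script
    # can call whichever API exists at the revision being benchmarked.
    if PACKAGE_VERSION < v"0.5.0"
        SUITE["workload"] = @benchmarkable old_api(1:1000)  # hypothetical old API
    else
        SUITE["workload"] = @benchmarkable new_api(1:1000)  # hypothetical new API
    end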

Installation

You can install the CLI with:

julia -e 'using Pkg; Pkg.add("AirspeedVelocity"); Pkg.build("AirspeedVelocity")'

This will install two executables, benchpkg and benchpkgplot, in ~/.julia/bin. Make sure that directory is on your PATH.

Examples

You may then use the CLI to generate benchmarks for any package that has a `benchmark/benchmarks.jl` script:

benchpkg Transducers \
    --rev=v0.4.20,v0.4.70,master \
    --bench-on=v0.4.20

which will benchmark Transducers.jl at the revisions v0.4.20, v0.4.70, and master, using the benchmark script benchmark/benchmarks.jl as it was defined at v0.4.20, and then save the JSON results in the current directory. The only requirement is that this script defines a SUITE::BenchmarkGroup (the same convention used by PkgBenchmark.jl).
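
For reference, a minimal benchmarks.jl following that convention might look like the sketch below. This is not the actual Transducers.jl benchmark suite; the workloads are arbitrary, and only the SUITE::BenchmarkGroup convention comes from the requirement above.

    using BenchmarkTools
    using Transducers

    const SUITE = BenchmarkGroup()

    # Nested groups are fine: the hierarchy is flattened into one plot per
    # leaf benchmark, with subtitles built from the group keys.
    SUITE["map"] = BenchmarkGroup()
    SUITE["map"]["collect"] = @benchmarkable collect(Map(x -> 2x), 1:10_000)
    SUITE["map"]["foldl"] = @benchmarkable foldl(+, Map(x -> 2x), 1:10_000)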

After this is finished, we can generate plots of the revisions with:

benchpkgplot Transducers \
    --rev=v0.4.20,v0.4.70,master \
    --format=pdf \
    --npart=5

which will generate a PDF file for each set of 5 plots, showing the change with each revision.

There are a lot of other options, which I will list below. But first, here is another feature I am excited about using for my own packages:

Using in CI

You can use this package in GitHub actions to benchmark every submitted PR, by copying the example configuration: .github/workflows/benchmark_pr.yml.

For every PR (and every PR update), this workflow will run and generate plots comparing the performance of the PR against the default branch, as well as a markdown table (pasted into a bot comment) showing whether the PR improves or worsens performance.

Usage

There are many other options for this CLI, which I give below. For running benchmarks, you can use the benchpkg command, which is installed in the ~/.julia/bin folder:

    benchpkg package_name [-r --rev <arg>] [-o, --output-dir <arg>]
                          [-s, --script <arg>] [-e, --exeflags <arg>]
                          [-a, --add <arg>] [--tune]
                          [--url <arg>] [--path <arg>]
                          [--bench-on <arg>]

Benchmark a package over a set of revisions.

# Arguments

- `package_name`: Name of the package.

# Options

- `-r, --rev <arg>`: Revisions to test (delimit by comma).
- `-o, --output-dir <arg>`: Where to save the JSON results.
- `-s, --script <arg>`: The benchmark script. Default: `benchmark/benchmarks.jl` downloaded from `stable`.
- `-e, --exeflags <arg>`: CLI flags for Julia (default: none).
- `-a, --add <arg>`: Extra packages needed (delimit by comma).
- `--url <arg>`: URL of the package.
- `--path <arg>`: Path of the package.
- `--bench-on <arg>`: If the script is not set, this specifies the revision at which
  to download `benchmark/benchmarks.jl` from the package.

# Flags

- `--tune`: Whether to run benchmarks with tuning (default: false).

This will generate some JSON files in the output-dir (default: the current directory). For plotting, you can use the benchpkgplot command, which reads the same format:

    benchpkgplot package_name [-r --rev <arg>] [-i --input-dir <arg>]
                              [-o --output-dir <arg>] [-n --npart <arg>]
                              [--format <arg>]

Plot the benchmarks of a package as created with `benchpkg`.

# Arguments

- `package_name`: Name of the package.

# Options

- `-r, --rev <arg>`: Revisions to test (delimit by comma).
- `-i, --input-dir <arg>`: Where the JSON results were saved (default: ".").
- `-o, --output-dir <arg>`: Where to save the plot results (default: ".").
- `-n, --npart <arg>`: Max number of plots per page (default: 10).
- `--format <arg>`: File type to save the plots as (default: "png").

If you prefer to use the Julia REPL, you can use the benchmark function to generate the data; the API is given here. (Although you might consider just using PkgBenchmark.jl if you want to customize things.)
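
As a rough sketch of what that might look like (based on my reading of the CLI arguments; the exact signature and keyword options may differ, so treat this as an assumption and check the API docs):

    using AirspeedVelocity

    # Hypothetical call mirroring the CLI example above; the real signature
    # and available keyword arguments may differ.
    results = benchmark("Transducers", ["v0.4.20", "v0.4.70"])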

Other notes

Non-stdlib dependencies include the following awesome packages:

Thank you!

I am interested in hearing people’s thoughts and feedback. Package contributions are very welcome!
Cheers,
Miles


This is absolutely wonderful! Thank you so much for releasing it.

A ton of shameless feature requests/suggestions:

  • How difficult would it be to also add a flag for a custom Julia version? E.g., thanks to juliaup I can run julia +1.8.2 [other arguments] and the Julia 1.8.2 executable will be used.
  • What about TTL (time to load) measurements, i.e. benchmarking how long it takes to run using MyPackage?
  • What about TTFX (time to first X, not including loading time) measurements, i.e. benchmarking do_representative_workload()?
  • What about compilation time, i.e. measuring how long ] precompile takes for an environment in which only the package is installed? I imagine this might be a bit more difficult, as it requires deleting the entire cache. Maybe it can be done by pointing JULIA_DEPOT_PATH to a temporary directory.
  • Can we customize the number of threads for each of the above?
  • Maybe using time to measure CPU load and peak memory usage?
  • Storing the resulting data with a bunch of extra metadata, e.g. the output of ] st and versioninfo()?

I have a variety of messy scripts made for personal use that do something like the above, but your package would be a much neater way to do all this.


I’m happy to hear that!! Not sure which of these could be done, but I will think about it more. Responses below:

Do you mean to have different Julia versions included as a matrix dimension in the benchmark? That could be interesting. It definitely is doable, as I’m just launching a separate Julia process entirely (I started with addprocs, but it became too complicated, so now I just start a new Julia for each benchmark).

I think some of these might be doable by defining custom scripts with -s and a BenchmarkGroup for each? But indeed it might be nice to include TTL as a default measurement that gets concatenated with the user-defined BenchmarkGroup.
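
For example, a custom script passed via -s could approximate TTL by benchmarking a fresh Julia process that just loads the package. A rough sketch (MyPackage is a placeholder, and this also counts interpreter startup, so it overestimates the pure load time):

    using BenchmarkTools

    const SUITE = BenchmarkGroup()

    # Time a fresh Julia process that only loads the package; a crude proxy
    # for time-to-load. evals=1 so every sample spawns a new process.
    SUITE["ttl"] = @benchmarkable run(`julia --startup-file=no -e 'using MyPackage'`) samples=3 evals=1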

You can pass --exeflags to the CLI, but it assumes there is only one set of flags per run.

Another option for all of these is to manually call benchmark: API · AirspeedVelocity.jl. You could loop over different options for the exeflags and store the results in a larger BenchmarkGroup.

These sound like good things to add. I’m not sure where to start though. Contributions welcome!


Yes!

Also, yet another suggestion: if you benchmark an old version of some package, you might be pulling in a recent version of a dependency that has significantly improved in performance (a dependence on Polyester has had that effect in my benchmarks). That is still a valid benchmark, but it benchmarks “old package version on the current Julia ecosystem”. I frequently need a mode that is “old package version on the ecosystem as it was at that time”, i.e. with contemporaneous dependency versions. This can be done by locally checking out an older version of the registry repository.
