AirspeedVelocity.jl now has a marketplace GitHub Action!
I’m excited to announce a marketplace GitHub Action for AirspeedVelocity.jl, which makes it very easy to measure benchmarks in pull requests to your Julia package. The reports show both the time AND memory changes, for all defined benchmarks, relative to your default branch. It will even track startup time for you.
Quickstart
You need to follow BenchmarkTools.jl formatting: define a file benchmark/benchmarks.jl (with an optional benchmark/Project.toml) that defines a SUITE:
using BenchmarkTools
using MyPackage: my_eval

const SUITE = BenchmarkGroup()
SUITE["my_eval"] = @benchmarkable my_eval(x) setup=(x=randn(100))
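If you want to sanity-check the suite locally before pushing, you can run it directly with BenchmarkTools. This is just a minimal sketch (the action runs the suite for you; the `include` path assumes you are in the package root):

```julia
# Quick local check of the benchmark suite (optional; the action handles this in CI).
using BenchmarkTools

include("benchmark/benchmarks.jl")  # defines SUITE

tune!(SUITE)            # pick evaluation counts for each benchmark
results = run(SUITE)    # returns a BenchmarkGroup of trial results
display(results)
```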
If you have done this, all you need to do now is add this workflow file to .github/workflows/benchmark.yml:
name: Benchmark this PR

on:
  pull_request_target:
    branches: [ master ]  # or your default branch

permissions:
  pull-requests: write  # needed to post comments

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: MilesCranmer/AirspeedVelocity.jl@action-v1
        with:
          julia-version: '1.10'
That’s it! Now every PR will include clear, collapsible benchmark reports directly in a GitHub comment:
^You can see that the memory benchmarks include both allocations and bytes.
You can also benchmark over multiple Julia versions! Just use the normal strategy.matrix approach (see the sketch below). These versions will show up as separate comments in the thread.
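For example, here is a minimal sketch of such a matrix setup, extending the quickstart workflow above (the listed versions are just illustrative):

```yaml
jobs:
  bench:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        julia-version: ['1.9', '1.10']  # illustrative; pick the versions you care about
    steps:
      - uses: MilesCranmer/AirspeedVelocity.jl@action-v1
        with:
          julia-version: ${{ matrix.julia-version }}
```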
The current action uses peter-evans/create-or-update-comment, which is good at updating existing comments, so there should only be one comment per PR (and no new notifications after the first). One downside is that benchmarking multiple versions means multiple comments, but I haven’t worried about this much yet.
This looks cool! Do you think this could offer longitudinal records of package performance? I’m thinking of something like the page codecov provides, where you see the change commit-to-commit.
I’m also interested to hear how you work around the variance in GitHub runner performance. Do you run some sort of representative/calibration workload at the start and hope that the other tasks being juggled on the VM don’t change much while the benchmarked tasks are running?
Within a single job, I have found GitHub runner performance to be decently consistent (actually more consistent than my laptop, with all its apps running).
There is, however, a decent amount of variability across jobs; presumably they can land on different machines.
So the benchmark just does the comparison within a single job, and that is what gets printed. There is no long-term tracking of performance statistics.
You can, however, do this locally. The --rev option lets you pass any number of commits, and the resulting benchmarks are saved to files, so you could then plot performance over time as desired.
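As a rough sketch of what that looks like (the package name and revisions here are placeholders; check the AirspeedVelocity.jl docs for the full set of flags):

```bash
# Benchmark MyPackage at several revisions and save the results to files
benchpkg MyPackage --rev=v0.1.0,v0.2.0,main

# Plot the saved results across those revisions
benchpkgplot MyPackage --rev=v0.1.0,v0.2.0,main
```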
Very nice effort! As some more inspiration: for Makie, we’ve written some code that plots similar metrics and posts the summary plots to a gist, because sadly you cannot add images to comments programmatically, nor to workflow summaries. It’s nicer than the table we had before, because visual outlier detection is much nicer than relying on summary stats, I think, and the noise can be considerable. Here’s a link to a random PR’s run:
Thanks! This is a brilliant idea. Perhaps we can auto-generate plots like these for the benchpkgplot command in AirspeedVelocity? If you’re interested and have some bandwidth, I’d love to have your help incorporating that idea!
At the moment, benchpkgplot (enabled in the GitHub action with enable-plots: 'true') just generates simple error bar plots like this:
and stores them in the build artifacts. But now I am embarrassed, because it literally has access to the full table of times, and yet I never thought to plot the full distribution! So hopefully it shouldn’t be much effort to get this working.
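For reference, turning the plots on is just the enable-plots input mentioned above; a minimal sketch of the relevant step from the quickstart workflow:

```yaml
      - uses: MilesCranmer/AirspeedVelocity.jl@action-v1
        with:
          julia-version: '1.10'
          enable-plots: 'true'  # upload benchpkgplot figures as build artifacts
```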
I’d be happy to add other backends like Makie, by the way. PlotlyLight was chosen before package extensions were a thing, in an effort to minimize build times, but now we can add or switch to other options.