I’m working on a package that I want to benchmark at every commit, to see how its performance evolves over time.
A short note on this idea
I realize this probably isn’t the best workflow; it would be better to run the benchmarks at every PR. But the package is still at an early stage (not yet ready for publication) and I’m working on it alone, so benchmarking commit by commit seems fine for now.
I’d like to run the same benchmark for every commit. This doesn’t have to be “real time”, since I run it while I’m coding, but I want a consistent measurement. I want to store the results somewhere (in a git repo, in a CSV file or something like that). And I want to do it for free.
AFAIK I can’t use GitHub’s or GitLab’s CI, because the jobs can run on any machine, which defeats the goal of consistency.
I have access to a Linux VM running in our institution’s cloud (I’m waiting for confirmation that the hardware won’t change). I would write a script that clones the package and runs the benchmark. It’s OK if I have to start it manually (e.g. at the end of the day), but it should traverse all the commits on master that haven’t been benchmarked yet. Then it should commit and push the results to a benchmark repo (generate markdown/HTML from the CSV, publish on GitHub/GitLab Pages, etc.).
(The other solution I thought of was running my own GitLab server, but that doesn’t sound like an easy way to do this.)
What do you think? Is it reasonable? Any other way to do this?
How do I traverse the commits? (That is my main question.)
FYI, I set up benchmarks on Travis CI and run a couple of them: https://tkf.github.io/TransducersBenchmarksReports.jl/stable/. I was worried that the results might fluctuate a lot, but it turned out Travis is consistent enough for my needs. This is because I only care about performance relative to the baseline implementation I write.
Great to see you got PkgBenchmark working with Travis CI! From a quick look over your CI benchmark scripts in Transducers.jl, TransducersBenchmarksReports.jl, and Run.jl, though, it doesn’t seem trivial to set up. I’d really appreciate a couple of minimal examples for this.
I’m not doing anything complicated, actually. I just use benchmarkpkg to run the benchmark, readresults to load the results, and export_markdown to create a markdown file, and then use Documenter to generate the GitHub page. I use Run.script to set up a project automatically before running the script, but you can replace it with Pkg.activate and Pkg.instantiate. Note that the CI for Transducers.jl is unrelated (even though I run the benchmark there as a smoke test).
(A slightly more sophisticated approach would be to use judge against a reference revision, say the last release, but I haven’t gotten there yet.)
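The steps above, as a minimal sketch (the package name and file paths are placeholders, and a benchmark/benchmarks.jl suite is assumed to exist in the package):

```julia
using PkgBenchmark

# run the suite defined in benchmark/benchmarks.jl of the package
results = benchmarkpkg("MyPackage")

# persist the raw results, and load them back later if needed
writeresults("results.json", results)
results = readresults("results.json")

# turn the results into a human-readable report
export_markdown("docs/src/report.md", results)

# Documenter's makedocs/deploydocs can then build docs/src/ and
# publish the page to gh-pages as for any other documentation.
```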
We use the same setup and it’s really easy to implement. The nice thing is that you can register a runner on a dedicated machine and explicitly limit the number of parallel jobs to one to avoid resource clashes.
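For reference, a sketch of the relevant bits of such a runner’s config.toml (the URL and token are placeholders); the global concurrent setting and the per-runner limit both cap the machine at one job at a time:

```toml
# /etc/gitlab-runner/config.toml (sketch)
concurrent = 1            # at most one job on this machine at any time

[[runners]]
  name     = "benchmark-runner"
  url      = "https://gitlab.example.com/"
  token    = "REGISTRATION-TOKEN"
  executor = "shell"      # run directly on the host for consistent hardware
  limit    = 1            # this runner never takes more than one job
```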
In the end I wrote a script that goes through all the commits and benchmarks the code with PkgBenchmark. It’s an awesome package, thank you for it! (The docs were outdated, though; I addressed that in a PR.)
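Roughly, the traversal part of such a script can be sketched in POSIX shell like this (run_benchmark is a stand-in for the real benchmark driver, and done.txt is an assumed bookkeeping file that remembers which SHAs were already benchmarked, so re-runs only pick up new commits):

```shell
#!/bin/sh
# Stand-in for the real driver: a real one would run PkgBenchmark
# for the checked-out revision and write a CSV of results.
run_benchmark() {
    echo "benchmarking $1" >> bench.log
}

# usage: bench_all_commits <repo-dir> <branch>
# Body runs in a subshell so the cd does not leak into the caller.
bench_all_commits() (
    cd "$1" || exit 1
    touch done.txt
    # --reverse lists commits oldest-first, so results accumulate in order
    for sha in $(git rev-list --reverse "$2"); do
        grep -qx "$sha" done.txt && continue   # already benchmarked
        git checkout --quiet "$sha"            # detached HEAD at this commit
        run_benchmark "$sha"
        echo "$sha" >> done.txt
    done
    git checkout --quiet "$2"                  # back to the branch tip
)
```

After the loop, the script would commit done.txt and the CSVs to the benchmark repo and push.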