I’m glad to announce that I finally delivered my most “I have no idea what I am doing” code in Julia as a brand-new package (it should be released within half a day).
PerfChecker.jl is a collection of semi-automated performance checking tools for packages, comparing a package’s performance through time (releases). Did I mention it outputs plots? Did I mention my skills with Plots.jl are terrible and that I am sure many people can help?
The doc is currently empty (most of the content is available through the tutorial), but I will write it in the coming weeks after some feedback and fixes.
As for the link, I think the one to the stable version of the doc will work once the package is released (which should happen soon; it has been 3 days since I registered it).
FYI, PerfChecker.jl is now at v0.1.1, so in a usable state! Any package developer should check it out.
I welcome help, as several aspects are currently out of my area of expertise. I am looking for wizards (or I will have to become one, which is tempting but time-consuming).
For today, I wanted to share some of my goals regarding PerfChecker.jl and propose a tentative syntax for a v1. I would be delighted to have opinions and recommendations about it.
One goal of PerfChecker.jl is to have it execute performance checks in isolated environments (remote Julia processes started with the relevant options). Ideally, I would like to have features similar to Test.jl.
However, unlike Test.jl, we would also need to execute perf checks over several versions of the package. Below is an attempt at a syntactic framework for such functionality.
using PerfChecker
# Target(s)
using CompositionalNetworks
# Dependencies
using ConstraintDomains
# Start defining the performance checks
@checkset "Some title" begin
## Set compat ranges for Julia and targeted packages
## Each check will use the highest compatible version of Julia
# Define available julia versions and the respective executable paths
@juliaexe "1.3" "path/to/julia1.3.0.exe"
@juliaexe "1.6" "path/to/julia1.6.3.exe"
.
.
.
# Add target(s)
@target CompositionalNetworks "0.2" "0.3"
@notarget CompositionalNetworks "0.2.4"
# Some generic code used in the checks
domains = fill(domain([1, 2, 3]), 3)
## Trigger compilation before checks. Note that code can be specified per version of the target(s)
f() = if version(CompositionalNetworks) < v"0.3"
foreach(_ -> explore_learn_compose(domains, allunique, somearg), 1:10)
else
foreach(_ -> explore_learn_compose(domains, allunique), 1:10)
end
# Add compilation trigger for alloc check
@compile_alloc f
# Add compilation trigger for benchmark check (optional)
@compile_bench f
# Or add for both
@compile_check f
## Add checks
# alloc check
@alloc_check explore_learn_compose(domains, allunique)
# benchmark check (with optional kwargs similar to @benchmark)
@bench_check explore_learn_compose(domains, allunique) evals=1 samples=10 seconds=3600
## Plots would be generated based on csv files available and targets' versions ranges
@plots yaxis=:log10 kwarg2=option2 ...
end # @checkset
Such a perf-checker script would check allocations and benchmarks for each version of CompositionalNetworks. Each time, it would create an isolated environment by spawning a new worker (and removing it after the check).
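To make the isolation idea concrete, here is a minimal sketch of that worker-based mechanism, assuming a hypothetical check_in_isolation helper (this is not PerfChecker's actual implementation): the target package is pinned inside a throwaway worker with its own temporary environment, the check expression is evaluated there, and the worker is removed afterwards.
using Distributed

# Minimal sketch (not PerfChecker's actual implementation): run one check in a
# throwaway worker with its own temporary environment, then remove the worker.
function check_in_isolation(pkg::AbstractString, version::AbstractString, check::Expr)
    pid = only(addprocs(1; exeflags = "--project=$(mktempdir())"))
    try
        # Pin the target package to the requested version inside the worker.
        remotecall_fetch(Core.eval, pid, Main,
            :(begin; import Pkg; Pkg.add(name = $pkg, version = $version); end))
        # Evaluate the measurement code remotely and fetch its result.
        return remotecall_fetch(Core.eval, pid, Main, check)
    finally
        rmprocs(pid)  # so the next check starts from a clean slate
    end
end

# For instance, allocations under a given version (placeholder) could then be
# measured with something like:
# check_in_isolation("CompositionalNetworks", "0.2.4",
#     :(begin
#         using CompositionalNetworks, ConstraintDomains
#         domains = fill(domain([1, 2, 3]), 3)
#         @allocated explore_learn_compose(domains, allunique)
#     end))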
Interesting, I just thought about building something like that yesterday. Here are some features I am interested in, mostly for tracking progress for Makie:
Comparing not only registered versions but also arbitrary commits, PRs, branches, etc.
Measuring the time it takes to do using Package (a rough sketch follows this list)
Maybe running each sample multiple times, to be more robust against natural performance fluctuations
Running the tests on CI and uploading the results in some format that can be viewed, or maybe interacted with, online
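For the using-time measurement, here is a rough sketch of what I had in mind (the helper name, flags, and executable path are just an illustration): load time is only meaningful on a first load, so the simplest approach is to spawn a fresh process and let it print the elapsed time.
# Rough sketch: time `using` in a fresh process. The executable path defaults to
# the currently running Julia, and the active project is assumed to already
# contain the package.
function time_using(pkg::AbstractString; julia = joinpath(Sys.BINDIR, "julia"))
    code = "t0 = time(); using $pkg; print(time() - t0)"
    seconds = read(`$julia --startup-file=no --project=. -e $code`, String)
    return parse(Float64, seconds)
end

# time_using("CompositionalNetworks")  # => load time in seconds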
Yes, I originally started with this feature in mind. I used registered versions in the first release, as it was the simplest to implement as a proof-of-concept.
Great idea. I bet there are some tricky parts to consider, though. It could probably be worth measuring other things as well, such as “download and precompile” from a clean environment, etc.
For allocation tracking and profiling, this could work in a similar fashion to BenchmarkTools, including the heuristics about the number of samples, evals, etc. Probably some tricky parts for that too, haha.
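As a rough sketch (an assumed workflow, not an existing feature), “download and precompile” could be timed from a throwaway environment like this; note that a fully clean run would also need a fresh depot (JULIA_DEPOT_PATH), which is skipped here.
using Pkg

# Time "resolve + download" and "precompile" of a package from a temporary,
# clean environment. Packages already present in ~/.julia are still reused.
function time_install(pkg::AbstractString)
    Pkg.activate(mktempdir())                 # throwaway environment
    t_add = @elapsed Pkg.add(pkg)             # resolve + download
    t_precompile = @elapsed Pkg.precompile()  # precompile new dependencies
    return (add = t_add, precompile = t_precompile)
end

# time_install("CompositionalNetworks")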
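For illustration only, a BenchmarkTools-like allocation measurement could look like the toy sketch below (warm up once to exclude compilation, take several samples of evals evaluations each, and keep the minimum); the real heuristics are more involved.
# Toy sketch of sample/eval-based allocation tracking (not the real heuristic).
function alloc_samples(f; samples = 10, evals = 1)
    f()  # warm-up, so compilation allocations are excluded
    results = Vector{Int}(undef, samples)
    for s in 1:samples
        allocated = 0
        for _ in 1:evals
            allocated += @allocated f()
        end
        results[s] = allocated ÷ evals
    end
    return minimum(results)  # keep the most stable (lowest) estimate
end

# alloc_samples(() -> explore_learn_compose(domains, allunique); samples = 10)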
Running the tests on CI would be good indeed. As for the last part, it sounds like a good job for Makie.jl!
I will update things here once I have something close to the syntax I proposed in my last post. From that point, it should be easier to include some of your ideas.
Personally, I would really try to avoid “macro soup”. It basically becomes a whole other language one has to learn, the error messages are usually not great, etc. Some packages pull it off (like JuMP), but that is also a case where a DSL makes sense. That’s my opinion at least.
It would also be great to run package benchmarks with a range of Julia versions, to find potential performance regressions and understand performance improvements: are they due to a package update or a Julia update?
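A rough sketch of such a version matrix (the paths, versions, and benchmarks.jl script are placeholders): each (Julia, package version) pair runs in its own process, which makes it easier to attribute a change to either the package or Julia itself.
# Placeholder executables and package versions for the matrix.
julia_exes = ["path/to/julia-1.3/bin/julia", "path/to/julia-1.6/bin/julia"]
pkg_versions = ["0.2.4", "0.3.0"]

for exe in julia_exes, v in pkg_versions
    script = """
        using Pkg
        Pkg.activate(mktempdir())
        Pkg.add(name = "CompositionalNetworks", version = "$v")
        include("benchmarks.jl")  # hypothetical script that writes results to CSV
        """
    run(`$exe --startup-file=no -e $script`)
end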