CI: best practices for benchmarking dependencies?

I’m working on several packages: one or two provide the core features, and the others depend on them. Those downstream packages aren’t really developer-centric; they’re more like user applications.

When we add features to the core packages, integration checks ensure that no breaking changes were accidentally introduced. That means we can open a PR, update our downstream packages smoothly, and keep things in sync as quickly as possible.

This works fine, but there is one area where I haven’t been able to decide how to architect CI: benchmarking.

I have no issues setting up non-regression benchmarks for a single package, or ad hoc benchmarks for more specialised throwaway tests, but when dependencies get involved, things get a little more finicky.
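For the single-package case, what I have in mind is the usual PkgBenchmark.jl layout, roughly like this (package and function names below are hypothetical):

```julia
# benchmark/benchmarks.jl in the core package (standard PkgBenchmark.jl layout)
using BenchmarkTools
using CorePkg  # hypothetical core package

const SUITE = BenchmarkGroup()

# A hypothetical hot path of the core package.
SUITE["solve"] = @benchmarkable CorePkg.solve($(rand(1000)))
```

Running that suite on two refs of the same package and comparing the results is the easy part; the tricky part starts once the interesting workload lives in another package.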

Example: one of our downstream packages hit a performance issue whose fix lived in one of the core packages. That fix is a PR on the core-package side, but a slight API change means the dependency versions have to be bumped as well, so the issue requires updating both packages; you can’t just update the core package and check before/after on that end only. It’s also not trivial to reproduce the same performance issue with a core-package-only benchmark, so both packages are needed.

It seems to me that non-regression benchmarks make more sense run centrally on the core packages, where most of the dev activity happens. But I also want to avoid pointlessly running them when CI isn’t up to date on the integration side, and I don’t want regressions to bleed out into releases.

One way of keeping things clean would be a dedicated downstream benchmark job in a core package that pulls in the downstream packages on their ‘pre-release’ branch, after integration checks have been validated. That sounds like it fits my needs, but maintaining both a main branch and a pre-release branch also feels clunky.
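Concretely, I’m picturing something like the sketch below, run from the core package’s CI; the package names, URL and branch name are placeholders:

```julia
# benchmark/downstream.jl, run from the core package's CI after the
# downstream package's integration checks have passed.
using Pkg

# Pin the (hypothetical) downstream package to its pre-release branch.
Pkg.add(PackageSpec(url = "https://github.com/me/DownstreamPkg.jl", rev = "pre-release"))

using BenchmarkTools
using DownstreamPkg  # hypothetical downstream package

const SUITE = BenchmarkGroup()

# Downstream workload that exercises the core package's hot path.
SUITE["downstream_workload"] = @benchmarkable DownstreamPkg.run_workload()
```

The ‘pre-release’ branch only exists so this job knows which downstream state has already passed integration, which is exactly the part that feels clunky.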

Or should I just benchmark on the dependency end? How do other packages deal with this smoothly?

I get the impression that some Julia benchmarking tools make it possible to plot performance changes over time to detect regressions, but that doesn’t seem straightforward to combine with dependencies.
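For a single package, if I’m reading PkgBenchmark.jl right, the commit-to-commit comparison part is roughly this (branch names are placeholders); what I’m missing is an equivalent that spans package boundaries:

```julia
using PkgBenchmark

# Compare the core package's benchmark suite across two git refs
# (the refs are placeholders) and write a markdown report.
results = judge("CorePkg", "feature-branch", "main")
export_markdown("benchmark-judgement.md", results)
```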


I guess there are multiple interrelated aspects to this question:

  • Non-regression benchmarking of dependencies isn’t discussed much on the GitHub/CI side
  • Whether to centralize benchmarks of multiple packages or not, and how convenient that might be
  • Ensuring sync for benchmarking dependencies

@willow maybe?