Consistency testing when unit-testing is infeasible

I agree that pre-generation is the better approach here. I was just curious whether it is possible at all to use packages written in other languages as dependencies (for testing, for example).

You could use Conda.jl and install Python (and Python packages) that way.
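
A rough sketch of what that might look like in a test file, assuming Conda.jl and PyCall.jl are listed as test dependencies:

using Conda
Conda.add("numpy")       # install the Python package into Conda.jl's own environment

using PyCall
np = pyimport("numpy")   # call into it from the Julia tests
@assert np.sum([1, 2, 3]) == 6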

Two options I can think of to deal with this:

  1. Modify the CI so that its container definition includes the Python installation. The unit testing and coverage run inside a specific container, and you’ll see in the CI configuration that you can add pre-installation steps for packages; that might be enough that you don’t need a completely new container definition. Perhaps someone can point to good examples of projects that customize package installation in their CI, and you could follow one of those.

  2. Only run the Python tests when you are testing locally. Use a skip mechanism (e.g. @test_skip or a simple conditional), combined with arguments to runtests.jl, to turn the Python tests on locally. This way the CI uses all defaults, and you have less fussing to do. You’d still keep everything in the test directory, just turning it on or off with arguments to test; see the sketch below.
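
A rough sketch of how that could look in runtests.jl (the "python" flag name is just an example):

using Test

if "python" in ARGS
    # Enabled locally with: Pkg.test("MyPackage"; test_args=["python"])
    include("python_tests.jl")
else
    @info "Skipping Python-dependent tests; pass \"python\" in test_args to enable them"
end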

I have done quite a lot of QMC simulations of itinerant electron systems. There I benchmarked results for small systems against ED and tested “trivial” properties of large non-interacting systems. For example, the non-interacting Green’s function of the electrons is trivial to write down analytically but is a non-trivial test of all the linear algebra logic in my code.
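
A minimal sketch of that kind of check (not my actual code; the lattice, frequency, and function names are made up for illustration):

using LinearAlgebra, Test

# Tight-binding Hamiltonian on a small periodic 1D chain (stand-in for the real model)
function build_hamiltonian(L; t=1.0)
    H = zeros(L, L)
    for i in 1:L
        j = mod1(i + 1, L)
        H[i, j] = H[j, i] = -t
    end
    return H
end

# "Code under test" route: direct matrix inversion, exercising the linear algebra
greens_numeric(H, z) = inv(z * I - H)

# Analytic route: diagonalize and assemble G from the eigenvalues
function greens_analytic(H, z)
    ε, U = eigen(Symmetric(H))
    return U * Diagonal(1 ./ (z .- ε)) * U'
end

@testset "non-interacting Green's function" begin
    H = build_hamiltonian(8)
    z = 0.3 + 0.1im          # an arbitrary complex frequency off the real axis
    @test greens_numeric(H, z) ≈ greens_analytic(H, z)
end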

As mentioned by others, another important test was to compare results to those obtained with other codes. For example, I asked a colleague who was working on something similar to run his code on a certain set of parameters, and then I checked whether I got the same results within error bounds.


I have experience of this from medical image enhancement, where the goodness of the output images is entirely subjective and is approved by clinical experts. Thus changes to the code base need to be verified not to affect the results on a largish test set; if they do, and it can’t be avoided, we either verify that the changes are small enough not to have visual impact or have the results re-approved.

The way this testing is set up is that we have a test script which runs through the tests and either saves the output images to disk or compares them to previously saved images (somewhat simplified, but this is the essence of it). To produce the reference images, which are saved to disk, we activate a reference environment which has essentially been created by a script doing

using Pkg
algorithm_package_dir = dirname(@__DIR__)
cd(algorithm_package_dir)
Pkg.activate(joinpath(algorithm_package_dir, "test", "reference_environment"))
Pkg.add(PackageSpec(name="AlgorithmPackage", path=".", rev=ARGS[1]))

where ARGS[1] is the commit hash of your reference revision. That is, this Pkg.adds the code we’re testing at a well-defined commit, and from this environment we run the test script in save mode. Notice that this doesn’t require any explicit mucking around with git; Pkg handles all of that.

After that we start a new Julia process using the normal package environment, running the test script in comparison mode. Whenever we are forced to accept changes in the image results, we update the reference environment with a new commit hash of the algorithm code.
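
A minimal sketch of the save/compare pattern (the function and file names here are made up; the real script works on images and is more involved):

using Test, Serialization

const REF_DIR = joinpath(@__DIR__, "reference_output")

# Hypothetical stand-in for the algorithm under test
enhance_image(input) = 2 .* input

function run_case(name, input)
    output = enhance_image(input)
    ref_file = joinpath(REF_DIR, name * ".jls")
    if "save" in ARGS                     # reference environment: save mode
        mkpath(REF_DIR)
        serialize(ref_file, output)
    else                                  # normal environment: comparison mode
        @test output == deserialize(ref_file)
    end
end

run_case("case1", fill(0.5, 4, 4))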

It should be noted that there have been Pkg bugs related to this, https://github.com/JuliaLang/Pkg.jl/issues/2247, so some care should be taken with mixed platforms, and it may be necessary to work around some current-directory issues with recent Julia versions. These bugs have been fixed and should be resolved in Julia 1.6.


Hi, I joined Discourse to answer you :sunglasses:

I work alone as follows:

  • test-driven development and its three rules

  • rigorously sensible variable names

  • functions with zero, one, or at most two arguments

  • lots of logging as I code, which I then comment out

  • if something breaks many times, I might put in a few defensive assertions to check at run time that all is as it should be

  • Uncle Bob mentions some reasons not to unit test in one of his many YouTube talks, and I’ve added one or two: things that are arbitrary, things that are domain/business logic, things on the edge of the control-flow diagram like GUIs, and things so simple that the test would be more complex than the code, basically one-liner trig functions

  • complex functions have “form” tests, like the type of the output being right, and “content” tests, like below

  • for important functions that have scientific or business-logic ways to go wrong, I set up “content” tests with lots of reference scenarios with known inputs and known outputs, and iterate through them, testing that each gives the correct output (see the sketch after this list)

  • interestingly, these days if I have a function that works and I factor an implementation detail out of it just for clarity, I quite often don’t bother with a test for that sub-function
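
A minimal sketch of the “content” test style mentioned above; the function and scenarios are made-up stand-ins, not actual domain logic:

using Test

# Stand-in for a function with real scientific or business logic
price_with_tax(amount, rate) = amount * (1 + rate)

# Reference scenarios: known inputs paired with known, independently checked outputs
scenarios = [
    (amount = 100.0, rate = 0.25, expected = 125.0),
    (amount = 80.0,  rate = 0.00, expected = 80.0),
    (amount = 19.99, rate = 0.10, expected = 21.989),
]

@testset "content tests" begin
    for s in scenarios
        @test price_with_tax(s.amount, s.rate) ≈ s.expected
    end
end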

I occasionally dust off my debugger and lint my code, but often there is nothing to find. Seriously - I’m not saying that to sound like a smart-ass, but I’d rather not use a debugger.
