Why is `Test` so meta-programming heavy?

I am quite puzzled by the design choices of Julia’s Test, so I thought I should write up my current understanding so that people can tell me what they think, or whether I am misunderstanding something.

As I understand it, Julia’s Test relies heavily on meta-programming, due to the @testset etc. macros. This introduces the following difficulties:

  • It’s difficult (perhaps not even supported?) to open a debugger inside @testset (but thankfully Infiltrator.jl works)
  • I have occasionally witnessed stacktraces being very uninformative when an error appears inside nested @testsets. To be fair, I only experienced this with errors in @generated functions, but I think it illustrates the kind of issues that the “meta-programming complexity” of @testset can introduce.

What is the rationale of having a testing framework that relies so much on meta-programming, like Test does?

In my mind there could be much simpler alternatives, e.g. something similar to pytest (but without its complicated fixtures), so that unit tests are simply all the functions named test_* in the test folder of a package, organised into groups by their path.

For example:

test/test_abs.jl:

function test_abs_on_int()
    @assert abs(-1) == 1
end

function test_abs_on_complex()
    @assert abs(-im) == 1.0
end

test/test_sum.jl:

...

Common code across tests could then be shared via non-test_* functions or modules.

Here is one motivation; others can likely chime in with more.

A macro allows detailed feedback when a test fails. Consider this example:
[screenshot of a failing @test and the message it prints]
This prints not only the values that were encountered and what was expected, similar to what a function could have done, but also the expression that produced the error. Only macros have access to the expression the user writes.
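
For instance, a failing @test at the REPL prints something roughly like the following (paraphrasing from memory; the exact wording varies between Julia versions):

julia> using Test

julia> expected = 3; computed = 1 + 1;

julia> @test expected == computed
Test Failed at REPL[3]:1
  Expression: expected == computed
   Evaluated: 3 == 2
ERROR: There was an error during testing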

The @testset macro further controls how contained @test macros behave, i.e., it aggregates the results from @test macros within its body. Macros (or some other form of metaprogramming) are required for this.
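
A minimal sketch of that aggregation (the exact summary format differs between Julia versions):

using Test

@testset "abs" begin
    @test abs(-1) == 1
    @test abs(-im) == 2   # deliberately wrong: recorded by the test set instead of aborting the block
end
# Once the block finishes, the test set reports an aggregated summary along
# the lines of "Test Summary: | Pass  Fail  Total".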

Altogether, this makes it quite straightforward for the user to write tests that produce rich feedback.

10 Likes

Thanks for the reply!

A macro allows detailed feedback when a test fails. Consider this example…

I don’t think that I mind the “lowest level” macros (like @test) so much. Actually, I did use a macro myself (@assert) in the original post.

But I am not seeing much benefit from the extra info that either @test or @assert prints, compared to using bespoke functions like test_equals, test_is_instance, etc., perhaps because my check statements are typically very simple, e.g. @test expected == computed. When using pytest I have a single, simple assertion in each test_* function, with an appropriate name (e.g. test_abs_returns_positive), which gives me enough information to know what the failed assertion was aiming to do.

The @testset macro further controls how contained @test macros behave, i.e., it aggregates the results from @test macros within its body. Macros (or some other form of metaprogramming) are required for this.

Why do you say that metaprogramming is required for this aggregation? Given a vector of test_* functions one can do the same by redirecting stdout/stderr, no? For example pytest achieves nice aggregation without metaprogramming as far as I am aware.

I think one of the core ideas behind testing in Julia is that test/runtests.jl should be a simple julia script that runs all tests when it is executed (either via include() in an existing session, or running julia test/runtests.jl in the command line, or via Pkg.test() which sets up a “hardened” context to catch more errors).

My understanding is that this idea is reflected in the design of the Test standard library, where it is assumed that tests are plain scripts, and evaluating @testset stanzas should actually run the tests inside. This contrasts with the approach taken in pytest (as I understand it), where tests are defined as functions following a certain naming convention. This means that evaluating them only defines the function, but does not execute any code; in order to actually run the tests, you need another, specific entry point, which does a bit of introspection to find all functions defining tests and actually call them. This preliminary stage presumably allows pytest to nicely organize its output.

However, since Julia’s standard Test approach amounts to plain script execution, it can’t really perform that kind of global preliminary analysis… except if at some stage it has access to a whole set of test definitions thanks to the @testset macro. Maybe the macros could have been replaced by higher-order functions, but I doubt this would have produced a system where stack traces are more legible.
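
For illustration, such a higher-order-function alternative could look something like the following sketch (a hypothetical API, not something Test provides). Note that check only ever receives an already-evaluated Bool, so the expression information that @test reports is lost:

function testset(f, name::AbstractString)
    passed, failed = 0, 0
    # `check` only sees true/false, not the expression that produced it
    check(cond::Bool) = cond ? (passed += 1) : (failed += 1)
    try
        f(check)
    catch err
        failed += 1
        @warn "Error in test set $name" exception = (err, catch_backtrace())
    end
    println(name, ": ", passed, " passed, ", failed, " failed")
end

testset("abs") do check
    check(abs(-1) == 1)
    check(abs(-im) == 1.0)
end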

Are you aware of TestItemRunner and the new unit tests UI in VS Code? It might be a better fit for your workflow (don’t mind the @testitem high-level macro: AFAIU it merely acts as an annotation identifying test definitions, but is not really macro-expanded)

1 Like

How do you then test the output of something to stdout/stderr explicitly? And why reparse the output again to maybe omit some parts, when you can just not print them in the first place because a TestSet already records that information?

The issue I see with this is that now you have to write a new test_foo_returns_positive for every single foo, which is hardly composable & reusable.

1 Like

I would imagine that this could work, much like redirecting stdout via julia my_script.jl > out.txt does not interfere with tests that check the output of something to stdout/stderr explicitly. But I am afraid I haven’t thought this through in detail.
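
For concreteness, one way a test can check printed output explicitly today is to have the function under test write to an io argument and capture it with sprint; a small sketch (describe is a hypothetical function under test):

using Test

describe(io::IO, x) = println(io, "value = ", x)

@testset "printed output" begin
    @test sprint(describe, 42) == "value = 42\n"
end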

And why reparse the output again to maybe omit some parts, when you can just not print them in the first place because a TestSet already records that information?

My thinking was that meta-programming increases complexity and thus should be used sparingly.

The issue I see with this is that now you have to write a new test_foo_returns_positive for every single foo, which is hardly composable & reusable.

I think this goes into programming style. I am heavily influenced by, e.g., the Clean Code book, which suggests a “Single Concept per Test” and “the single assert rule [as] a good guideline” (Chapter 9). I think you can oftentimes reuse code otherwise (including with meta-programming to generate the test_* functions, if you really want to!).

I appreciate that Julia’s standard Test approaches testing in a different way, more like a collection of scripts, if I understand correctly. However, I have found its approach puzzling, as I said in the original post, and not super practical: for example, I can’t easily run just part of the test suite of a third-party package that I might be preparing a bugfix for. In contrast, with pytest you can easily do so by calling pytest path_to_subfolder or pytest particular_test_file.

it can’t really perform that kind of global preliminary analysis… except if at some stage it has access to a whole set of test definitions thanks to the @testset macro. Maybe the macros could have been replaced by higher-order functions, but I doubt this would have produced a system where stack traces are more legible.

Couldn’t we do the preliminary analysis, e.g. by including the test files and then inspecting names(Main)? Or did you simply mean that this is not the approach taken by Julia’s Test?
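
Something along these lines is what I have in mind, as a rough, hypothetical sketch (the included file names are made up, and this is not how Test works):

module MyTests
include("test_abs.jl")   # assumed to only define test_* functions
include("test_sum.jl")
end

for name in names(MyTests; all = true)
    startswith(String(name), "test_") || continue
    isdefined(MyTests, name) || continue
    f = getfield(MyTests, name)
    f isa Function || continue
    try
        f()
        println("PASS  ", name)
    catch err
        println("FAIL  ", name, ": ", err)
    end
end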

Are you aware of TestItemRunner and the new unit tests UI in VS Code? It might be a better fit for your workflow (don’t mind the @testitem high-level macro: AFAIU it merely acts as an annotation identifying test definitions, but is not really macro-expanded)

Yes, thanks, I am quite excited about it! As far as I understand it’s still work in progress, but I plan to use it as soon as it matures.

But I still don’t get why even TestItemRunner.jl relies so heavily on macros. Perhaps I consider meta-programming (which, tbh, I am not too familiar with) too much of a complexity that should be minimised, more so than the rest of the Julia community does. ← Sorry, I hadn’t read your mention of the high-level macro!

Here’s a chunk of unit tests from Base Julia:

@test allunique([])
@test allunique(Set())
@test allunique([1,2,3])
@test allunique([:a,:b,:c])
@test allunique(Set([1,2,3]))
@test !allunique([1,1,2])
@test !allunique([:a,:b,:c,:a])
@test allunique(4:7)
@test allunique(1:1)
@test allunique(4.0:0.3:7.0)
@test allunique(4:-1:5)
@test allunique(7:-1:1)
@test allunique(Date(2018, 8, 7):Day(1):Date(2018, 8, 11))
@test allunique(DateTime(2018, 8, 7):Hour(1):DateTime(2018, 8, 11))

Here’s what they would look like using the pytest style:

test_empty_array_allunique() = @assert allunique([])
test_empty_set_allunique() = @assert allunique(Set())
test_integer_array_allunique() = @assert allunique([1,2,3])
test_symbol_array_allunique() = @assert allunique([:a,:b,:c])
test_integer_set_allunique() = @assert allunique(Set([1,2,3]))
test_integer_array_not_unique() = @assert !allunique([1,1,2])
test_symbol_array_not_unique() = @assert !allunique([:a,:b,:c,:a])
test_range_allunique() = @assert allunique(4:7)
test_range_allunique2() = @assert allunique(1:1)
test_range_allunique3() = @assert allunique(4.0:0.3:7.0)
test_range_allunique4() = @assert allunique(4:-1:5)
test_range_allunique5() = @assert allunique(7:-1:1)
test_date_range_allunique() = @assert allunique(Date(2018, 8, 7):Day(1):Date(2018, 8, 11))
test_date_range_allunique2() = @assert allunique(DateTime(2018, 8, 7):Hour(1):DateTime(2018, 8, 11))

Which one do you think is easier to read? Note that I gave up on coming up with meaningful names for all the Range unit tests, so I resorted to numbering them, e.g. test_range_allunique5. I think this points to one of the main reasons to use meta-programming: legibility and reducing boilerplate.

8 Likes

Which one do you think is easier to read?

I think this goes into programming style. For what it’s worth, here’s how I would prefer to write them:

I would first reduce the names and group them by placing them in a file called test_allunique.jl. Then I would write them as

test_empty_array() = @assert allunique([])
test_empty_set() = @assert allunique(Set())
test_integer_array() = @assert allunique([1,2,3])
test_symbol_array() = @assert allunique([:a,:b,:c])
test_integer_set() = @assert allunique(Set([1,2,3]))
test_integer_array_not_unique() = @assert !allunique([1,1,2])
test_symbol_array_not_unique() = @assert !allunique([:a,:b,:c,:a])
test_range_ints() = @assert allunique(4:7)
test_range_single_int() = @assert allunique(1:1)
test_range_floats() = @assert allunique(4.0:0.3:7.0)
test_range_empty() = @assert allunique(4:-1:5)
test_range_int_reverse() = @assert allunique(7:-1:1)
test_date_range() = @assert allunique(Date(2018, 8, 7):Day(1):Date(2018, 8, 11))
test_datetime_range() = @assert allunique(DateTime(2018, 8, 7):Hour(1):DateTime(2018, 8, 11))

This way, I am grouping the tests in a file and communicating the intent of each test. I would then be able to kinda understand what’s wrong in the package under test just by looking at the names of the failed tests.

In general, regarding programming style, I am basically echoing the Clean Code book’s approach. It’s getting a bit off-topic, I think. I am not trying to argue that a single style is the best one, but rather trying to understand why meta-programming is so heavily used by Test, given the difficulties it presents (as listed in my original post).

I used to run into this problem too, but then I opened Add the ability to use function calls in `@testset` directly. by Seelengrab · Pull Request #42518 · JuliaLang/julia · GitHub (included in the upcoming 1.9) and I’m very much looking forward to using that in everything. It lets you cleanly encapsulate all your test setup in a function and place @test calls inside functions, allowing the following:

function test_specific_thing_with_arg(f, args, res)
    # setup...
    @test foobar()
end

function testAll()
    @testset for x in (a, b, c)
        @testset test_specific_thing_with_arg(f, args, x)
    end
end

!isinteractive() && testAll()

which allows me to just run test_specific_thing_with_arg in isolation. The example is of course bunk, but should show how a single change can greatly change how a tool can be used.

Metaprogramming should usually be used to reduce complexity, preventing mistakes when writing boilerplate code by removing the boilerplate entirely. Just consider having to write the expanded version of a @testset every time you want to group tests:

julia> using Test

julia> f() = true
f (generic function with 1 method)

julia> @macroexpand @testset f()
quote
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1488 =#
    Test._check_testset(if Test.get_testset_depth() == 0
            Test.DefaultTestSet
        else
            Test.typeof(Test.get_testset())
        end, $(QuoteNode(:(get_testset_depth() == 0))))
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1489 =#
    local var"#5#ret"
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1490 =#
    local var"#1#ts" = (if Test.get_testset_depth() == 0
                Test.DefaultTestSet
            else
                Test.typeof(Test.get_testset())
            end)("f"; Test.Dict{Test.Symbol, Test.Any}()...)
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1491 =#
    Test.push_testset(var"#1#ts")
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1495 =#
    local var"#2#RNG" = Test.default_rng()
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1496 =#
    local var"#3#oldrng" = Test.copy(var"#2#RNG")
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1497 =#
    local var"#4#oldseed" = (Test.Random).GLOBAL_SEED
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1498 =#
    try
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1500 =#
        (Test.Random).seed!((Test.Random).GLOBAL_SEED)
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1501 =#
        let
            #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1502 =#
            f()
        end
    catch var"#7#err"
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1505 =#
        var"#7#err" isa Test.InterruptException && Test.rethrow()
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1508 =#
        Test.trigger_test_failure_break(var"#7#err")
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1509 =#
        if var"#7#err" isa Test.FailFastError
            #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1510 =#
            if Test.get_testset_depth() > 1
                Test.rethrow()
            else
                Test.failfast_print()
            end
        else
            #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1512 =#
            Test.record(var"#1#ts", Test.Error(:nontest_error, Test.Expr(:tuple), var"#7#err", (Test.Base).current_exceptions(), $(QuoteNode(:(#= REPL[3]:1 =#)))))
        end
    finally
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1515 =#
        Test.copy!(var"#2#RNG", var"#3#oldrng")
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1516 =#
        (Test.Random).set_global_seed!(var"#4#oldseed")
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1517 =#
        Test.pop_testset()
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1518 =#
        var"#5#ret" = Test.finish(var"#1#ts")
    end
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1520 =#
    var"#5#ret"
end

That’s A LOT of setup code to provide a reproducible environment. Sure, it could have been a function instead, but then you need lots and lots more global variables to synchronize stuff, or your testing framework becomes much less flexible (by actually shoving all that complexity into each and every test function). Back when this was added, we didn’t have multithreading after all. Instead, @testset is used to record everything in a specific object, allowing later introspection.

Granted, overusing metaprogramming can increase mental load by hiding complexity, but I don’t think the uses by Test are particularly egregious.

An example of that would be appreciated, since that sounds more like an issue with errors in generated functions than @testset per se.

Out of interest, have you looked at how Test works internally and what it tries to achieve? It’s not using metaprogramming just for the heck of it, but giving a uniform interface for allowing outside users to very easily hook into the existing @testset machinery with their own testsets, for recording & displaying purposes. The fact that A LOT of packages then just use @testset on the top level, disallowing easy “just run this test please”, is a different matter; I agree with you that this is not a good style, but that’s an orthogonal issue to whether or not Test employs metaprogramming to reduce boilerplate.
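
For a rough idea of that hook, here is a minimal sketch of a custom test set (not a complete implementation; the real protocol also handles nesting via Test.get_testset and Test.get_testset_depth, see the Test documentation on AbstractTestSet):

using Test

# A toy test set that just counts results.
struct CountingTestSet <: Test.AbstractTestSet
    description::String
    results::Vector{Any}
end
CountingTestSet(desc::AbstractString; kwargs...) = CountingTestSet(desc, Any[])

# Called by @test for every result produced inside the test set.
Test.record(ts::CountingTestSet, res) = (push!(ts.results, res); res)

# Called when the @testset block ends.
function Test.finish(ts::CountingTestSet)
    npass = count(r -> r isa Test.Pass, ts.results)
    println(ts.description, ": ", npass, " of ", length(ts.results), " checks passed")
    return ts
end

# @testset accepts a custom test set type as its first argument:
@testset CountingTestSet "counting demo" begin
    @test 1 + 1 == 2
    @test isodd(3)
end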

2 Likes

a reproducible environment

Off-topic: I thought that @safetestset was needed for reproducibility?

Sure, it could have been a function instead, but then you need lots and lots more global variables to synchronize stuff, or your testing framework becomes much less flexible (by actually shoving all that complexity into each and every test function)

I am afraid I am not following why these are necessary. Perhaps I am missing something?

Out of interest, have you looked at how Test works internally and what it tries to achieve? It’s not using metaprogramming just for the heck of it, but giving a uniform interface for allowing outside users to very easily hook into the existing @testset machinery with their own testsets, for recording & displaying purposes.

I am afraid that I don’t know much about the internals of Test. What I still don’t understand is why meta-programming is necessary. I have seen quite large test suites work without it (e.g. via pytest, as I mentioned).

So while I am still puzzled about what the advantages of a metaprogramming-heavy test framework are, I know some of its current limitations, most notably (for me): losing the ability to use a debugger.

An example of that would be appreciated, since that sounds more like an issue with errors in generated functions than @testset per se.

Sorry, I gave it a go and could not produce a minimal example for this :frowning: What I observed was a single-line stacktrace (of very little use) when running within a @testset, as compared to a full stacktrace without it, regarding a bug in a generated function of some library. I agree, perhaps it was not an issue with @testset itself - just not a great user experience.

@testset does not introduce a separate module, so some things you define in a test set, like functions, leak out of it.

2 Likes

Metaprogramming or something very like it is required for ergonomic test failure reporting. For example, pytest relies on meta-programming to implement one of its biggest ergonomic features: “assertion rewriting”. See this four-part article on why it is helpful and how it is implemented: Assertion rewriting in Pytest part 4: The implementation – Python Insight

def test_equality():
    a = 4
    b = 5
    assert a==b

In pytest, this will output an error showing the line of the assert and the values (4 and 5) of a and b, which is much more information than you would normally get running this function in plain Python. That is enabled by assertion rewriting, which is a form of metaprogramming. It relies on an import hook that inspects the loaded code, rewrites it, and then executes not the code you wrote, but the rewritten code.

The @test macro in Julia does the same thing as assertion rewriting, as well as a few more things that I couldn’t list off the top of my head. Personally, I could easily write a simple version of the @test macro that does assertion rewriting in Julia, while I’d have no idea where to start in Python without that article I linked, and I spend 20x more time writing Python than Julia. So I see that as a big plus for Julia’s approach.
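
To illustrate, a stripped-down version of that idea can be sketched in a few lines (a toy macro, not the real @test implementation):

# A toy macro that, like pytest's assertion rewriting, reports both the
# expression and the evaluated operands when a comparison fails.
macro mytest(ex)
    if ex isa Expr && ex.head === :call && length(ex.args) == 3
        op, lhs, rhs = ex.args
        return quote
            l, r = $(esc(lhs)), $(esc(rhs))
            if $(esc(op))(l, r)
                println("Pass: ", $(string(ex)))
            else
                println("Fail: ", $(string(ex)),
                        "\n  Evaluated: ", l, " ", $(string(op)), " ", r)
            end
        end
    else
        # fall back to evaluating the whole expression as a Bool
        return quote
            if $(esc(ex))
                println("Pass: ", $(string(ex)))
            else
                println("Fail: ", $(string(ex)))
            end
        end
    end
end

# @mytest 2 + 2 == 5 prints:
#   Fail: 2 + 2 == 5
#     Evaluated: 4 == 5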

I do agree that Julia’s default test experience would benefit from easily being able to run test subsets, though maybe that’s been added since I last used it. The @testitem approach looks great and I’ll try that next time I write a Julia package.

6 Likes

It’s available in ReTest.jl (which also uses quite a lot of metaprogramming).

5 Likes

What I do in my packages is to have every file do using Test, etc.; that way I can run individual sub-files with include (or Ctrl-Shift-Enter in VS Code).

2 Likes

Thanks for that link. My understanding was that pytest was doing some fancy hacking on assert, but I didn’t know exactly how it worked and I couldn’t find a link. So, the bottom line is that both pytest and Julia use metaprogramming for unit tests. :slight_smile:

1 Like

This is potentially useful for other projects:

JuMP uses a custom test_xxx() function approach:

Tests are test_xxx() functions in a test_xxx.jl file that contains a TestXXX module:

test/runtests.jl is a light-weight runner: JuMP.jl/runtests.jl at master · jump-dev/JuMP.jl · GitHub

that uses this package: JuMP.jl/Kokako.jl at master · jump-dev/JuMP.jl · GitHub

3 Likes

Thanks a lot, I did not know that pytest was doing all this stuff under the hood for the assertions.

Besides the @test macro, whose usefulness you and others kindly explained to me, would you also say that you find @testset useful, or to some extent necessary?

As far as I understand, pytest gets away without it by mandating that the tests are precisely all the test_* functions in test_*.py files, thereby ending up with something simpler for the user and arguably more powerful (e.g. the ability to run part of the test suite, as discussed)?

One huge caveat with the @testset-in-a-function approach is that you run headlong into lowering: track location of macros better for stackwalk by vtjnash · Pull Request #44995 · JuliaLang/julia · GitHub. Not having stacktraces on test failures is not good.

I feel that an arbitrary convention of “all test functions should start with test_” is even more magical than the metaprogramming Test does. At least when I see the @ I know magic is happening. Besides, pytest also makes liberal use of @ decorators for parametrized tests, fixtures, etc. Those are basically equivalent to what Julia testing libraries do with metaprogramming, but at runtime.

As others have mentioned, the issue with Test not being powerful enough to support features like partial runs is not because of metaprogramming, but because of its particular implementation (just spilling a bunch of code into the current scope without isolating it in a function or module). Newer test libraries fix this, but the stdlib has been slow to adapt.

5 Likes