Why is `Test` so meta-programming heavy?

I used to run into this problem too, but then I opened the pull request “Add the ability to use function calls in `@testset` directly” (Seelengrab, Pull Request #42518, JuliaLang/julia; included in the upcoming 1.9), and I’m very much looking forward to using that in everything. It lets you cleanly encapsulate all your @test setup in a function and place @test in a function, allowing the following:

function test_specific_thing_with_arg(f, args, res)
    # setup...
    @test foobar()
end

function testAll()
    @testset for x in (a, b, c)
        @testset test_specific_thing_with_arg(f, args, x)
    end
end

!isinteractive() && testAll()

which allows me to just run test_specific_thing_with_arg in isolation. The example is of course bunk, but should show how a single change can greatly change how a tool can be used.
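
For something that actually runs, here is a minimal sketch of the same pattern. It assumes Julia 1.9 or later (for the function-call form of `@testset`); `double`, `test_double`, and `test_all` are made-up names for illustration only.

```julia
using Test

# Hypothetical function under test.
double(x) = 2x

# A plain function holding the setup and the @test calls for one case.
function test_double(input, expected)
    result = double(input)
    @test result == expected
    @test typeof(result) == typeof(expected)
end

# Group all cases; each call becomes its own nested test set,
# named after the called function.
function test_all()
    @testset "double" begin
        for (input, expected) in ((1, 2), (2.5, 5.0), (0, 0))
            @testset test_double(input, expected)
        end
    end
end

results = test_all()
```

Since `test_double` is an ordinary function, a single case can also be run on its own, e.g. `@testset test_double(1, 2)`.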

Metaprogramming should usually be used to reduce complexity: it prevents mistakes in boilerplate code by removing the boilerplate entirely. Just consider having to write the expanded version of a @testset every time you want to group a test:

julia> using Test

julia> f() = true
f (generic function with 1 method)

julia> @macroexpand @testset f()
quote
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1488 =#
    Test._check_testset(if Test.get_testset_depth() == 0
            Test.DefaultTestSet
        else
            Test.typeof(Test.get_testset())
        end, $(QuoteNode(:(get_testset_depth() == 0))))
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1489 =#
    local var"#5#ret"
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1490 =#
    local var"#1#ts" = (if Test.get_testset_depth() == 0
                Test.DefaultTestSet
            else
                Test.typeof(Test.get_testset())
            end)("f"; Test.Dict{Test.Symbol, Test.Any}()...)
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1491 =#
    Test.push_testset(var"#1#ts")
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1495 =#
    local var"#2#RNG" = Test.default_rng()
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1496 =#
    local var"#3#oldrng" = Test.copy(var"#2#RNG")
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1497 =#
    local var"#4#oldseed" = (Test.Random).GLOBAL_SEED
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1498 =#
    try
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1500 =#
        (Test.Random).seed!((Test.Random).GLOBAL_SEED)
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1501 =#
        let
            #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1502 =#
            f()
        end
    catch var"#7#err"
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1505 =#
        var"#7#err" isa Test.InterruptException && Test.rethrow()
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1508 =#
        Test.trigger_test_failure_break(var"#7#err")
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1509 =#
        if var"#7#err" isa Test.FailFastError
            #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1510 =#
            if Test.get_testset_depth() > 1
                Test.rethrow()
            else
                Test.failfast_print()
            end
        else
            #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1512 =#
            Test.record(var"#1#ts", Test.Error(:nontest_error, Test.Expr(:tuple), var"#7#err", (Test.Base).current_exceptions(), $(QuoteNode(:(#= REPL[3]:1 =#)))))
        end
    finally
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1515 =#
        Test.copy!(var"#2#RNG", var"#3#oldrng")
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1516 =#
        (Test.Random).set_global_seed!(var"#4#oldseed")
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1517 =#
        Test.pop_testset()
        #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1518 =#
        var"#5#ret" = Test.finish(var"#1#ts")
    end
    #= /home/sukera/julia/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1520 =#
    var"#5#ret"
end

That’s A LOT of setup code to provide a reproducible environment. Sure, it could have been a function instead, but then you need lots and lots more global variables to synchronize stuff, or your testing framework becomes much less flexible (by actually shoving all that complexity into each and every test function). Back when this was added, we didn’t have multithreading after all. Instead, @testset is used to record everything in a specific object, allowing later introspection.

Granted, overusing metaprogramming can increase mental load by hiding complexity, but I don’t think the uses by Test are particularly egregious.

An example of that would be appreciated, since that sounds more like an issue with errors in generated functions than @testset per se.

Out of interest, have you looked at how Test works internally and what it tries to achieve? It’s not using metaprogramming just for the heck of it, but giving a uniform interface for allowing outside users to very easily hook into the existing @testset machinery with their own testsets, for recording & displaying purposes. The fact that A LOT of packages then just use @testset on the top level, disallowing easy “just run this test please”, is a different matter; I agree with you that this is not a good style, but that’s an orthogonal issue to whether or not Test employs metaprogramming to reduce boilerplate.
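
To make the “hook into the machinery” point concrete, here is a minimal sketch of a custom test set, following the `AbstractTestSet` interface described in the Test documentation (a constructor taking a description, plus `record` and `finish` methods). `CountingTestSet` and its fields are hypothetical names, and it only counts outcomes instead of printing them.

```julia
using Test

# Hypothetical custom test set: counts results, prints nothing.
mutable struct CountingTestSet <: Test.AbstractTestSet
    description::String
    passed::Int
    nonpass::Int
    # @testset calls the constructor with the description and any options.
    CountingTestSet(desc::AbstractString; kwargs...) = new(String(desc), 0, 0)
end

# Called by @test for every result while this test set is active.
Test.record(ts::CountingTestSet, ::Test.Pass) = (ts.passed += 1; ts)
Test.record(ts::CountingTestSet, ::Test.Result) = (ts.nonpass += 1; ts)  # Fail, Error, Broken
# Nested test sets are simply ignored in this sketch.
Test.record(ts::CountingTestSet, ::Test.AbstractTestSet) = ts

# Called once at the end; a fuller version would report to a parent test set here.
Test.finish(ts::CountingTestSet) = ts

ts = @testset CountingTestSet "counting" begin
    @test 1 + 1 == 2
    @test 1 + 1 == 3   # deliberately failing, just gets counted
end

println("passed = ", ts.passed, ", non-passing = ", ts.nonpass)
```

The body of the test set is unchanged from what you would write for the default one; only the type name after `@testset` differs, which is the uniform interface I mean.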

> a reproducible environment

Off-topic: I thought that @safetestset was needed for reproducibility?

> Sure, it could have been a function instead, but then you need lots and lots more global variables to synchronize stuff, or your testing framework becomes much less flexible (by actually shoving all that complexity into each and every test function)

I am afraid I am not following why these are necessary. Perhaps I am missing something?

> Out of interest, have you looked at how Test works internally and what it tries to achieve? It’s not using metaprogramming just for the heck of it, but giving a uniform interface for allowing outside users to very easily hook into the existing @testset machinery with their own testsets, for recording & displaying purposes.

I am afraid that I don’t know much about the internals of Test. What I still don’t understand is why metaprogramming is necessary. I have seen quite large test suites work without it (e.g. via pytest as I mentioned).

So while I am still puzzled about what the advantages of a metaprogramming-heavy test framework are, I know some of its current limitations, most notably (for me): losing the ability to use a debugger.

> An example of that would be appreciated, since that sounds more like an issue with errors in generated functions than @testset per se.

Sorry, I gave it a go and could not produce a minimal example for this :frowning:. What I observed was a single-line stacktrace (of very little use) when running within a @testset, compared to a full stacktrace without it, for a bug in a generated function of some library. I agree, perhaps it was not an issue with @testset itself, just not a great user experience.

@testset is not in a separate module, so some things you define in a test set, like functions, leak out of a @testset.

Metaprogramming or something very like it is required for ergonomic test failure reporting. For example, pytest relies on metaprogramming to implement one of its biggest ergonomic features: “Assertion Re-Writing”. See this four-part article on why it is helpful and how it is implemented: “Assertion rewriting in Pytest part 4: The implementation” (Python Insight).

def test_equality():
    a = 4
    b = 5
    assert a==b

Under pytest, this will output an error showing the line of the assert and the values (4 and 5) of `a` and `b`, which is much more information than you would normally get running this function in normal Python. That is enabled by Assertion Re-writing, which is a form of metaprogramming. It relies on an import hook that inspects the loaded code, re-writes it, and then executes not the code you wrote, but the re-written code.

The @test macro in Julia does the same thing as assertion re-writing, as well as at least some more that I couldn’t list off the top of my head. Personally, I could write a simple version of the @test macro that does assertion re-writing easily in Julia while I’d have no idea how to start in Python without that article I linked, and I spend 20x more time writing Python than Julia. So I see that as a big plus for Julia’s approach.
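
As a rough sketch of what such a simple version could look like (a toy, not how `Test` implements `@test`): the hypothetical `@check` macro below receives the comparison as an expression, so it can report both the source text and the evaluated operands.

```julia
# Hypothetical toy macro, not part of Test.
macro check(ex)
    # Only binary calls such as `a == b` get the detailed treatment.
    if ex isa Expr && ex.head === :call && length(ex.args) == 3
        op, lhs, rhs = ex.args
        return quote
            local l = $(esc(lhs))   # evaluate each operand exactly once
            local r = $(esc(rhs))
            if $(esc(op))(l, r)
                true
            else
                error(string("Check failed: ", $(string(ex)),
                             "\n   Evaluated: ", repr(l), " ", $(string(op)), " ", repr(r)))
            end
        end
    end
    # Fallback: no operand values, only the source text.
    return :($(esc(ex)) ? true : error(string("Check failed: ", $(string(ex)))))
end

a = 4
b = 5
message = try
    @check a == b
    "no error"
catch err
    err.msg
end
println(message)
```

The printed message contains both `a == b` (what was written) and `4 == 5` (what it evaluated to), which is the same kind of information pytest recovers by rewriting `assert`.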

I do agree that Julia’s default test experience would benefit from easily being able to run test subsets, though maybe that’s been added since I last used it. The @testitem approach looks great and I’ll try that next time I write a Julia package.

It’s available in ReTest.jl (which also uses quite a lot of metaprogramming).

What I do in my packages is to have every file do `using Test`, etc.; that way I can run individual sub-files with include (or ctrl-shift-enter in VSCode).

Thanks for that link. My understanding was that pytest was doing some fancy hacking on assert, but I didn’t know exactly how it worked and I couldn’t find a link. So, the bottom line is that both pytest and Julia use metaprogramming for unit tests. :slight_smile:

This is potentially useful for other projects:

JuMP uses a custom function test_xxx() approach:

Tests are test_xxx() functions in a test_xxx.jl file that contains a TestXXX module.

test/runtests.jl is a lightweight runner (JuMP.jl/runtests.jl at master in jump-dev/JuMP.jl on GitHub) that uses this package: Kokako.jl (JuMP.jl/Kokako.jl at master in jump-dev/JuMP.jl on GitHub).
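
A minimal sketch of that layout, as I understand it (the module and function names are placeholders, and the real JuMP runner differs in detail): each file defines a module with `test_` functions plus a small `runtests` that discovers them by name.

```julia
module TestExample

using Test

# Run every function in this module whose name starts with "test_".
function runtests()
    for name in names(@__MODULE__; all = true)
        if startswith(string(name), "test_")
            @testset "$(name)" begin
                getfield(@__MODULE__, name)()
            end
        end
    end
    return
end

function test_addition()
    @test 1 + 1 == 2
    return
end

function test_string_length()
    @test length("abc") == 3
    return
end

end  # module TestExample

TestExample.runtests()
```

Because the tests are ordinary functions inside a module, a single one can be run with `TestExample.test_addition()`, and nothing defined in the file leaks into `Main`.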

Thanks a lot, I did not know that pytest was doing all this stuff under the hood for the assertions.

Besides the @test macro, whose usefulness you and others kindly explained to me, would you also say that you find @testset useful or to some extent necessary?

As far as I understand pytest gets away without it by mandating that the tests are precisely all the test_* functions in test_*.py files, thereby ending with something simpler for the user and arguably more powerful (e.g. ability to run part of the test suite as discussed)?

One huge caveat with the @testset-in-a-function approach is that you run headlong into the problem behind “lowering: track location of macros better for stackwalk” (vtjnash, Pull Request #44995, JuliaLang/julia). Not having stacktraces on test failures is not good.

I feel that making an arbitrary convention of “all test functions should start with test_” is even more magical than the metaprogramming Test does. At least when I see the @ I know magic is happening. That and pytest also makes liberal use of @ decorators for parameterized tests, fixtures, etc. Those are basically equivalent to what Julia testing libraries do with metaprogramming but at runtime.
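
For comparison, the closest stdlib counterpart to a parametrized pytest test is probably the loop form of `@testset`. A small sketch (the property being tested is just a placeholder):

```julia
using Test

# One test set per parameter value; the loop variable can be
# interpolated into the description.
results = @testset "abs($x) is non-negative" for x in (-2, 0, 3.5)
    @test abs(x) >= 0
    @test abs(x) == abs(-x)
end

println(length(results), " test sets were run")
```

Here the expansion into one test set per value happens at macro-expansion time, where a pytest parametrize decorator does the equivalent work at collection time.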

As others have mentioned, the issue with Test not being powerful enough to support features like partial runs is not because of metaprogramming, but because of its particular implementation (just spilling a bunch of code into the current scope without isolating it in a function or module). Newer test libraries fix this, but the stdlib has been slow to adapt.

Here is a C++ testing library that doesn’t require macros.
Julia macros are far better than C++ macros, so we tend to try less hard to avoid them. But if we wanted to, we could perhaps draw inspiration from there.

They use overloading + operator precedence.
E.g.

struct Test end
const test = Test()

struct TestExpr # should we specialize?
    t::Any
end

Base.:(%)(::Test, x) = TestExpr(x)
function Base.:(==)(x::TestExpr, y)
    if x.t != y
        # we'd want to define a TestException
        error("$(x.t) != $y")
    end
end

I get

julia> a = 1; b = 2;

julia> test % a == b
ERROR: 1 != 2
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] ==(x::TestExpr, y::Int64)
   @ Main ./REPL[5]:4
 [3] top-level scope
   @ REPL[8]:1

Multiple dispatch (or in the case of C++, function overloading) can take you far – but as you note, it only takes us as far as the values.

Without macros, we do not have the expressions.

I agree that pytest’s @fixture and many other decorators are too complicated. So perhaps when the user is writing parametrised tests, using Julia’s metaprogramming might be quite advantageous compared to pytest’s fixtures, but I believe that the use of metaprogramming is best left to the discretion of the user.

However, I disagree in that I do find a rule like “a unit test is every exported test_*() function defined in any test_*.jl file in the package’s test folder” very simple and powerful.

> As others have mentioned, the issue with Test not being powerful enough to support features like partial runs is not because of metaprogramming

Agreed, but my original post was not about it being powerful - it was about the obscurity the metaprogramming introduces (sorry, I did get to not being powerful later on :laughing:). I mentioned the lack of debugging capabilities and difficult stack traces as my two main points - don’t you think that these relate to metaprogramming (in particular for @testset - not so much for @test which is much “lower level” and other people in this thread convinced me of its use)?

Perhaps I am too averse to metaprogramming? I try to avoid it unless there is a particularly good reason to use it, and even then I try to contain it rather than put it at the top level of my whole code. This is what was suggested in this JuliaCon Keynote Talk, where Steven Johnson (an expert in code generation, I believe) spends almost twenty minutes talking about how using metaprogramming is most often a mistake.

That’s true about 90% of the time, but in the remaining 10% of cases it is extremely useful. :wink:

I think one challenge is that it’s not fully clear what you mean by meta-programming since the Python libraries you used as a contrast with Julia are built on top of techniques (e.g. introspection) that I would call meta-programming: https://2019.pycon.de/program/pyconde-xtd7te-abridged-metaprogramming-classics-this-episode-pytest-oliver-bestwalter/

I think there are two cases for using metaprogramming:

  1. When you need to write something that acts on the naming in the code itself, i.e. you need to know the names of the variables the user uses in order to give better printing/feedback. Example: @variable x.
  2. Syntactic sugar for things the user knows how to write but doesn’t want to. Example: @.. or @views.

Many times you have both, for example @named x = ODESystem(sys) in ModelingToolkit performs both functions, or @model in Turing.jl names the internal variables using the DSL of Turing. Anything more than that though, I would consider using a function.

Test uses it for purpose (1).
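
A tiny sketch of what case 1 means in practice, using a made-up macro (`@with_name` is hypothetical, not from any package): the macro sees the name the user wrote, which a function never can.

```julia
# Hypothetical macro illustrating case 1: it sees the *name* the user wrote.
macro with_name(ex)
    # QuoteNode keeps the expression as data; esc evaluates it in the caller's scope.
    return :($(QuoteNode(ex)) => $(esc(ex)))
end

x = 3
pair = @with_name x   # pairs the symbol :x with the value of x
println(pair)

# A plain function called as f(x) only ever receives the value 3
# and has no way to recover the name `x`.
```

That is the same reason @test can print the expression that failed and not just `false`.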

Your faith in ~~my~~ the user’s ability to correctly write out the result of syntactic sugar macros is probably misplaced :wink:

There’s a significant difference between using already-defined macros that Julia provides (like @testset, @test, and also @simd, @view, etc) and the trap of defining your own macros or using eval/@eval.

You definitely should be averse to the latter; that’s where the dragons lie. But using macros in the stdlibs (except @eval) shouldn’t be so scary.

I don’t see much difference between macros in Base, libraries, or user code; in all cases they should make sense and be there for a reason. (In the end, one of the nice things about Julia is that user-defined code is just like system code, e.g., allowing for efficient and composable custom data structures.)

@testset constructs a protected scope around the tests executed within. Such scopes have traditionally been handled via macros, e.g., in Common Lisp. Yet modern languages often support another syntax in the form of extensible resource managers, e.g., the with statement in Python. I guess the do notation could have been used in Julia instead of a macro:

testset("name") do
    @test 1 == 2
end
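
A rough sketch of how such a function could look, built from the same internal calls that show up in the macro expansion earlier in the thread (`Test.push_testset`, `Test.pop_testset`, `Test.finish`). These are internals rather than public API, so this is an assumption about Julia 1.9/1.10 and may break on other versions; `testset` itself is a hypothetical name. It also skips the RNG reseeding and error recording that the macro does.

```julia
using Test

# Hypothetical function form of @testset; relies on Test internals.
function testset(body, name::AbstractString)
    ts = Test.DefaultTestSet(name)
    Test.push_testset(ts)       # make `ts` the current test set for @test
    try
        body()
    finally
        Test.pop_testset()      # always restore the previous test set
    end
    return Test.finish(ts)      # report results (throws if any test failed)
end

result = testset("arithmetic") do
    @test 1 + 1 == 2
    @test 2 * 3 == 6
end
```

Note that @test inside the block is still a macro; only the scoping part moves into a function.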

Metaprogramming has become rather widespread, e.g., in Python or Ruby. Yet there it’s mostly done at runtime via hooking into the meta-object protocol instead of syntactic transformations, i.e., macros.

You are right: I don’t have a clear understanding of what constitutes metaprogramming, and this was also reflected in the original post.

In that sense I kinda got the answer to my question.

Still, I am actually surprised that people have not commented at all on the lack of debugging inside @testsets (mentioned in my original post as one of the two motivating difficulties) and on whether this is due to metaprogramming (as per their definition).

I might be wrong, but I think that would not be easy to pull off in the existing base test framework. I think it would probably be much easier to add debugging support to the test item framework; in fact, pretty much all the pieces needed to pull that off inside VS Code exist already. I “just” need to hook it all up :slight_smile: No promise on when that is going to happen. It is on my roadmap, but it is also probably a pretty significant lift, so it might be a while. But at least the design should lend itself very well to implementing this.
