Testing type stability across Julia and JET versions

In my package DifferentiationInterface.jl, the test suite includes type-stability checks for some differentiation operators. I perform these checks with JET.jl and the @test_opt macro, which actually verifies a stronger property called type-groundedness (according to the terminology in this paper). In other words, it is not just the function’s return type that must be @inferred: every intermediate call inside it must be type-stable as well.
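
To illustrate the difference with a minimal sketch (a toy function, not an actual differentiation operator): @inferred only checks the call’s return type, while @test_opt also flags the runtime dispatch hidden inside it.

using JET, Test

function g(x)
    a = Any[x]              # untyped container introduces runtime dispatch
    return sum(a)::Float64  # the assertion keeps the return type inferrable
end

@inferred g(1.0)  # passes: only the return type is checked
@test_opt g(1.0)  # typically fails: reports the dynamic dispatch inside g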

My problem is that these tests are brittle: Julia doesn’t guarantee its inference behavior as part of the API, and JET is evolving fast. As a result, the type-stability tests often break without me changing anything, with errors like

failed to optimize due to recursion

In addition, I often observe that the issue lies in some printing or string function that I don’t really care about (and didn’t even know would be called).

Is there a recommended way to make such tests more robust? Or less dependent on internals? Ping @aviatesk

4 Likes

Regarding the extraneous issues that get reported:
One approach is to specify target modules, e.g.

@report_opt target_modules=(@__MODULE__,) compute(30)

which would only report issues in that module.

There is also function_filter, which can be used to ignore specific function calls.
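
For instance (a sketch, assuming the filter is a predicate over the called function object that returns false for calls to skip; compute(30) is the same hypothetical call as above):

using JET

# Return false for calls we want JET to ignore (printing/string helpers here).
ignore_printing(@nospecialize(f)) = !(f === print || f === println || f === string || f === repr)

@report_opt function_filter=ignore_printing compute(30)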

In general, the Optimization Analysis · JET.jl page suggests some useful ideas.

2 Likes

My hesitation with these tools is that I have two competing goals:

  • Ensuring that my package itself doesn’t introduce type instabilities
  • Testing that the code as a whole is type-stable, including function calls from other packages

Ignoring modules or functions is good for the first goal but bad for the second. Perhaps I should separate them?
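
For instance, something like this (just a sketch; the function, input, and environment variable names are placeholders):

using JET, Test

my_operator(x) = sum(abs2, x)   # stand-in for a real differentiation operator
x = [1.0, 2.0, 3.0]

if get(ENV, "JET_STRICT", "false") == "true"
    # Goal 2: the entire call tree, including other packages and Base,
    # must be type-grounded.
    @test_opt my_operator(x)
else
    # Goal 1: only report instabilities originating in this module.
    @test_opt target_modules=(@__MODULE__,) my_operator(x)
end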

1 Like

Regarding this report in particular, it is expected to be fixed in the latest v1.11 (1.11: improve type stability of `_unsafe_take!(::IOBuffer)` by aviatesk · Pull Request #54942 · JuliaLang/julia · GitHub).
While I completely agree that such breakages are troublesome, they are to some extent inevitable given JET’s design. As @jishnub mentioned, using function_filter or target_modules is one way to cope, but these work against @gdalle’s second objective. So this is another instance of the usual tradeoff between analysis accuracy and false positives.
In cases like this, I have usually fixed the issue on the Base side.
That said, it’s true that the performance of logging-related code is rarely an actual problem, so it might be reasonable to make JET ignore such code automatically. JET is not a sound analyzer to begin with, so that would not be out of character; on the other hand, it would mean adding a new analysis option, and having too many options is already an ongoing issue.

1 Like

Thanks for taking the time to answer!
I think the best solution in my case would be to parametrize my function filter or ignored modules, depending on which of my goals I’m pursuing at the moment. I was worried that the macro arguments wouldn’t accept more complicated constructs than function names, but this seems to work:

julia> using JET

julia> function f(x)
       a = []
       push!(a, x)
       return sum(a)
       end
f (generic function with 1 method)

julia> @test_opt function_filter=Returns(false) f(1.0)
Test Passed

What do you mean by that? If I’m not mistaken, JET is the only way to test type-groundedness programmatically, without manually inspecting Cthulhu.jl results?

1 Like

Yeah, function_filter takes function objects rather than their names, so you can implement a filter that does more nuanced work.
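
For instance (a hypothetical sketch, reusing the f defined above): a filter that keeps reports only for functions owned by modules other than Base and Core, which is roughly the target_modules idea expressed as a function_filter:

using JET, Test

# Return false (i.e. skip) for any callee owned by Base or Core.
not_base(@nospecialize(g)) = parentmodule(typeof(g)) ∉ (Base, Core)

# f is the toy function from the previous post.
@test_opt function_filter=not_base f(1.0)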

Ah, “sound” is a term used in static analysis and statistics, meaning something like “reliable”.
What this actually means is that there are no false negatives: a sound analysis is guaranteed to detect every possible error or issue, though it often produces unnecessary warnings (false positives) along the way. In contrast, JET is not sound: it prioritizes reducing false positives to make the tool more user-friendly, rather than strictly eliminating false negatives. For example, ignoring dynamic dispatch in logging or printing code would technically introduce false negatives for an optimization checker (report_opt), but from the user’s perspective in this case, those reports are effectively false positives.

2 Likes

Ref