Tests fail on CI, pass locally

The tests for the following pull request pass locally, but fail on GitHub when using Julia 1.11: Kcu steering · ufechner7/KiteModels.jl@5cd060d · GitHub

Any idea how to debug that?

The problem is that when running the CI I get a high number of allocations, which I do not get locally. What could be the reason?

test/bench4.jl:127
  Expression: t.memory <= 208
   Evaluated: 1232 <= 208
  1. Did you see this message?
    │ To reproduce this CI run locally, run the following from the same repository state on Julia version 1.11.3:
    │
    │ `import Pkg; Pkg.test(;coverage=true, julia_args=["--check-bounds=yes", "--compiled-modules=yes", "--depwarn=yes"], force_latest_compatible_version=false, allow_reresolve=true)`
  2. If that’s not sufficient, you can always log into the GitHub-hosted runners with GitHub - mxschmitt/action-tmate: Debug your GitHub Actions via SSH by using tmate to get access to the runner system itself.

I was able to reproduce the problem (too many allocations) using this command line locally.

But how can I fix it? Can I change the parameters of Pkg.test() in the CI? If yes, how?

And why is this problem new? Was there a change in the CI parameters lately, or does Julia 1.11.3 behave differently than Julia 1.11.2?

UPDATE:
The parameter coverage=true causes extra allocations that make my test fail. This is not the case with Julia 1.10, for example.

I fixed the failures using code like this:

if VERSION.minor == 11
    # the higher allocation happens only when testing with "coverage=true"
    @test t.memory <= 1232
else
    @test t.memory <= 128
end

It does not feel like a good solution, though.

  • is there a way to test, inside a test, the value of the coverage parameter?
  • why does this problem appear with Julia 1.11 and not with Julia 1.10?
Base.JLOptions().code_coverage

but note that’s an internal, private, undocumented field; it may break at any point. Don’t say I didn’t warn you.
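For what it’s worth, a minimal sketch of gating an allocation threshold on that flag (the looser bound under coverage is a made-up placeholder; `code_coverage` is 0 when coverage collection is off):

```julia
# Internal, undocumented flag: 0 = no coverage, nonzero = coverage enabled.
# This may break in any future Julia release.
coverage_enabled = Base.JLOptions().code_coverage != 0

# 208 is the limit from the failing test above; 2048 is an arbitrary looser bound.
limit = coverage_enabled ? 2048 : 208
println(limit)
```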

Code coverage has side effects which can change from version to version and there’s no guarantee about memory usage. Feels like your test is very flaky.


Well, if code coverage has side effects, there should be a documented way to check whether it is enabled or not.

How shall I ensure low or zero memory allocations without testing for them?

Code coverage generates files which wouldn’t be there otherwise, that’s pretty much a side effect by definition.


I created a feature request: Provide a public function like Base.JLOptions().code_coverage · Issue #57168 · JuliaLang/julia · GitHub

I’m reviving this thread because I just started to experience this issue.

It started to happen, apparently, with 1.11.3 only.

The issue is that tests that probe for allocations started to fail because of code coverage. I’m not sure, but shouldn’t CI testing and reporting be run separately from the runs that are used to compute coverage?

I don’t feel that checking within the test whether coverage is being used is the right fix; rather, coverage and actual unit testing should not be part of the same CI run.

It is my impression that many people use allocation tests; won’t those start failing now?

Thoughts?

If you’re saying that you didn’t observe this situation in v1.11.2 then it shouldn’t be hard to run git bisect to see what caused the difference.

bdf8219ee80557ea6035b421b00d91b1174234f2 is the first bad commit
commit bdf8219ee80557ea6035b421b00d91b1174234f2
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Mon Dec 9 16:41:30 2024 -0500

    precompile: don't waste memory on useless inferred code (#56749)
    
    We never have a reason to reference this data again since we already
    have native code generated for it, so it is simply wasting memory and
    download space.
    
    $ du -sh {old,new}/usr/share/julia/compiled
    256M old
    227M new
    
    (cherry picked from commit dfe6a13e5038c8cbe0f1720d190629225ec1a19b)

 base/compiler/effects.jl |  1 +
 src/codegen.cpp          |  8 ++++----
 src/staticdata.c         | 30 ++++++++++++++++++++++++++++--
 3 files changed, 33 insertions(+), 6 deletions(-)

This is the test:

import Pkg
Pkg.add(url="https://github.com/lmiq/MeuNovoPacote.jl")
Pkg.test("MeuNovoPacote"; coverage=true)

Issue reported here: allocation tests fail with coverage=true after https://github.com/JuliaLang/julia/commit/dfe6a13e5038c8cbe0f1720d190629225ec1a19b · Issue #57220 · JuliaLang/julia · GitHub

Copying my answer from the GitHub issue:

Is there a workaround? Some way to get the coverage report while, in parallel, running CI tests with the same compiler options that we use for production (and thus the ones we test locally)?

FixedSizeArrays works around this issue by splitting CI into coverage-enabled and coverage-disabled jobs. CI informs the test suite that “the build is a production build” when coverage is disabled, thus enabling the no-allocation tests:

The test suite needs to parse the environment variable, and the CI workflow is configured to set it per job; see the FixedSizeArrays.jl repository for both pieces.

Perhaps it’d be good if a GitHub Action were available to make this easier.
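In case the linked files are hard to dig up, a rough sketch of the test-suite side of that handshake (the variable name matches the one used later in this thread; the actual FixedSizeArrays.jl code may differ):

```julia
# Coverage-disabled ("production") CI jobs export BUILD_IS_PRODUCTION_BUILD=true;
# only then does the suite run the strict no-allocation tests.
const IS_PRODUCTION_BUILD = get(ENV, "BUILD_IS_PRODUCTION_BUILD", "false") == "true"
println(IS_PRODUCTION_BUILD ? "running allocation tests" : "skipping allocation tests")
```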


Does it make sense to run tests with coverage on all platforms and/or Julia versions?

Will running on a single platform or version produce worse coverage reports in any sense?

As shown in the CI configuration linked above, FixedSizeArrays.jl has the coverage-enabled jobs run only on x64 Ubuntu. This is OK because there’s no platform-specific source code. That said, it makes sense to run on multiple Julia versions to increase the coverage a bit, since the reports get merged and Julia’s imperfect coverage implementation varies from release to release.


FWIW, I opted (for now) to skip the allocation tests if the production-build environment variable is set to false, and to run them otherwise.

I did that just by defining the following function for my allocation tests:

    function test_allocs(allocs, max_allocs)
        if haskey(ENV, "BUILD_IS_PRODUCTION_BUILD") && ENV["BUILD_IS_PRODUCTION_BUILD"] == "false"
            true
        else
            allocs <= max_allocs
        end
    end

and setting up the CI run accordingly here, following your example.
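To make the behavior concrete, here is that helper exercised under both settings (a self-contained copy; the thresholds are illustrative):

```julia
function test_allocs(allocs, max_allocs)
    if haskey(ENV, "BUILD_IS_PRODUCTION_BUILD") && ENV["BUILD_IS_PRODUCTION_BUILD"] == "false"
        true  # coverage build: skip the real check
    else
        allocs <= max_allocs  # production build (or variable unset): real comparison
    end
end

ENV["BUILD_IS_PRODUCTION_BUILD"] = "false"
println(test_allocs(10_000, 208))  # prints true: check skipped under coverage

ENV["BUILD_IS_PRODUCTION_BUILD"] = "true"
println(test_allocs(10_000, 208))  # prints false: 10_000 > 208
```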

An update, for anyone interested. I use the TestItems.jl / TestItemRunner.jl framework for testing. Thus, to minimally alter my allocation tests, I have defined:

@testmodule AllocTest begin
    # This module defines the Allocs struct and the comparison operators
    # to conditionally compare the number of allocations based on the
    # BUILD_IS_PRODUCTION_BUILD environment variable.
    export Allocs
    @kwdef struct Allocs
        prodbuild::Bool = haskey(ENV, "BUILD_IS_PRODUCTION_BUILD") && ENV["BUILD_IS_PRODUCTION_BUILD"] == "true"
        allocs::Int
    end
    Allocs(allocs::Int) = Allocs(; allocs)
    import Base: ==, >, <
    ==(a::Int, b::Allocs) = b.prodbuild ? a == b.allocs : true
    <(a::Int, b::Allocs) = b.prodbuild ? a < b.allocs : true
    ==(a::Allocs, b::Int) = a.prodbuild ? a.allocs == b : true
    <(a::Allocs, b::Int) = a.prodbuild ? a.allocs < b : true
end

and then the allocation tests can be modified by simply replacing the previous direct comparison with a comparison against an Allocs object, for example:

@testitem "Allocations" setup=[AllocTest] begin
    using BenchmarkTools
    using .AllocTest: Allocs
    b = @benchmark f($x) samples = 1 evals = 1
    @test b.allocs <= Allocs(100) # changed from b.allocs <= 100
end