Tests fail on CI, pass locally

The tests for the following pull request pass locally, but fail on GitHub when using Julia 1.11: Kcu steering · ufechner7/KiteModels.jl@5cd060d · GitHub

Any idea how to debug that?

The problem is that when running the CI I get a high number of allocations, which I do not get locally. What could be the reason?

test/bench4.jl:127
  Expression: t.memory <= 208
   Evaluated: 1232 <= 208
  1. Did you see this message?
    │ To reproduce this CI run locally, run the following from the same repository state on Julia version 1.11.3:
    │
    │ `import Pkg; Pkg.test(;coverage=true, julia_args=["--check-bounds=yes", "--compiled-modules=yes", "--depwarn=yes"], force_latest_compatible_version=false, allow_reresolve=true)`
  2. If that’s not sufficient, you can always log into the GitHub-hosted runners with GitHub - mxschmitt/action-tmate: Debug your GitHub Actions via SSH by using tmate to get access to the runner system itself.

I was able to reproduce the problem (too many allocations) using this command line locally.

But how can I fix it? Can I change the parameters of Pkg.test() in the CI? If yes, how?

And why is this problem new? Was there a change in the CI parameters lately, or does Julia 1.11.3 behave differently than Julia 1.11.2?

UPDATE:
The parameter coverage=true causes extra allocations that make my test fail. This is not the case with Julia 1.10, for example.

I fixed the failures using code like this:

if VERSION.minor == 11
    # the higher allocation happens only when testing with "coverage=true"
    @test t.memory <= 1232
else
    @test t.memory <= 128
end

It does not feel like a good solution, though.

  • is there a way to test, inside a test, the value of the coverage parameter?
  • why does this problem appear with Julia 1.11 and not with Julia 1.10?
Base.JLOptions().code_coverage

but note that’s an internal, private, undocumented field; it may break at any point. Don’t say I didn’t warn you.
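For what it’s worth, a minimal sketch of gating an allocation threshold on that flag (the looser bound under coverage is a made-up placeholder; `code_coverage` is 0 when coverage collection is off):

```julia
# Internal, undocumented flag: 0 = no coverage, nonzero = coverage enabled.
# This may break in any future Julia release.
coverage_enabled = Base.JLOptions().code_coverage != 0

# 208 is the limit from the failing test above; 2048 is an arbitrary looser bound.
limit = coverage_enabled ? 2048 : 208
println(limit)
```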

Code coverage has side effects which can change from version to version and there’s no guarantee about memory usage. Feels like your test is very flaky.


Well, if code coverage has side effects, there should be a documented way to check whether it is enabled or not.

How shall I ensure low or zero memory allocations without testing for them?

Code coverage generates files which wouldn’t be there otherwise, that’s pretty much a side effect by definition.


I created a feature request: Provide a public function like Base.JLOptions().code_coverage · Issue #57168 · JuliaLang/julia · GitHub

I’m reviving this thread because I just started to experience this issue.

It started to happen, apparently, with 1.11.3 only.

The issue is that tests that probe for allocations started to fail because of code coverage. I’m not sure, but shouldn’t CI testing and reporting be run separately from the runs that are used to compute coverage?

I don’t feel that checking within the test whether coverage is being used is the right fix; rather, coverage and actual unit testing should not be part of the same CI run.

It is my impression that many people use allocation tests; won’t those start failing now?

Thoughts?

If you’re saying that you didn’t observe this situation in v1.11.2 then it shouldn’t be hard to run git bisect to see what caused the difference.

bdf8219ee80557ea6035b421b00d91b1174234f2 is the first bad commit
commit bdf8219ee80557ea6035b421b00d91b1174234f2
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Mon Dec 9 16:41:30 2024 -0500

    precompile: don't waste memory on useless inferred code (#56749)
    
    We never have a reason to reference this data again since we already
    have native code generated for it, so it is simply wasting memory and
    download space.
    
    $ du -sh {old,new}/usr/share/julia/compiled
    256M old
    227M new
    
    (cherry picked from commit dfe6a13e5038c8cbe0f1720d190629225ec1a19b)

 base/compiler/effects.jl |  1 +
 src/codegen.cpp          |  8 ++++----
 src/staticdata.c         | 30 ++++++++++++++++++++++++++++--
 3 files changed, 33 insertions(+), 6 deletions(-)

This is the test:

import Pkg
Pkg.add(url="https://github.com/lmiq/MeuNovoPacote.jl")
Pkg.test("MeuNovoPacote"; coverage=true)

Issue reported here: allocation tests fail with coverage=true after https://github.com/JuliaLang/julia/commit/dfe6a13e5038c8cbe0f1720d190629225ec1a19b · Issue #57220 · JuliaLang/julia · GitHub

Copying my answer from the GitHub issue:

Is there a workaround? Some way to get the coverage report while, in parallel, running CI tests with the same compiler options that we use for production (and thus the ones we test locally)?

FixedSizeArrays works around this issue by splitting CI into coverage-enabled and coverage-disabled jobs. CI informs the test suite that “the build is a production build” when coverage is disabled, thus enabling the no-allocation tests:

The test suite needs to parse the environment variable, and the CI workflow is configured to set it per job; see the FixedSizeArrays.jl repository for both pieces.

Perhaps it’d be good if a GitHub Action were available to make this easier.
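In case the linked files are hard to dig up, a rough sketch of the test-suite side of that handshake (the variable name matches the one used later in this thread; the actual FixedSizeArrays.jl code may differ):

```julia
# Coverage-disabled ("production") CI jobs export BUILD_IS_PRODUCTION_BUILD=true;
# only then does the suite run the strict no-allocation tests.
const IS_PRODUCTION_BUILD = get(ENV, "BUILD_IS_PRODUCTION_BUILD", "false") == "true"
println(IS_PRODUCTION_BUILD ? "running allocation tests" : "skipping allocation tests")
```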


Does it make sense to run tests with coverage on all platforms and/or Julia versions?

Will running on a single platform or version produce worse coverage reports in any sense?

As shown in the CI configuration linked above, FixedSizeArrays.jl has the coverage-enabled jobs run only on x64 Ubuntu. This is OK because there’s no platform-specific source code. That said, it makes sense to run on multiple Julia versions to increase the coverage a bit, since the reports get merged and Julia’s imperfect coverage implementation varies from release to release.


FWIW, I opted (for now) to skip the allocation tests if the production-build environment variable is set to false, and to run them otherwise.

I did that just by defining the following function for my allocation tests:

    function test_allocs(allocs, max_allocs)
        if haskey(ENV, "BUILD_IS_PRODUCTION_BUILD") && ENV["BUILD_IS_PRODUCTION_BUILD"] == "false"
            true
        else
            allocs <= max_allocs
        end
    end

and setting up the CI run accordingly here, following your example.
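To make the behavior concrete, here is that helper exercised under both settings (a self-contained copy; the thresholds are illustrative):

```julia
function test_allocs(allocs, max_allocs)
    if haskey(ENV, "BUILD_IS_PRODUCTION_BUILD") && ENV["BUILD_IS_PRODUCTION_BUILD"] == "false"
        true  # coverage build: skip the real check
    else
        allocs <= max_allocs  # production build (or variable unset): real comparison
    end
end

ENV["BUILD_IS_PRODUCTION_BUILD"] = "false"
println(test_allocs(10_000, 208))  # prints true: check skipped under coverage

ENV["BUILD_IS_PRODUCTION_BUILD"] = "true"
println(test_allocs(10_000, 208))  # prints false: 10_000 > 208
```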

An update, for anyone interested. I use the TestItems.jl / TestItemRunner.jl framework for testing. Thus, to minimally alter my allocation tests, I have defined:

@testmodule AllocTest begin
    # This module defines the Allocs struct and the comparison operators
    # to conditionally compare the number of allocations based on the
    # BUILD_IS_PRODUCTION_BUILD environment variable.
    export Allocs
    @kwdef struct Allocs
        prodbuild::Bool = haskey(ENV, "BUILD_IS_PRODUCTION_BUILD") && ENV["BUILD_IS_PRODUCTION_BUILD"] == "true"
        allocs::Int
    end
    Allocs(allocs::Int) = Allocs(; allocs)
    import Base: ==, >, <
    ==(a::Int, b::Allocs) = b.prodbuild ? a == b.allocs : true
    <(a::Int, b::Allocs) = b.prodbuild ? a < b.allocs : true
    ==(a::Allocs, b::Int) = a.prodbuild ? a.allocs == b : true
    <(a::Allocs, b::Int) = a.prodbuild ? a.allocs < b : true
end

and then the allocation tests can be modified by simply replacing the previous direct comparison with a comparison against an Allocs object, for example:

@testitem "Allocations" setup=[AllocTest] begin
    using BenchmarkTools
    using .AllocTest: Allocs
    b = @benchmark f($x) samples = 1 evals = 1
    @test b.allocs <= Allocs(100) # changed from b.allocs <= 100
end