Questions about testing and PkgEval: why is only 63% of the ecosystem tested, with the rest “skipped”?

In short: is PkgEval/master in a worse state than usual? Before anyone panics: the process seems to be working as intended, except perhaps for the skipped packages. I’m still curious about some of the other failures I’m seeing.

I’ve noticed this recently, though maybe I just overlooked it before and it isn’t new. I like that 100% of the registered package ecosystem is tested, or at least attempted (we have over 9300 packages now, so this seems to be about all non-JLL packages).

In total, 8013 packages were tested, out of which 4823 succeeded, 23 crashed, 200 failed and 2967 were skipped.

2955 packages were skipped on the previous version too.

Thereof:

Package was blacklisted (2952 packages):

Some of the most important packages are skipped (intentionally, it seems, since they are “blacklisted”), e.g. DataStructures, HTTP, DataFrames, LoopVectorization, CSV, JLD2, ImageCore, CUDA, JSON3, JuMP.

I don’t see any obvious reason, or anything these packages have in common, so what causes a package to be skipped? Are they blacklisted directly, or is it because of some blacklisted dependency, or Artifacts?

  • SIMDMath v0.2.5: fail vs. ok

https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/408b4ac_vs_b12ddca/SIMDMath.primary.log
complex: Test Failed at /home/pkgeval/.julia/packages/SIMDMath/Dn7RO/test/complex_test.jl:81
Expression: (s[1]).re == e.re
Evaluated: -2397.529479230523 == -2397.5294792305203
[…]
Evaluated: -257.8749055869241 == -257.87490558692

While:

julia> -2397.529479230523 ≈ -2397.5294792305203
true

julia> -2547.570474265955 ≈ -2547.5704742659523
true

Wouldn’t you rather want ≈ in most of these tests, and in other packages doing floating-point math? I think someone should make a PR, maybe even me. Only in rare cases do you want exact comparison of floats, and this isn’t one of them.
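A minimal sketch of what an approximate comparison could look like in a test file, using only the `Test` standard library and `isapprox` (written `≈`). The two values are copied from the SIMDMath log above; the `rtol` and `atol` choices are illustrative assumptions, not what SIMDMath actually uses. With no tolerances given, `≈` uses a relative tolerance of `sqrt(eps(Float64))` (about 1.5e-8), which easily absorbs a difference of a few ulps.

```julia
using Test

# Values from the SIMDMath failure log: they differ only in the last few bits.
computed = -2397.529479230523
expected = -2397.5294792305203

@testset "float comparisons" begin
    # Exact equality is what the failing test effectively checked; it does not hold here.
    @test computed != expected
    # Default isapprox: rtol = sqrt(eps(Float64)), so this passes.
    @test computed ≈ expected
    # An explicit, tighter relative tolerance (illustrative value) also passes.
    @test isapprox(computed, expected; rtol = 1e-12)
    # Near zero a relative tolerance is useless, so pass an absolute one instead.
    @test isapprox(1e-17, 0.0; atol = 1e-12)
end
```

For complex results such as the `s[1]` vs. `e` comparison in the log, `@test s[1] ≈ e` compares the whole complex number at once instead of checking `.re` and `.im` separately with `==`.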

14 packages crashed during testing on the previous version too.

For example, important ones I want tested: Revise v3.5.2 and StaticCompiler v0.4.9.

173 packages failed tests on the previous version too.
Package has test failures (83 packages):

Some of those I want tested and working:

Also Nullables.

The process was aborted (6 packages):

  • StaticTools v0.8.7: crash vs. ok
    […]

A segmentation fault happened (3 packages):

Many of these are strange (why did they fail before?), but they’re not much of a worry since they now work (“ok”) with the PR:

29 packages passed tests only on the current version.

MbedTLS v1.1.7: ok vs. fail
[…]

Ironically, testing the Test package itself failed (it “became inactive”; it was “ok” before):

Mock for testing retval of record(::DefaultTestSet, ::T <: Result) methods: Error During Test at An Error Mock:0

Mock for testing retval of record(::DefaultTestSet, ::T <: Result) methods: Test Failed at A Fail Mock:0
Expression: 1
Evaluated: 2

Stacktrace:
[1] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail}; print_result::Bool)

A segmentation fault happened (3 packages):

  • GPUCompiler v0.19.3: crash vs. ok
  • RData v1.0.0: crash vs. ok


The skipped packages are ones known to have unreliable tests, mostly because they depend on the network. Others depend on internals. PkgEval is the first line of defense for catching regressions; the second is packages running their own tests on nightly builds and reporting any other bugs they find.
