Analysis of PkgEval results for upcoming 1.6

The purpose of this post is to describe how PkgEval is used for new Julia releases, together with some analysis of the failures for the upcoming 1.6 release.

As always, when preparing for a new release, we run a tool called PkgEval, which
runs the tests of all packages on both the previous release and the upcoming release and tries to find regressions in Julia that
cause package tests to fail. A typical PkgEval report (the latest one, comparing v1.6-beta and v1.5) can be found here.

Each entry in this report is a package whose tests fail on the upcoming release but not on the latest release.
The log for each test run is linked, and going through these logs to find Julia regressions is one of the big tasks when preparing a new release.

To speed up this process I have a notebook (with a corresponding Project/Manifest) that does some regex parsing to categorize these logs. The notebook needs the data.tar.xz file that comes with the PkgEval report; that file just needs to be placed in the same folder as the notebook.
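For illustration, here is a minimal sketch of what such regex-based categorization could look like. The patterns and category names below are invented for the example and are not the ones from the actual notebook:

```julia
# Hypothetical failure categories; the real notebook's patterns differ.
const CATEGORIES = [
    "printing"  => r"Expression: sprint\(show",
    "world age" => r"method too new to be called",
    "download"  => r"could not download",
]

# Return the first category whose pattern matches the log, or "unknown".
function categorize(log::AbstractString)
    for (name, pattern) in CATEGORIES
        occursin(pattern, log) && return name
    end
    return "unknown"
end
```

Running `categorize` over every log in the report and tallying the results gives a quick overview of which failure classes dominate.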

Before a release, I typically go through the logs and sort the failures into a number of recurring categories.

After finding a potential issue, there are a few things that can be done:

  • Make a PR to the package/julia repo fixing it
  • Open an issue at the package/julia repo describing the problem

If possible, creating an MWE (minimal working example) is very useful.

As a package author, fixing the issue and making a new release is very helpful, because the package will then no longer show up in later PkgEval runs, which reduces noise (for me).

For the upcoming 1.6 release, there were quite a lot of changes to printing, which caused many packages to start failing. For example, BenchmarkTools.jl fails with:

Test Failed at /home/pkgeval/.julia/packages/BenchmarkTools/eCEpo/test/TrialsTests.jl:217
  Expression: sprint(show, "text/plain", [ta, tb]) == "2-element Array{BenchmarkTools.TrialEstimate,1}:\n 0.490 ns\n 1.000 ns"
   Evaluated: "2-element Vector{BenchmarkTools.TrialEstimate}:\n 0.490 ns\n 1.000 ns" ==
              "2-element Array{BenchmarkTools.TrialEstimate,1}:\n 0.490 ns\n 1.000 ns"

Inadvertently, this test checks how an Array type is printed (which changed in 1.6: arrays now print as Vector where applicable). A tip is to interpolate the type itself into the expected string, like:

 "2-element $(Array{BenchmarkTools.TrialEstimate,1}):\n 0.490 ns\n 1.000 ns"

The current PkgEval run shows 356 packages that fail on 1.6 but pass on 1.5. This is quite a high number, but a lot of the failures come from the aforementioned printing changes. Below are two (hidden) lists of packages that are now failing (click to expand):

Packages determined to likely fail due to the printing changes:

AMLPipelineBase
AbstractTrees
Altro
AstroLib
BenchmarkTools
BinningAnalysis
BioAlignments
BlockArrays
CBinding
CategoricalArrays
ChainRulesTestUtils
ClinicalTrialUtilities
ComponentArrays
Compose
ContinuousTimeMarkov
CorticalSpectralTemporalResponses
CustomUnitRanges
DarkIntegers
DataAPI
DefaultArrays
DimensionalData
Divergences
DocStringExtensions
DomainSets
Dualization
DumbCompleter
DynamicSparseArrays
EclipsingBinaryStars
FieldProperties
FillArrays
FourierFlows
FreeParameters
GLM
Graphene
Gtk
HierarchicalUtils
HypothesisTests
ISAData
IndentWrappers
IndexedTables
InfiniteArrays
InfiniteOpt
Intervals
JsonGrinder
JuMP
LambertW
LazyArrays
LinearMapsAA
MDTable
MLJGLMInterface
MLJLinearModels
MLJModelInterface
MLJModels
MLJMultivariateStatsInterface
MLJScikitLearn
MLJScikitLearnInterface
MPIFiles
ManifoldsBase
Markovify
MatrixFactorizations
Measurements
MenuAdventures
Metrics
MicroLogging
Missings
ModelBaseEcon
MolecularGraph
Multigraphs
NaiveGAflux
NamedDims
NetworkDynamics
NeuroCore
NicePipes
NumericIO
OpenQuantumBase
OpenStreetMapX
OrthogonalPolynomialsQuasi
Oscar
POMDPModelTools
POMDPPolicies
Petri
PhyloModels
Pluto
PotentialFlow
PrettyPrint
ProximalBase
Quadmath
QuantumLattices
QuantumOpticsBase
RLEVectors
RandomBasedArrays
RandomMatrices
Ranges
Recommendation
RegularizationTools
Santiago
Sched
SimpleHypergraphs
SparseTimeSeries
SparsityDetection
StaticNumbers
StructArrays
StructJuMP
SymPy
TaylorSeries
TerminalLoggers
Thorn
TimeSeries
TimeToLive
TimesDates
Traceur
TrackedDistributions
Tracker
Transformers
UnitTestDesign
WebSockets
YaoArrayRegister
Zeros

Packages where it is unclear why they fail (none of my regexes matched):

AWSCore
Autologistic
AxisKeys
BPFnative
BlobTracking
Bukdu
CDDLib
CMPlot
Cambrian
Circuitscape
ClimateBase
CloudWatchLogs
CombinedParsers
ComoniconGUI
DIVAnd
DataAssim
DensityRatioEstimation
DiffEqOperators
Dispersal
DynamicPolynomials
EmojiSymbols
Enzyme
ExaPF
ExtremeStats
FTPClient
FWFTables
FlashWeave
Flux
GEOTRACES
GalacticOptim
Gridap
GridapEmbedded
GridapODEs
GtkReactive
H3
HalfIntegerArrays
IPython
ImageInpainting
JLLWrappers
JetPackDSP
Jive
LOLTools
Laplacians
Libtask
LocalRegistry
MCAnalyzer
MCMCChain
MCMCChains
MGVI
Mads
MakieGallery
MathOptInterface
Millboard
Minesweeper
MultivariatePolynomials
MutableArithmetics
NCDatasets
NMFk
NamedPlus
NeXLSpectrum
NetCDF
NeuralNetDiffEq
NeuralPDE
OceanTurb
Oceananigans
OhMyREPL
OpenQuantumTools
OptimKit
Plotly
Polyhedra
PowerModels
PrincipalMomentAnalysisApp
ProbabilisticCircuits
Probably
ProximalOperators
QuantLib
Quante
RBNF
ReinforcementLearningZoo
RoME
ShallowWaters
SparseDiffTools
Stan
StochasticDiffEq
StructViews
StructuredOptimization
SystemBenchmark
TSne
Tar
ThreePhasePowerModels
Trixi
TuringModels
VideoIO
YaoBlocks
YaoExtensions

If someone feels like helping out with 1.6, it’s quite easy to make a PR fixing the printing in a package’s tests on 1.6, or to check the logs for one of the packages with an unknown failure and figure out the cause.

Ideally, we want to fix as many of these as possible before the real 1.6 release.

33 Likes

Great work. If anybody would like to contribute some functionality, I think it would be really useful to be able to see the list of failing packages sorted by the number of dependencies, since that helps when prioritizing which packages to fix first.

2 Likes

In case it is meaningful to filter out individual packages: NaiveGAflux seems to fail due to new random sequences.

Before anyone jumps to scold me for this: it is a short test suite for the examples in the readme, where I didn’t want the clutter of a stubbed RNG (e.g. StableRNGs). I guess there are better options, but so far changing a handful of numbers every other Julia release has proven to be less of a hassle than figuring out something better.

If this causes noise/friction in the PkgEval process I’ll try to come up with a better way.
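For reference, the usual pattern is to thread an explicit RNG through the code under test. A minimal sketch (the `mutate` function is made up for illustration; `MersenneTwister` is used here to stay dependency-free, though StableRNGs.StableRNG is the one whose stream is guaranteed stable across Julia releases):

```julia
using Random

# Accepting the RNG as an argument lets tests pin the stream explicitly
# instead of depending on the global RNG, whose sequence can change
# between Julia releases.
mutate(rng::AbstractRNG, xs) = xs .+ randn(rng, length(xs))

rng = MersenneTwister(0)
@assert length(mutate(rng, zeros(3))) == 3
```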

1 Like

This is probably fairly niche, but something (I hope it’s the only thing) that made a bunch of MLJ* repos fail on 1.6 is a typename story. Consider

# Walk through UnionAll wrappers until the underlying DataType's
# TypeName (the `name` field) is reached.
function coretype(M)
    if isdefined(M, :name)
        return M.name
    else
        return coretype(M.body)
    end
end

then

import DataFrames

df = DataFrames.DataFrame(a = [1, 2, 3])
string(coretype(typeof(df))) # "DataFrame"           on Julia < 1.6
string(coretype(typeof(df))) # "typename(DataFrame)" on Julia >= 1.6

I would say that falls into the class of relying on internals, here accessing internal fields of type objects.
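A version-robust alternative, sketched here with a made-up wrapper type, is `nameof`, which returns the type’s name as a `Symbol` and is unaffected by the 1.6 printing change:

```julia
struct Wrapper{T}  # stand-in parametric type for illustration
    x::T
end

# nameof works on both the concrete DataType and the UnionAll wrapper,
# so there is no need to unwrap .body manually.
@assert nameof(Wrapper{Int}) == :Wrapper
@assert nameof(Wrapper) == :Wrapper
@assert string(nameof(Wrapper{Int})) == "Wrapper"
```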

1 Like