Analysis of PkgEval results for upcoming 1.6

The purpose of this post is to describe how PkgEval is used for new Julia releases, together with some analysis of the failures for the upcoming 1.6 release.

As always, when preparing for a new release, we run a tool called PkgEval, which
runs the tests of all packages on both the previous release and the upcoming release and tries to find regressions in Julia that
cause package tests to fail. A typical PkgEval report (the latest one, comparing v1.6-beta and v1.5) can be found here.

Each entry in this report is a package whose tests fail on the upcoming release but not on the latest release.
The log for each test run is linked, and going through these logs to find Julia regressions is one of the big tasks when preparing a new release.

To speed up this process I have a notebook (with a corresponding Project/Manifest) that does some regex parsing to categorize these logs. The notebook needs the data.tar.xz file that comes with the PkgEval report; that file just needs to be placed in the same folder as the notebook.
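For illustration, here is a minimal sketch of what such regex-based categorization could look like. The patterns and category names below are invented for the example and are not the ones from the actual notebook:

```julia
# Hypothetical failure categories; the real notebook's patterns differ.
const CATEGORIES = [
    "printing"  => r"Expression: sprint\(show",
    "world age" => r"method too new to be called",
    "download"  => r"could not download",
]

# Return the first category whose pattern matches the log, or "unknown".
function categorize(log::AbstractString)
    for (name, pattern) in CATEGORIES
        occursin(pattern, log) && return name
    end
    return "unknown"
end
```

Running `categorize` over every log in the report and tallying the results gives a quick overview of which failure classes dominate.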

Before a release, I typically go through the logs and sort the failures into a number of recurring categories.

After finding a potential issue, there are a few things that can be done:

  • Make a PR to the package/julia repo fixing it
  • Open an issue at the package/julia repo describing the problem

If possible, creating an MWE (minimal working example) is very useful.

As a package author, fixing the issue and making a new release is very helpful, because the package will then no longer show up in later PkgEval runs, which reduces noise (for me).

For the upcoming 1.6 release, there were quite a lot of changes to printing, which caused many packages to start failing. For example, BenchmarkTools.jl fails with:

Test Failed at /home/pkgeval/.julia/packages/BenchmarkTools/eCEpo/test/TrialsTests.jl:217
  Expression: sprint(show, "text/plain", [ta, tb]) == "2-element Array{BenchmarkTools.TrialEstimate,1}:\n 0.490 ns\n 1.000 ns"
   Evaluated: "2-element Vector{BenchmarkTools.TrialEstimate}:\n 0.490 ns\n 1.000 ns" ==
              "2-element Array{BenchmarkTools.TrialEstimate,1}:\n 0.490 ns\n 1.000 ns"

Inadvertently, this test checks how an Array type is printed (which changed in 1.6: arrays now print as Vector where applicable). A tip is to interpolate the type itself into the expected string, like:

 "2-element $(Array{BenchmarkTools.TrialEstimate,1}):\n 0.490 ns\n 1.000 ns"

The current PkgEval run shows 356 packages that fail on 1.6 but pass on 1.5. This is quite a high number, but a lot of the failures come from the aforementioned printing changes. Below are two (hidden) lists of packages that are now failing (click to expand):

Packages determined to likely fail due to the printing changes:

AMLPipelineBase
AbstractTrees
Altro
AstroLib
BenchmarkTools
BinningAnalysis
BioAlignments
BlockArrays
CBinding
CategoricalArrays
ChainRulesTestUtils
ClinicalTrialUtilities
ComponentArrays
Compose
ContinuousTimeMarkov
CorticalSpectralTemporalResponses
CustomUnitRanges
DarkIntegers
DataAPI
DefaultArrays
DimensionalData
Divergences
DocStringExtensions
DomainSets
Dualization
DumbCompleter
DynamicSparseArrays
EclipsingBinaryStars
FieldProperties
FillArrays
FourierFlows
FreeParameters
GLM
Graphene
Gtk
HierarchicalUtils
HypothesisTests
ISAData
IndentWrappers
IndexedTables
InfiniteArrays
InfiniteOpt
Intervals
JsonGrinder
JuMP
LambertW
LazyArrays
LinearMapsAA
MDTable
MLJGLMInterface
MLJLinearModels
MLJModelInterface
MLJModels
MLJMultivariateStatsInterface
MLJScikitLearn
MLJScikitLearnInterface
MPIFiles
ManifoldsBase
Markovify
MatrixFactorizations
Measurements
MenuAdventures
Metrics
MicroLogging
Missings
ModelBaseEcon
MolecularGraph
Multigraphs
NaiveGAflux
NamedDims
NetworkDynamics
NeuroCore
NicePipes
NumericIO
OpenQuantumBase
OpenStreetMapX
OrthogonalPolynomialsQuasi
Oscar
POMDPModelTools
POMDPPolicies
Petri
PhyloModels
Pluto
PotentialFlow
PrettyPrint
ProximalBase
Quadmath
QuantumLattices
QuantumOpticsBase
RLEVectors
RandomBasedArrays
RandomMatrices
Ranges
Recommendation
RegularizationTools
Santiago
Sched
SimpleHypergraphs
SparseTimeSeries
SparsityDetection
StaticNumbers
StructArrays
StructJuMP
SymPy
TaylorSeries
TerminalLoggers
Thorn
TimeSeries
TimeToLive
TimesDates
Traceur
TrackedDistributions
Tracker
Transformers
UnitTestDesign
WebSockets
YaoArrayRegister
Zeros

Packages where it is unclear why they fail (none of my regexes matched):

AWSCore
Autologistic
AxisKeys
BPFnative
BlobTracking
Bukdu
CDDLib
CMPlot
Cambrian
Circuitscape
ClimateBase
CloudWatchLogs
CombinedParsers
ComoniconGUI
DIVAnd
DataAssim
DensityRatioEstimation
DiffEqOperators
Dispersal
DynamicPolynomials
EmojiSymbols
Enzyme
ExaPF
ExtremeStats
FTPClient
FWFTables
FlashWeave
Flux
GEOTRACES
GalacticOptim
Gridap
GridapEmbedded
GridapODEs
GtkReactive
H3
HalfIntegerArrays
IPython
ImageInpainting
JLLWrappers
JetPackDSP
Jive
LOLTools
Laplacians
Libtask
LocalRegistry
MCAnalyzer
MCMCChain
MCMCChains
MGVI
Mads
MakieGallery
MathOptInterface
Millboard
Minesweeper
MultivariatePolynomials
MutableArithmetics
NCDatasets
NMFk
NamedPlus
NeXLSpectrum
NetCDF
NeuralNetDiffEq
NeuralPDE
OceanTurb
Oceananigans
OhMyREPL
OpenQuantumTools
OptimKit
Plotly
Polyhedra
PowerModels
PrincipalMomentAnalysisApp
ProbabilisticCircuits
Probably
ProximalOperators
QuantLib
Quante
RBNF
ReinforcementLearningZoo
RoME
ShallowWaters
SparseDiffTools
Stan
StochasticDiffEq
StructViews
StructuredOptimization
SystemBenchmark
TSne
Tar
ThreePhasePowerModels
Trixi
TuringModels
VideoIO
YaoBlocks
YaoExtensions

If someone feels like helping out with 1.6, it’s quite easy to make a PR fixing the printing in a package’s tests on 1.6, or to check the logs for one of the packages with an unknown failure and figure out the cause.

Ideally, we want to fix as many of these as possible before the real 1.6 release.

33 Likes

Great work. If anybody would like to contribute some functionality, I think it would be really useful to be able to see the list of failing packages sorted by the number of dependencies, since that helps when prioritizing which packages to fix first.

2 Likes

In case it is meaningful to filter out individual packages: NaiveGAflux seems to fail due to new random sequences.

Before anyone jumps to scold me for this: it is a short test suite for the examples in the readme, where I didn’t want the clutter of a stubbed RNG (e.g. StableRNGs). I guess there are better options, but so far changing a handful of numbers every other Julia release has proven to be less of a hassle than figuring out something better.

If this causes noise/friction in the PkgEval process I’ll try to come up with a better way.
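For reference, the usual pattern is to thread an explicit RNG through the code under test. A minimal sketch (the `mutate` function is made up for illustration; `MersenneTwister` is used here to stay dependency-free, though StableRNGs.StableRNG is the one whose stream is guaranteed stable across Julia releases):

```julia
using Random

# Accepting the RNG as an argument lets tests pin the stream explicitly
# instead of depending on the global RNG, whose sequence can change
# between Julia releases.
mutate(rng::AbstractRNG, xs) = xs .+ randn(rng, length(xs))

rng = MersenneTwister(0)
@assert length(mutate(rng, zeros(3))) == 3
```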

1 Like

This is probably fairly niche, but something (I hope it’s the only thing) that made a bunch of MLJ* repos fail on 1.6 is a typename story. Consider

# Walk through UnionAll wrappers until the underlying DataType's
# TypeName (the `name` field) is reached.
function coretype(M)
    if isdefined(M, :name)
        return M.name
    else
        return coretype(M.body)
    end
end

then

import DataFrames

df = DataFrames.DataFrame(a = [1, 2, 3])
string(coretype(typeof(df))) # "DataFrame"           on Julia < 1.6
string(coretype(typeof(df))) # "typename(DataFrame)" on Julia >= 1.6

I would say that falls into the class of relying on internals, here accessing internal fields of type objects.
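A version-robust alternative, sketched here with a made-up wrapper type, is `nameof`, which returns the type’s name as a `Symbol` and is unaffected by the 1.6 printing change:

```julia
struct Wrapper{T}  # stand-in parametric type for illustration
    x::T
end

# nameof works on both the concrete DataType and the UnionAll wrapper,
# so there is no need to unwrap .body manually.
@assert nameof(Wrapper{Int}) == :Wrapper
@assert nameof(Wrapper) == :Wrapper
@assert string(nameof(Wrapper{Int})) == "Wrapper"
```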

1 Like