I wonder if anybody has good recommendations for debugging the latency of using
statements. The new @time_imports
already helps quite a lot but I haven’t found a workflow I really like. Here is an example of what I mean:
With Julia v1.9.0-beta2 and the current release of Trixi.jl, I get (when starting Julia in the project folder of Trixi.jl)
julia> @time_imports using Trixi
0.2 ms Reexport
1.0 ms ConstructionBase
0.1 ms SnoopPrecompile
19.9 ms RecipesBase
1.1 ms DocStringExtensions
2.2 ms StaticArraysCore
0.6 ms Statistics
0.1 ms SuiteSparse
1.7 ms ArrayInterfaceCore
0.3 ms SymbolicIndexingInterface
0.1 ms Compat
58.1 ms ChainRulesCore
9.4 ms MacroTools
1.1 ms ZygoteRules
0.3 ms Adapt
0.3 ms ArrayInterfaceStaticArraysCore
140.3 ms FillArrays
0.1 ms DataValueInterfaces
0.6 ms DataAPI
0.1 ms IteratorInterfaceExtensions
0.1 ms TableTraits
5.4 ms OrderedCollections
10.8 ms Tables
1.5 ms GPUArraysCore
22.7 ms RecursiveArrayTools
18.2 ms Preferences
0.2 ms CommonSolve
9.2 ms FunctionWrappers
0.4 ms FunctionWrappersWrappers
0.2 ms ExprTools
0.5 ms RuntimeGeneratedFunctions
0.2 ms EnumX
152.1 ms SciMLBase
9.5 ms CodeTracking
0.1 ms IfElse
35.5 ms Static
4.1 ms ArrayInterface
0.2 ms EllipsisNotation
4.1 ms IrrationalConstants
0.4 ms DiffRules
1.7 ms DiffResults
434.2 ms StaticArrays
0.2 ms OpenLibm_jll
0.2 ms NaNMath
0.7 ms ChangesOfVariables
0.7 ms InverseFunctions
0.4 ms LogExpFunctions
0.2 ms JLLWrappers
3.6 ms OpenSpecFun_jll 92.31% compilation time
14.2 ms SpecialFunctions
0.3 ms CommonSubexpressions
75.9 ms ForwardDiff
0.3 ms Requires
0.2 ms Zlib_jll
0.5 ms OpenSSL_jll
0.6 ms HDF5_jll
50.4 ms HDF5
110.2 ms LinearMaps
37.9 ms OffsetArrays
0.7 ms ArrayInterfaceOffsetArrays
0.8 ms ArrayInterfaceStaticArrays
0.2 ms SIMDTypes
1.2 ms ManualMemory
3.4 ms LayoutPointers
0.6 ms CpuId
97.5 ms CPUSummary 95.06% compilation time
0.2 ms BitTwiddlingConvenienceFunctions
10.5 ms HostCPUFeatures
372.0 ms VectorizationBase
3.2 ms SLEEFPirates
0.1 ms UnPack
7.3 ms ThreadingUtilities
4.0 ms PolyesterWeave
2.9 ms CloseOpenIntervals
2.5 ms SIMDDualNumbers
176.8 ms LoopVectorization
0.2 ms MPIPreferences
0.9 ms MPICH_jll
189.6 ms MPI 81.86% compilation time
0.2 ms MuladdMacro
30.8 ms StructArrays
23.8 ms IterTools
1.7 ms Extents
0.6 ms GeoInterface
0.3 ms EarCut_jll
388.9 ms GeometryBasics
1652.9 ms Octavian 82.96% compilation time (68% recompilation)
166.1 ms StrideArraysCore
1.3 ms Polyester
4.0 ms CEnum
0.6 ms P4est_jll
34.3 ms P4est
18.4 ms Setfield
131.8 ms VectorizedRNG
1166.4 ms VectorizedStatistics
65.8 ms StrideArrays
45.5 ms TimerOutputs
0.3 ms Triangle_jll
0.4 ms Triangulate
1.6 ms TriplotBase
54.2 ms FixedPointNumbers
98.2 ms ColorTypes 17.30% compilation time (86% recompilation)
197.7 ms Colors
0.3 ms TensorCore
168.5 ms ColorVectorSpace 3.58% compilation time (91% recompilation)
6.9 ms ColorSchemes
290.0 ms PlotUtils
1.6 ms TriplotRecipes
0.2 ms AutoHashEquals
0.3 ms ArgCheck
8.9 ms AbstractFFTs
0.5 ms FFTW_jll
229.6 ms FFTW
9.4 ms Unrolled
0.4 ms FastGaussQuadrature
2.9 ms PolynomialBases
841.9 ms SummationByPartsOperators 106.52% compilation time
139.1 ms ComponentArrays 60.30% compilation time
86.4 ms DataStructures
0.3 ms SortingAlgorithms
10.4 ms Missings
0.2 ms StatsAPI
30.7 ms StatsBase
0.9 ms CovarianceEstimation
80.4 ms NamedDims
86.2 ms Kronecker
0.8 ms DelimitedFiles
1.3 ms NodesAndModes
3.2 ms PathIntersections
20.9 ms StartUpDG
1173.3 ms Trixi 63.70% compilation time
Of all of these dependencies, Octavian.jl takes the biggest time alone and comes with quite a lot of recompilation:
1652.9 ms Octavian 82.96% compilation time (68% recompilation)
Starting the same version of Julia again with the same project, I get
julia> @time_imports using Octavian
0.2 ms Requires
0.2 ms SuiteSparse
1.9 ms ArrayInterfaceCore
0.1 ms IfElse
32.3 ms Static
0.1 ms Compat
3.5 ms ArrayInterface
0.2 ms Adapt
40.7 ms OffsetArrays
0.6 ms ArrayInterfaceOffsetArrays
0.6 ms Statistics
2.2 ms StaticArraysCore
420.0 ms StaticArrays
0.3 ms ArrayInterfaceStaticArraysCore
7.2 ms ArrayInterfaceStaticArrays
0.3 ms SIMDTypes
1.1 ms ManualMemory
3.2 ms LayoutPointers
0.5 ms CpuId
94.5 ms CPUSummary 98.02% compilation time
0.2 ms BitTwiddlingConvenienceFunctions
11.3 ms HostCPUFeatures
337.0 ms VectorizationBase
2.9 ms SLEEFPirates
0.1 ms UnPack
6.5 ms ThreadingUtilities
4.4 ms PolyesterWeave
1.1 ms DocStringExtensions
3.2 ms CloseOpenIntervals
0.1 ms SnoopPrecompile
65.6 ms ChainRulesCore
4.3 ms IrrationalConstants
0.5 ms DiffRules
1.8 ms DiffResults
16.2 ms Preferences
0.2 ms OpenLibm_jll
0.2 ms NaNMath
0.7 ms ChangesOfVariables
0.6 ms InverseFunctions
0.4 ms LogExpFunctions
0.2 ms JLLWrappers
2.8 ms OpenSpecFun_jll 89.06% compilation time
14.7 ms SpecialFunctions
13.2 ms MacroTools
0.3 ms CommonSubexpressions
100.2 ms ForwardDiff
2.5 ms SIMDDualNumbers
164.1 ms LoopVectorization
571.8 ms Octavian 25.33% compilation time
This is quite a reduction and not a significant amount of recompilation. So I would expect that invalidations from loading other packages earlier when using Trixi
trigger the recompilation.
- What is the best way to figure out which package/invalidations are causing this slowdown?
- What is the best way to figure out what is causing the high compilation times of some packages?