What is the recommended way to debug `using` latency issues?

I wonder if anybody has good recommendations for debugging the latency of using statements. The new @time_imports already helps quite a lot but I haven’t found a workflow I really like. Here is an example of what I mean:

With Julia v1.9.0-beta2 and the current release of Trixi.jl, I get (when starting Julia in the project folder of Trixi.jl)

julia> @time_imports using Trixi
      0.2 ms  Reexport
      1.0 ms  ConstructionBase
      0.1 ms  SnoopPrecompile
     19.9 ms  RecipesBase
      1.1 ms  DocStringExtensions
      2.2 ms  StaticArraysCore
      0.6 ms  Statistics
      0.1 ms  SuiteSparse
      1.7 ms  ArrayInterfaceCore
      0.3 ms  SymbolicIndexingInterface
      0.1 ms  Compat
     58.1 ms  ChainRulesCore
      9.4 ms  MacroTools
      1.1 ms  ZygoteRules
      0.3 ms  Adapt
      0.3 ms  ArrayInterfaceStaticArraysCore
    140.3 ms  FillArrays
      0.1 ms  DataValueInterfaces
      0.6 ms  DataAPI
      0.1 ms  IteratorInterfaceExtensions
      0.1 ms  TableTraits
      5.4 ms  OrderedCollections
     10.8 ms  Tables
      1.5 ms  GPUArraysCore
     22.7 ms  RecursiveArrayTools
     18.2 ms  Preferences
      0.2 ms  CommonSolve
      9.2 ms  FunctionWrappers
      0.4 ms  FunctionWrappersWrappers
      0.2 ms  ExprTools
      0.5 ms  RuntimeGeneratedFunctions
      0.2 ms  EnumX
    152.1 ms  SciMLBase
      9.5 ms  CodeTracking
      0.1 ms  IfElse
     35.5 ms  Static
      4.1 ms  ArrayInterface
      0.2 ms  EllipsisNotation
      4.1 ms  IrrationalConstants
      0.4 ms  DiffRules
      1.7 ms  DiffResults
    434.2 ms  StaticArrays
      0.2 ms  OpenLibm_jll
      0.2 ms  NaNMath
      0.7 ms  ChangesOfVariables
      0.7 ms  InverseFunctions
      0.4 ms  LogExpFunctions
      0.2 ms  JLLWrappers
      3.6 ms  OpenSpecFun_jll 92.31% compilation time
     14.2 ms  SpecialFunctions
      0.3 ms  CommonSubexpressions
     75.9 ms  ForwardDiff
      0.3 ms  Requires
      0.2 ms  Zlib_jll
      0.5 ms  OpenSSL_jll
      0.6 ms  HDF5_jll
     50.4 ms  HDF5
    110.2 ms  LinearMaps
     37.9 ms  OffsetArrays
      0.7 ms  ArrayInterfaceOffsetArrays
      0.8 ms  ArrayInterfaceStaticArrays
      0.2 ms  SIMDTypes
      1.2 ms  ManualMemory
      3.4 ms  LayoutPointers
      0.6 ms  CpuId
     97.5 ms  CPUSummary 95.06% compilation time
      0.2 ms  BitTwiddlingConvenienceFunctions
     10.5 ms  HostCPUFeatures
    372.0 ms  VectorizationBase
      3.2 ms  SLEEFPirates
      0.1 ms  UnPack
      7.3 ms  ThreadingUtilities
      4.0 ms  PolyesterWeave
      2.9 ms  CloseOpenIntervals
      2.5 ms  SIMDDualNumbers
    176.8 ms  LoopVectorization
      0.2 ms  MPIPreferences
      0.9 ms  MPICH_jll
    189.6 ms  MPI 81.86% compilation time
      0.2 ms  MuladdMacro
     30.8 ms  StructArrays
     23.8 ms  IterTools
      1.7 ms  Extents
      0.6 ms  GeoInterface
      0.3 ms  EarCut_jll
    388.9 ms  GeometryBasics
   1652.9 ms  Octavian 82.96% compilation time (68% recompilation)
    166.1 ms  StrideArraysCore
      1.3 ms  Polyester
      4.0 ms  CEnum
      0.6 ms  P4est_jll
     34.3 ms  P4est
     18.4 ms  Setfield
    131.8 ms  VectorizedRNG
   1166.4 ms  VectorizedStatistics
     65.8 ms  StrideArrays
     45.5 ms  TimerOutputs
      0.3 ms  Triangle_jll
      0.4 ms  Triangulate
      1.6 ms  TriplotBase
     54.2 ms  FixedPointNumbers
     98.2 ms  ColorTypes 17.30% compilation time (86% recompilation)
    197.7 ms  Colors
      0.3 ms  TensorCore
    168.5 ms  ColorVectorSpace 3.58% compilation time (91% recompilation)
      6.9 ms  ColorSchemes
    290.0 ms  PlotUtils
      1.6 ms  TriplotRecipes
      0.2 ms  AutoHashEquals
      0.3 ms  ArgCheck
      8.9 ms  AbstractFFTs
      0.5 ms  FFTW_jll
    229.6 ms  FFTW
      9.4 ms  Unrolled
      0.4 ms  FastGaussQuadrature
      2.9 ms  PolynomialBases
    841.9 ms  SummationByPartsOperators 106.52% compilation time
    139.1 ms  ComponentArrays 60.30% compilation time
     86.4 ms  DataStructures
      0.3 ms  SortingAlgorithms
     10.4 ms  Missings
      0.2 ms  StatsAPI
     30.7 ms  StatsBase
      0.9 ms  CovarianceEstimation
     80.4 ms  NamedDims
     86.2 ms  Kronecker
      0.8 ms  DelimitedFiles
      1.3 ms  NodesAndModes
      3.2 ms  PathIntersections
     20.9 ms  StartUpDG
   1173.3 ms  Trixi 63.70% compilation time

Of all of these dependencies, Octavian.jl takes the biggest time alone and comes with quite a lot of recompilation:

1652.9 ms  Octavian 82.96% compilation time (68% recompilation)

Starting the same version of Julia again with the same project, I get

julia> @time_imports using Octavian
      0.2 ms  Requires
      0.2 ms  SuiteSparse
      1.9 ms  ArrayInterfaceCore
      0.1 ms  IfElse
     32.3 ms  Static
      0.1 ms  Compat
      3.5 ms  ArrayInterface
      0.2 ms  Adapt
     40.7 ms  OffsetArrays
      0.6 ms  ArrayInterfaceOffsetArrays
      0.6 ms  Statistics
      2.2 ms  StaticArraysCore
    420.0 ms  StaticArrays
      0.3 ms  ArrayInterfaceStaticArraysCore
      7.2 ms  ArrayInterfaceStaticArrays
      0.3 ms  SIMDTypes
      1.1 ms  ManualMemory
      3.2 ms  LayoutPointers
      0.5 ms  CpuId
     94.5 ms  CPUSummary 98.02% compilation time
      0.2 ms  BitTwiddlingConvenienceFunctions
     11.3 ms  HostCPUFeatures
    337.0 ms  VectorizationBase
      2.9 ms  SLEEFPirates
      0.1 ms  UnPack
      6.5 ms  ThreadingUtilities
      4.4 ms  PolyesterWeave
      1.1 ms  DocStringExtensions
      3.2 ms  CloseOpenIntervals
      0.1 ms  SnoopPrecompile
     65.6 ms  ChainRulesCore
      4.3 ms  IrrationalConstants
      0.5 ms  DiffRules
      1.8 ms  DiffResults
     16.2 ms  Preferences
      0.2 ms  OpenLibm_jll
      0.2 ms  NaNMath
      0.7 ms  ChangesOfVariables
      0.6 ms  InverseFunctions
      0.4 ms  LogExpFunctions
      0.2 ms  JLLWrappers
      2.8 ms  OpenSpecFun_jll 89.06% compilation time
     14.7 ms  SpecialFunctions
     13.2 ms  MacroTools
      0.3 ms  CommonSubexpressions
    100.2 ms  ForwardDiff
      2.5 ms  SIMDDualNumbers
    164.1 ms  LoopVectorization
    571.8 ms  Octavian 25.33% compilation time

This is quite a reduction and not a significant amount of recompilation. So I would expect that invalidations from loading other packages earlier when using Trixi trigger the recompilation.

  1. What is the best way to figure out which package/invalidations are causing this slowdown?
  2. What is the best way to figure out what is causing the high compilation times of some packages?
2 Likes

Maybe to trigger further answers (since the best way to get good answers in the internet is to post “wrong” answers :wink:): My current workflow is to check each package by hand by restarting Julia and running @time_imports (using SomePackage; using Octavian), where I change the first package and keep the second constant since that’s the one affected by invalidations. Doing this by hand, I found that the culprit is GeometryBasics.jl:

julia> @time_imports (using GeometryBasics; using Octavian)
      0.6 ms  Statistics
      2.4 ms  StaticArraysCore
    444.5 ms  StaticArrays
      0.1 ms  DataValueInterfaces
      0.7 ms  DataAPI
      0.1 ms  IteratorInterfaceExtensions
      0.1 ms  TableTraits
      6.1 ms  OrderedCollections
     11.6 ms  Tables
      0.2 ms  Adapt
      1.7 ms  GPUArraysCore
     22.9 ms  StructArrays
     25.7 ms  IterTools
      2.1 ms  Extents
      0.8 ms  GeoInterface
     28.4 ms  Preferences
      0.3 ms  JLLWrappers
      3.2 ms  EarCut_jll 90.00% compilation time
    299.3 ms  GeometryBasics
      0.3 ms  Requires
      0.2 ms  SuiteSparse
      2.4 ms  ArrayInterfaceCore
      0.1 ms  IfElse
     39.6 ms  Static
      0.2 ms  Compat
     14.6 ms  ArrayInterface
     40.0 ms  OffsetArrays
      0.6 ms  ArrayInterfaceOffsetArrays
      0.3 ms  ArrayInterfaceStaticArraysCore
      0.9 ms  ArrayInterfaceStaticArrays
      0.3 ms  SIMDTypes
      1.4 ms  ManualMemory
      3.3 ms  LayoutPointers
      0.6 ms  CpuId
     97.3 ms  CPUSummary 83.91% compilation time
      0.2 ms  BitTwiddlingConvenienceFunctions
     10.2 ms  HostCPUFeatures
    368.1 ms  VectorizationBase
      2.9 ms  SLEEFPirates
      0.1 ms  UnPack
      6.8 ms  ThreadingUtilities
      4.2 ms  PolyesterWeave
      0.9 ms  DocStringExtensions
      3.1 ms  CloseOpenIntervals
      0.1 ms  SnoopPrecompile
     71.3 ms  ChainRulesCore
      4.3 ms  IrrationalConstants
      0.5 ms  DiffRules
      2.0 ms  DiffResults
      0.1 ms  OpenLibm_jll
      0.2 ms  NaNMath
      0.6 ms  ChangesOfVariables
      0.7 ms  InverseFunctions
      0.5 ms  LogExpFunctions
      0.3 ms  OpenSpecFun_jll
     26.6 ms  SpecialFunctions
     14.3 ms  MacroTools
      0.2 ms  CommonSubexpressions
    100.1 ms  ForwardDiff
      2.8 ms  SIMDDualNumbers
    208.7 ms  LoopVectorization
   1378.0 ms  Octavian 67.67% compilation time (77% recompilation)

It’s dependency StaticArrays.jl was causing quite a lot of invalidations related but these were fixed a few days ago in use invokelatest to prevent invalidations in TOML by KristofferC · Pull Request #48083 · JuliaLang/julia · GitHub, which is already contained in the backports branch Backports for julia 1.9.0-beta3 by KristofferC · Pull Request #48075 · JuliaLang/julia · GitHub that I used for this post.

Testing all dependencies of GeometryBasics.jl alone didn’t show the huge amount of recompilation for Octavian.jl, so I assume invalidations from GeometryBasics.jl itself are the reason (coming from the implementation of OffsetIntegers). A first investigation wasn’t successful since ascend lead me only to abstractly inferred code that didn’t tell me at first sight why it is causing these problems.

One probably obvious thing worth pointing out is that the best way to speed up code loading is to load less code. 1.9 has added the ability for packages to have conditional dependencies which can dramatically reduce the number of dependencies loaded. Before working to reduce the time GeometryBasics.jl takes, it is worth asking if you actually need that code at all or if it can be turned into a conditional dependency and not loaded.

1 Like

Turning it into a conditional dependency is definitely we will consider when Julia 1.9 is out (following 5. Creating Packages · Pkg.jl). For now, we’re using Requires.jl (and I made a PR to move all code using GeometryBasics.jl to a requires block).
Nevertheless, I am still interested whether there are better approaches than the “manual” approach I have described above.