Any way to speed up loading large precompiled packages?

For end-user applications, I usually provide a script that runs the PackageCompiler step as part of “installation”.
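As a rough idea of what such an installation script can look like, here is a minimal sketch using `create_sysimage` from PackageCompiler.jl. The file name `install_sysimage.jl`, the output name `app_sysimage`, and the `--build` flag are my own placeholders, not something PackageCompiler prescribes. It assumes PackageCompiler and the listed packages are already in the active project.

```julia
# install_sysimage.jl (hypothetical file name)
using Libdl: dlext   # "so", "dll" or "dylib", depending on the platform

# Packages to bake into the system image.
const PACKAGES = ["OrdinaryDiffEq", "Symbolics"]

# Hypothetical output location: next to this script by default.
sysimage_file(dir = @__DIR__) = joinpath(dir, "app_sysimage.$dlext")

if "--build" in ARGS
    # Only loaded for the actual build, so a dry run works without PackageCompiler.
    using PackageCompiler
    create_sysimage(PACKAGES; sysimage_path = sysimage_file())
    println("Done. Start Julia with: julia --project=. --sysimage ", sysimage_file())
else
    println("Dry run: would build ", sysimage_file(), " containing ", join(PACKAGES, ", "))
    println("Re-run with --build to create it.")
end
```

Run it once per machine with `julia --project=. install_sysimage.jl --build`. The build takes a while, but afterwards the packages are already part of the image Julia starts from. `create_sysimage` also accepts a `precompile_execution_file` keyword if you want to bake in a representative workload as well.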

I did some analysis of the loading time for these two packages for you.

julia> @time @time_imports using OrdinaryDiffEq, Symbolics
      1.5 ms  DocStringExtensions
      0.3 ms  Reexport
      0.1 ms  SuiteSparse
      0.2 ms  Requires
      2.6 ms  ArrayInterface
      3.5 ms  StaticArraysCore
      0.4 ms  ArrayInterface → ArrayInterfaceStaticArraysCoreExt
     17.9 ms  FunctionWrappers
      0.4 ms  MuladdMacro
      9.4 ms  OrderedCollections
      0.3 ms  UnPack
      0.4 ms  Parameters
      0.9 ms  Statistics
      0.1 ms  IfElse
     33.1 ms  Static
      0.3 ms  Compat
      0.2 ms  Compat → CompatLinearAlgebraExt
     16.8 ms  Preferences
      0.3 ms  SnoopPrecompile
      6.8 ms  StaticArrayInterface
      1.4 ms  ManualMemory
     49.0 ms  ThreadingUtilities
      0.3 ms  SIMDTypes
      5.1 ms  LayoutPointers
      5.1 ms  CloseOpenIntervals
    441.1 ms  StrideArraysCore
      0.3 ms  BitTwiddlingConvenienceFunctions
      0.9 ms  CpuId
    153.6 ms  CPUSummary 95.26% compilation time
      7.2 ms  PolyesterWeave
      0.8 ms  Polyester
      0.4 ms  FastBroadcast
     34.9 ms  ChainRulesCore
      0.3 ms  PrecompileTools
     46.6 ms  RecipesBase
      0.8 ms  SymbolicIndexingInterface
      0.3 ms  Adapt
      0.1 ms  DataValueInterfaces
      1.0 ms  DataAPI
      0.2 ms  IteratorInterfaceExtensions
      0.2 ms  TableTraits
     41.5 ms  Tables
      2.3 ms  GPUArraysCore
      0.2 ms  ArrayInterface → ArrayInterfaceGPUArraysCoreExt
     18.5 ms  RecursiveArrayTools
     12.2 ms  MacroTools
      0.3 ms  TruncatedStacktraces
      0.8 ms  ZygoteRules
      1.2 ms  ConstructionBase
     21.0 ms  Setfield
      7.2 ms  IrrationalConstants
      1.1 ms  DiffRules
      4.1 ms  DiffResults
      0.2 ms  OpenLibm_jll
      0.3 ms  NaNMath
      0.4 ms  LogExpFunctions
      0.5 ms  LogExpFunctions → LogExpFunctionsChainRulesCoreExt
      0.3 ms  JLLWrappers
      5.2 ms  OpenSpecFun_jll 87.29% compilation time
     14.4 ms  SpecialFunctions
      1.0 ms  SpecialFunctions → SpecialFunctionsChainRulesCoreExt
      0.3 ms  CommonSubexpressions
     82.1 ms  ForwardDiff
      0.5 ms  EnumX
      1.2 ms  PreallocationTools
      0.5 ms  FunctionWrappersWrappers
      0.2 ms  CommonSolve
      0.3 ms  ExprTools
      1.0 ms  RuntimeGeneratedFunctions
      0.3 ms  Tricks
     12.9 ms  Lazy
     18.6 ms  SciMLOperators
    190.4 ms  SciMLBase
      8.6 ms  DiffEqBase
      0.3 ms  FastClosures
      3.2 ms  ArrayInterfaceCore
     39.0 ms  HostCPUFeatures
    280.0 ms  VectorizationBase
      4.8 ms  SLEEFPirates
     61.1 ms  OffsetArrays
      1.3 ms  StaticArrayInterface → StaticArrayInterfaceOffsetArraysExt
    224.1 ms  LoopVectorization
      0.3 ms  LoopVectorization → SpecialFunctionsExt
      5.3 ms  LoopVectorization → ForwardDiffExt
      1.4 ms  TriangularSolve
    235.5 ms  RecursiveFactorization
     15.2 ms  IterativeSolvers
     34.5 ms  KLU
      3.7 ms  Sparspak
      6.7 ms  FastLapackInterface
     22.4 ms  Krylov
     52.5 ms  KrylovKit
    270.0 ms  LinearSolve
    696.5 ms  StaticArrays
      5.7 ms  StaticArrayInterface → StaticArrayInterfaceStaticArraysExt
      0.2 ms  Adapt → AdaptStaticArraysExt
      2.4 ms  ConstructionBase → ConstructionBaseStaticArraysExt
      0.7 ms  ForwardDiff → ForwardDiffStaticArraysExt
      5.8 ms  FiniteDiff
    104.8 ms  SimpleNonlinearSolve
      8.2 ms  NLSolversBase
      7.3 ms  LineSearches
      0.2 ms  SimpleUnPack
     93.8 ms  DataStructures
     38.0 ms  GenericSchur
   1544.0 ms  ExponentialUtilities
      3.1 ms  SimpleTraits
      4.4 ms  ArnoldiMethod
      1.0 ms  Inflate
     41.2 ms  Graphs
      0.6 ms  VertexSafeGraphs
      6.1 ms  SparseDiffTools
    255.3 ms  NonlinearSolve
      0.5 ms  StatsAPI
      5.8 ms  Distances
      2.9 ms  NLsolve
      0.9 ms  SciMLNLSolve
   1755.4 ms  OrdinaryDiffEq
     83.9 ms  IntervalSets
      0.8 ms  ConstructionBase → ConstructionBaseIntervalSetsExt
      2.0 ms  CompositeTypes
    224.1 ms  DomainSets
      0.5 ms  Unityper
     39.1 ms  AbstractTrees
     46.9 ms  TimerOutputs
      5.4 ms  Combinatorics
    565.9 ms  MutableArithmetics
    127.1 ms  MultivariatePolynomials
     35.1 ms  DynamicPolynomials
      2.0 ms  Bijections
     62.7 ms  LabelledArrays
    423.9 ms  SymbolicUtils
      0.4 ms  TreeViews
     41.3 ms  RandomExtensions
     12.7 ms  GroupsCore
    203.0 ms  AbstractAlgebra
      0.4 ms  IntegerMathUtils
     23.0 ms  Primes
    849.3 ms  Groebner
      0.7 ms  SortingAlgorithms
     20.8 ms  Missings
     31.8 ms  StatsBase
     48.6 ms  PDMats
    189.7 ms  Rmath_jll 99.59% compilation time (100% recompilation)
      1.1 ms  Rmath
      1.9 ms  Calculus
     33.3 ms  DualNumbers
      2.1 ms  HypergeometricFunctions
      5.5 ms  StatsFuns
      0.4 ms  StatsFuns → StatsFunsChainRulesCoreExt
      7.5 ms  QuadGK
    229.2 ms  FillArrays
    747.8 ms  Distributions
      1.2 ms  Distributions → DistributionsChainRulesCoreExt
      0.3 ms  DiffEqBase → DiffEqBaseDistributionsExt
      0.8 ms  LaTeXStrings
      1.1 ms  Formatting
     90.7 ms  Latexify
     10.1 ms  LambertW
    480.4 ms  Symbolics
 12.270529 seconds (12.59 M allocations: 775.836 MiB, 5.17% gc time, 3.68% compilation time: 42% of which was recompilation)

Here are the 10 packages that take more than 300 ms to load, sorted by load time.

   1789.4 ms  OrdinaryDiffEq
   1578.9 ms  ExponentialUtilities
    872.6 ms  Groebner
    756.7 ms  Distributions
    687.8 ms  StaticArrays
    563.7 ms  MutableArithmetics
    485.4 ms  Symbolics
    427.1 ms  SymbolicUtils
    424.3 ms  StrideArraysCore
    312.6 ms  VectorizationBase

These 10 packages account for about 7.9 seconds of the 12.3 seconds (roughly 64%) of the loading time.
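If you want to repeat this filtering on your own machine, the sketch below parses pasted `@time_imports` output, keeps the entries above a threshold, and computes their share of a total. The function names are made up for illustration, and the regular expression only assumes the line layout shown above (`<number> ms  <name>`, optionally followed by a compilation-time note).

```julia
# Parse lines such as "   1755.4 ms  OrdinaryDiffEq" into name => milliseconds pairs.
# Lines that do not match (the prompt, the final @time summary) are skipped.
function parse_import_times(text::AbstractString)
    pattern = r"^\s*([0-9.]+) ms\s+(\S.*?)(?:\s+[0-9.]+% compilation time.*)?$"
    timings = Pair{String,Float64}[]
    for line in eachline(IOBuffer(text))
        m = match(pattern, line)
        m === nothing && continue
        push!(timings, String(m[2]) => parse(Float64, m[1]))
    end
    return timings
end

# Entries slower than `threshold_ms`, slowest first.
function slowest(timings; threshold_ms = 300.0)
    slow = filter(p -> last(p) > threshold_ms, timings)
    return sort(slow; by = last, rev = true)
end

# Fraction of `total_ms` spent in the given entries.
share_of_total(timings, total_ms) = sum(last, timings; init = 0.0) / total_ms
```

Usage would be along the lines of `slow = slowest(parse_import_times(read("imports.txt", String)))`, where `imports.txt` is a hypothetical file holding the pasted output, followed by `share_of_total(slow, total_ms)` with the total taken from the `@time` line.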

You might want to take a look under your ~/.julia/compiled/v1.9 directory to see how large the shared libraries (.so, .dll, .dylib) are for the respective packages. Scanning my ~/.julia/compiled/v1.9 directory for those ten packages, I get the following.

$ du -hcs OrdinaryDiffEq ExponentialUtilities Groebner Distributions StaticArrays MutableArithmetics Symbolics SymbolicUtils StrideArraysCore VectorizationBase
151M	OrdinaryDiffEq
16M	ExponentialUtilities
11M	Groebner
16M	Distributions
75M	StaticArrays
5.7M	MutableArithmetics
14M	Symbolics
15M	SymbolicUtils
600K	StrideArraysCore
19M	VectorizationBase
320M	total
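On systems without `du` (or if you just prefer to stay in the REPL), something like the following sketch gives comparable numbers. The helper names are my own, and it assumes the cache lives in the first entry of `DEPOT_PATH`. Note that it sums apparent file sizes, so the results can differ a little from what `du` reports as disk usage.

```julia
# Sum the sizes (in bytes) of all regular files below `dir`.
# With a non-empty `suffix`, only files whose names end in it are counted.
function dir_size(dir::AbstractString; suffix::AbstractString = "")
    total = 0
    for (root, _, files) in walkdir(dir)
        for file in files
            path = joinpath(root, file)
            if endswith(file, suffix) && !islink(path)
                total += filesize(path)
            end
        end
    end
    return total
end

# Assumed cache location, e.g. ~/.julia/compiled/v1.9 on Julia 1.9.
compiled_dir(depot = first(DEPOT_PATH)) =
    joinpath(depot, "compiled", "v$(VERSION.major).$(VERSION.minor)")

# Map each package that has a cache directory to its total size in bytes.
function cache_sizes(packages; dir = compiled_dir(), suffix = "")
    sizes = Dict{String,Int}()
    for pkg in packages
        pkgdir = joinpath(dir, pkg)
        isdir(pkgdir) && (sizes[pkg] = dir_size(pkgdir; suffix = suffix))
    end
    return sizes
end
```

For example, `cache_sizes(["OrdinaryDiffEq", "StaticArrays"])` returns bytes per package. Passing `suffix = ".so"` (or `".dll"` / `".dylib"`) restricts the count to the native shared libraries.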

As for your other idea of modifying the precompilation of your dependencies, you could also fork those packages, modify their top-level precompilation statements, and then provide your collaborators a Manifest.toml pointing at your forks.
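Pointing a project at a fork can be done through Pkg itself rather than by editing the manifest by hand. The sketch below is only an illustration: the URL and branch name are placeholders for a fork you would have to create yourself.

```julia
using Pkg

# Hypothetical fork and branch: replace both with your own.
const FORKS = [
    Pkg.PackageSpec(url = "https://github.com/your-org/ExponentialUtilities.jl",
                    rev = "less-precompile"),
]

# Pkg.add records the repository URL and revision in Manifest.toml.
# Run this in the shared project (this needs network access).
use_forks(specs = FORKS) = Pkg.add(specs)
```

After calling `use_forks()` in the project environment, commit the resulting Manifest.toml. Collaborators who run `Pkg.instantiate()` against it should then get the forked versions.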
