Much worse import time for `Statistics` and `SparseArrays` on Julia 1.9.0 nightly?

I am confused by this. Why would Statistics (or its dependency SparseArrays), a package in the standard library, take so long to import? Is there anything I can do about this in my package, which depends on Statistics? I have been trying to lower its TTFX, and this import has become the bottleneck. Other standard libraries do not seem to have this issue.

Version 1.9.0-DEV.874 (2022-06-30)
julia> @time_imports using Statistics
   2242.3 ms  ┌ SparseArrays 5.37% compilation time
   2265.5 ms  Statistics 6.25% compilation time

Version 1.7.0 (2021-11-30)
julia> @time @eval using Statistics
  0.002965 seconds (1.20 k allocations: 85.719 KiB, 92.20% compilation time)

Version 1.8.0-beta3 (2022-03-29)
julia> @time @eval using Statistics
  0.002675 seconds (1.16 k allocations: 81.145 KiB, 86.59% compilation time)

For more discussion, see JuliaLang/julia Pull Request #44247, "Move out SparseArrays and SuiteSparse from the sysimage" by KristofferC.

This is due to the removal of SparseArrays, Statistics and a few more stdlibs from the sysimage for 1.9. The increase in import time is expected, but understandably frustrating.
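If you want to compare versions on equal footing, here is a minimal sketch that times `using Statistics` in a fresh Julia process, so that anything already loaded in the current session does not skew the number. It only uses `Base.julia_cmd()` and `@elapsed`; the variable names are just for illustration, and the absolute numbers will of course depend on your machine and build.

```julia
# Time `using Statistics` in a fresh Julia process.
# A fresh process avoids measuring a package that the current session
# has already loaded.
julia_cmd = Base.julia_cmd()   # the julia binary (and flags) running this session

# Code executed by the child process: print the elapsed time in seconds.
code = "print(@elapsed @eval using Statistics)"

# --startup-file=no keeps a personal startup.jl from affecting the result.
seconds = parse(Float64, read(`$julia_cmd --startup-file=no -e $code`, String))

println("using Statistics took ", round(seconds * 1000; digits = 1), " ms")
```

On a version where Statistics is still in the sysimage this should report a few milliseconds at most; on the 1.9 nightly above it should be in the same range as the `@time_imports` output.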


Sorry for reviving this thread.

I’ve read the linked PR and the links inside it, but I still wonder: what is driving the move to take packages out of the stdlib? Is it just to make the Julia install a few MB smaller?

One of the most cited reasons is that it allows for faster updates: as long as these packages are stdlibs, they can only ship an update when Julia itself publishes a release. More points are explained here, where Viral B. Shah said:

This is being planned not just DelimitedFiles, but also Statistics and the sparse ecosystem. In my opinion, there are several benefits:

  1. Allow broader participation in these packages from contributors
  2. Faster bugfixing and upgrades without having to wait for whole Julia release cycles
  3. Make them less special and allow for alternatives to evolve, or become more flexible (e.g. we want to support many new sparse data types and solvers in a first class way)
  4. Consolidate capabilities in certain ecosystems (e.g. basic stats is spread out across too many packages like Statistics.jl, StatsBase.jl etc., and it is complex for new users to navigate - not to mention it is difficult to maintain)