Julia v1.9.0-beta2 is fast

One doubt/curiosity: if one loads a package in a fresh Julia session and then later opens Julia again in the same environment, should loading now be very fast? That is, is there any reason for recompilation to occur between two subsequent loadings of a package if nothing has changed in the meantime?

Invalidations happen when you load the package, not when you run the workload. You’re snooping the wrong step.

1 Like

If the package needs precompilation, then yes, using MyPkg will be very slow the first time. But loading is still slow (though less so than in the first case) even once it's been precompiled. This second form of slowness has nothing to do with I/O (that's fast); instead, almost all of it is about checking whether the loaded code is still valid.
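
A quick way to see where that time goes (MyPkg is just a placeholder name for whatever package you're loading):

# In a fresh session, time the load itself; the first load after installing or
# updating also pays for precompilation, while later loads reuse the cache but
# still run the validity checks described above:
@time using MyPkg

# On Julia 1.8 and later, the REPL also offers a per-dependency breakdown
# (again, in a fresh session):
@time_imports using MyPkg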

For those who curse invalidations, keep in mind they are a feature, not a bug, even though they cause headaches. Julia has three remarkable properties:

  • it’s aggressively compiled & automatically specialized
  • it’s interactive
  • we attempt to guarantee that the code that runs is the same no matter how you got there: no matter whether you loaded everything in your “universe of code” before triggering compilation, or whether you built up your system incrementally at the REPL with usage, you get the same outcome.

The last property is critical and is what makes the REPL actually work. Otherwise you’d get different outcomes when you used the REPL vs something “hardbaked” in a package. But it’s also what requires us to check for invalidations, and what makes package loading slow.
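
A minimal sketch of why those checks are needed (f and g here are just illustrative REPL definitions):

f(x) = 1              # generic fallback
g(x) = f(x) + 1
g(2)                  # compiles g(::Int) against the fallback; returns 2

f(x::Int) = 10        # this more specific method invalidates the compiled g(::Int),
                      # otherwise it would keep calling the stale fallback
g(2)                  # recompiles and returns 11, the same answer you'd get if the
                      # Int method had existed before g was ever compiled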

12 Likes

Thanks. After this session is over (I'll be traveling after lunch) I will open an issue trying to explain everything I've tried and the results. You should now be able to automatically install GMT (unless you are also bitten by the libnetcdf/libcurl fight).

1 Like

I just share this script with people; IMO it should be in the docs in some form.

using SnoopCompile
invalidations = @snoopr begin
    # Your code goes here!
    using OrdinaryDiffEq
    function lorenz(du, u, p, t)
        du[1] = 10.0(u[2] - u[1])
        du[2] = u[1] * (28.0 - u[3]) - u[2]
        du[3] = u[1] * u[2] - (8 / 3) * u[3]
    end
    u0 = [1.0; 0.0; 0.0]
    tspan = (0.0, 100.0)
    prob = ODEProblem{true,false}(lorenz, u0, tspan)
    alg = Rodas5()
    sol = solve(prob, alg)
end;
trees = SnoopCompile.invalidation_trees(invalidations);
@show length(SnoopCompile.uinvalidated(invalidations)) # show total invalidations
show(trees[end]) # show the method that caused the most invalidations
# Count the number of children (invalidations caused by each invalidating method)
n_invalidations = map(trees) do methinvs
    SnoopCompile.countchildren(methinvs)
end
import Plots
Plots.plot(
    1:length(trees),
    n_invalidations;
    markershape=:circle,
    xlabel="i-th method invalidation",
    label="Number of children per method invalidations"
)

Just replace the OrdinaryDiffEq code with the code you want analyzed, and this will spit out an informative plot of the invalidation distribution plus the biggest invalidating method.

14 Likes

3 posts were split to a new topic: Memory usage in Julia v1.9.0-beta2

The script errors when there are no invalidations

julia> include("using_SC.jl")
length(SnoopCompile.uinvalidated(invalidations)) = 0
ERROR: LoadError: BoundsError: attempt to access 0-element Vector{SnoopCompile.MethodInvalidations} at index [0]
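
For reference, a simple guard around the reporting steps avoids the crash (same variable names as the script above):

if isempty(trees)
    println("No invalidations were recorded.")
else
    show(trees[end])  # method that caused the most invalidations
end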

Use Julia v1.9.0-beta2 is fast - #17 by tim.holy instead. It shouldn’t error and it will also make your TODO list much, much smaller :stuck_out_tongue: .

2 Likes

Getting an error when trying this.

Using an environment with julia-1.9.0-beta2 and only the three relevant packages installed.

julia> using SnoopCompileCore

julia> invs = @snoopr using ControlSystems;

julia> using SnoopCompile

julia> trees = invalidation_trees(invs)
ERROR: KeyError: key MethodInstance for LoopVectorization.avx_config_val(::Val{(false, 0, 0, 0, false, 0x0000000000000001, 1, true)}, ::Static.StaticInt{4}) not found
Stacktrace:
 [1] getindex(h::Dict{Union{Int32, Core.MethodInstance}, Union{Tuple{Any, Vector{Any}}, SnoopCompile.InstanceNode}}, key::Core.MethodInstance)
   @ Base ./dict.jl:484
 [2] invalidation_trees(list::Vector{Any}; exclude_corecompiler::Bool)
   @ SnoopCompile ~/.julia/packages/SnoopCompile/Q8zUg/src/invalidations.jl:403
 [3] invalidation_trees(list::Vector{Any})
   @ SnoopCompile ~/.julia/packages/SnoopCompile/Q8zUg/src/invalidations.jl:294
 [4] top-level scope
   @ REPL[4]:1

(testinv) pkg> st
Status `/tmp/testinv/Project.toml`
  [a6e380b2] ControlSystems v1.5.3
  [aa65fe97] SnoopCompile v2.9.5
  [e2b509da] SnoopCompileCore v2.9.0

Sorry about that. I don't yet understand the error, but in the meantime there's a real fix available if you use SnoopCompile 2.9.6 or higher (the earlier hack-workaround, pkg> add SnoopCompile#teh/circumvent_err, is no longer needed). That should at least allow you to keep moving forward, with the understanding that there are no guarantees that the output is comprehensive (and if you see stuff that makes no sense, it would be worth suspecting a bug).

I’ll try to develop a clearer understanding in the next few days, but I wanted you to have something to play with. :heavy_check_mark:

11 Likes

The error has been fixed (correctly). You can get the fix with SnoopCompile 2.9.6 or higher. In case anybody is using the teh/circumvent_err branch, I’ll try to remember to leave it lying around for a week or so, but it’s better to upgrade to the latest release.

8 Likes

Thought I’d leave this here as well: I wanted to get a feel for how the user experience of the “original” TTFX - i.e. time-to-first-plot using Plots.jl - has evolved over the years, so I wrote a little benchmark script that times starting Julia and running using Plots; display(scatter(rand(1_000))) for all Julia versions since 0.7.

Here’s what the mean and minimum times look like across versions:

[plot: mean and minimum time-to-first-plot across Julia versions]

Initially I had included 0.6.4 as well, but with such an old Julia version one also has to fall back on a very old Plots version, which errors out on the display call. An alternative script with savefig instead of display suggests that 0.6 had a similar TTFP to 0.7-1.2.

I think the plot makes the scale of the achievement clear - while I'm sure there's more to come as the ecosystem starts to make the best use of Julia's new abilities, this seems to be the first time we've seen a dramatic qualitative shift in the very basic experience that the typical user gets without resorting to tricks like PackageCompiler.

Some more boring details for those interested below:

Details

The benchmarks were run on an i7 Windows laptop, versioninfo() is:

Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores

As Plots.jl has dropped compatibility with older Julia versions over time, going back in Julia versions also means going back to older Plots versions, so the benchmarks don't just capture differences in precompilation and other language features but also changes in Plots itself.

The following table shows the Plots version used for every Julia version, as well as the average, minimum, and standard deviation of the 20 samples taken:

 Row │ version  Plots_version  average  minimum  std_dev
     │ String   String         Float64  Float64  Float64
─────┼───────────────────────────────────────────────────
   1 │ 0.6.4    0.17.4            11.5     10.7      0.9
   2 │ 0.7.0    0.19.3            21.5     19.8      1.3
   3 │ 1.0.5    1.4.3             18.5     18.1      0.5
   4 │ 1.1.1    1.4.3             21.3     18.3      4.4
   5 │ 1.2.0    1.4.4             22.8     21.4      1.5
   6 │ 1.3.1    1.6.12            27.3     26.2      0.7
   7 │ 1.4.2    1.6.12            35.0     30.0      5.4
   8 │ 1.5.4    1.24.3            35.7     31.9      3.3
   9 │ 1.6.5    1.27.6            24.6     21.7      2.4
  10 │ 1.7.3    1.37.2            24.3     22.6      2.0
  11 │ 1.8.3    1.38.0            20.6     17.7      2.8
  12 │ 1.9.0b   1.38.0-dev         7.6      4.1      1.1

Note that 1.9.0b uses a branch of Plots with a fix from Kristoffer that addresses an issue with the usage of cached code on Windows.

The whole benchmark code run is:

using DataFrames, ProgressMeter

path = "path/to/Julias"

julias = Dict(
    "0.6.4" => "Julia-0.6.4/bin/julia.exe",
    "0.7.0" => "Julia-0.7.0/bin/julia.exe",
    "1.0.5" => "Julia-1.0.5/bin/julia.exe",
    "1.1.1" => "Julia-1.1.1/bin/julia.exe",
    "1.2.0" => "Programs/Julia-1.2.0/bin/julia.exe",
    "1.3.1" => "Julia-1.3.1/bin/julia.exe",
    "1.4.2" => "Programs/Julia/Julia-1.4.2/bin/julia.exe",
    "1.5.4" => "Programs/Julia 1.5.4/bin/julia.exe",    
    "1.6.5" => "Programs/Julia-1.6.5/bin/julia.exe",
    "1.7.3" => "Programs/Julia-1.7.3/bin/julia.exe",
    "1.8.3" => "Programs/Julia-1.8.3/bin/julia.exe",
    "1.9.0b" => "Programs/Julia-1.9.0-beta2/bin/julia.exe",
)

jdf = sort(DataFrame(version = collect(keys(julias)), location = collect(values(julias))), :version)

# Check versions
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    if r.version == "0.6.4"
        run(`$(normpath(path, r.location)) -e "println(Pkg.installed()[\"Plots\"])"`);    
    else
        run(`$(normpath(path, r.location)) -e "import Pkg; println(Pkg.installed()[\"Plots\"])"`);
    end
    println()
end

n = 20

jdf.times = [zeros(n) for _ ∈ 1:nrow(jdf)]

# Run code
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    pn = normpath(path, r.location)
    @showprogress for i ∈ 1:n
        if r.version == "0.6.4"
            r.times[i] = @elapsed run(`$pn -e "using Plots; scatter(rand(1_000))"`);    
        else
            r.times[i] = @elapsed run(`$pn -e "using Plots; display(scatter(rand(1_000)))"`);    
        end
    end
    println()
end

using Plots, Statistics

jdf.average = mean.(jdf.times)
jdf.standard_dev = std.(jdf.times)

bar(jdf.version[2:end], jdf.average[2:end], label = "",
    linewidth = 0.1, ylabel = "Time (seconds)", xlabel = "Julia version",
    yticks = 0:5:50, 
    title = "Mean/minimum time for 20 samples\nusing Plots; display(scatter(rand(1_000)))")
scatter!((1:nrow(jdf)-1) .- 0.5, minimum.(jdf.times)[2:end], label = "Minimum", markerstrokewidth = 0.1)

jdf.Plots_version = ["0.17.4", "0.19.3", "1.4.3", "1.4.3", "1.4.4", "1.6.12", 
    "1.6.12", "1.24.3", "1.27.6", "1.37.2", "1.38.0", "1.38.0-dev"]

select(jdf, :version, :average => (x -> round.(x, digits = 1)) => :average,
    :times => (x -> round.(minimum.(x), digits = 1)) => :minimum, 
    :times => (x -> round.(std.(x), digits = 1)) => :std_dev)

84 Likes

That’s really cool! If you have the time and willingness, would you be able to check DifferentialEquations?

Here’s a one-liner that I believe should work even on ancient versions:

using DifferentialEquations;  f(u,p,t) = 1.01*u; solve(ODEProblem(f,1/2,(0.0,1.0)), Tsit5(), abstol=1e-8)

4 Likes

Just wanted to say thanks to everybody who worked on this. On DFTK, with a simple SnoopPrecompile.@precompile_all_calls wrapped around a standard workflow in the package, TTFX went from 40s to 8s on my machine and from 80s to 2s on somebody else's (not sure why the discrepancy…), which is pretty amazing. Precompilation costs did go through the roof, which is unfortunate, but we added an environment variable so that precompilation can be disabled for developers.
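
For anyone curious what that pattern looks like, here is a minimal sketch (the package, workload, and environment-variable name are hypothetical, not DFTK's actual code):

module MyPkg

using SnoopPrecompile

greet(name) = "Hello, $name"

# Let developers opt out, since the workload makes every precompilation of the
# package itself more expensive.
if get(ENV, "MYPKG_DISABLE_PRECOMPILE", "0") != "1"
    @precompile_all_calls begin
        # a representative workload; everything it compiles gets cached
        greet("world")
    end
end

end # module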

18 Likes

Well that was quite the Odyssey… Best to do these kinds of experiments with packages that take less than 10 minutes to install and precompile (I think on 0.6.4 it took 20 minutes just to download all the dependencies). I had also forgotten how annoying precompilation was before parallel precompilation and progress bars - just staring at a terminal with nothing happening for minutes on end isn’t fun.

And after all that the results are… not exactly overwhelming :person_shrugging:

[plot: mean and minimum time-to-first-DiffEq-solve across Julia versions]

Note that I couldn’t get the package to precompile on 1.4 and 1.5, so I skipped those Julia versions.

Here we are covering four major DiffEq releases (from 4.5 to 7.6), so I assume we are comparing across some pretty major changes to the package itself; during installation I also noticed that later versions pull in a lot of heavy dependencies like LoopVectorization. On 1.9-beta2 it looks like virtually all of the time is spent in using DifferentialEquations, so maybe there's not a lot left to gain on the actual first solve - but I don't really know anything about DifferentialEquations, so I'm not the right person to make sense of these results!

Here are the package versions used and summary statistics of the 20 samples for each Julia version:

 Row │ version  DiffEq_version  average  minimum  std_dev 
     │ String   String          Float64  Float64  Float64 
─────┼────────────────────────────────────────────────────
   1 │ 0.6.4    4.5.0               7.1      6.8      0.4
   2 │ 0.7.0    5.2.1              14.3     11.1      1.8
   3 │ 1.0.5    6.10.1             25.9     20.5      2.4
   4 │ 1.1.1    6.10.1             31.2     29.1      2.0
   5 │ 1.2.0    6.11.0             32.1     24.5      4.7
   6 │ 1.3.1    6.18.0             32.3     26.7      4.7
   7 │ 1.6.5    7.6.0              12.8     11.4      1.4
   8 │ 1.7.3    7.6.0              10.4      9.5      0.7
   9 │ 1.8.3    7.6.0              17.7     17.3      0.4
  10 │ 1.9.0b   7.6.0              10.7      9.9      0.8

Full code

using DataFrames, ProgressMeter

path = "path/to/julias"

julias = Dict(
    "0.6.4" => "Julia-0.6.4/bin/julia.exe",
    "0.7.0" => "Julia-0.7.0/bin/julia.exe",
    "1.0.5" => "Julia-1.0.5/bin/julia.exe",
    "1.1.1" => "Julia-1.1.1/bin/julia.exe",
    "1.2.0" => "Programs/Julia-1.2.0/bin/julia.exe",
    "1.3.1" => "Julia-1.3.1/bin/julia.exe",
    #"1.4.2" => "Programs/Julia/Julia-1.4.2/bin/julia.exe",
    #"1.5.4" => "Programs/Julia 1.5.4/bin/julia.exe",    
    "1.6.5" => "Programs/Julia-1.6.5/bin/julia.exe",
    "1.7.3" => "Programs/Julia-1.7.3/bin/julia.exe",
    "1.8.3" => "Programs/Julia-1.8.3/bin/julia.exe",
    "1.9.0b" => "Programs/Julia-1.9.0-beta2/bin/julia.exe",
)

jdf = sort(DataFrame(version = collect(keys(julias)), location = collect(values(julias))), :version)

# Check versions
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    if r.version == "0.6.4"
        run(`$(normpath(path, r.location)) -e "println(Pkg.installed()[\"DifferentialEquations\"])"`);    
    else
        run(`$(normpath(path, r.location)) -e "import Pkg; println(Pkg.installed()[\"DifferentialEquations\"])"`);
    end
    println()
end

n = 20

jdf.times = [zeros(n) for _ ∈ 1:nrow(jdf)]

# Run code
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    pn = normpath(path, r.location)
    @showprogress for i ∈ 1:n
        r.times[i] = @elapsed run(`$pn -e "using DifferentialEquations;  f(u,p,t) = 1.01*u; solve(ODEProblem(f,1/2,(0.0,1.0)), Tsit5(), abstol=1e-8)"`);    
    end
    println()
end

using Plots, Statistics

jdf.average = mean.(jdf.times)
jdf.standard_dev = std.(jdf.times)

bar(jdf.version, jdf.average, label = "",
    linewidth = 0.1, ylabel = "Time (seconds)", xlabel = "Julia version",
    yticks = 0:5:50, 
    title = "Mean/minimum time for 20 samples\nTime-to-first-DiffEq solve")
scatter!((1:nrow(jdf)) .- 0.5, minimum.(jdf.times), label = "Minimum", markerstrokewidth = 0.1)

jdf.DiffEq_version = ["4.5.0", "5.2.1", "6.10.1", "6.10.1", "6.11.0", "6.18.0", 
    "7.6.0", "7.6.0", "7.6.0", "7.6.0"]

print(select(jdf, :version, :DiffEq_version, :average => (x -> round.(x, digits = 1)) => :average,
    :times => (x -> round.(minimum.(x), digits = 1)) => :minimum, 
    :times => (x -> round.(std.(x), digits = 1)) => :std_dev))

21 Likes

We should also keep in mind that the amazing new pkgimage caching of native code only solves "time to first X once things are already imported and invalidations have been fixed by package developers". Developers are separately solving the invalidation issues (which had started to pay off even before pkgimages). However, there is still a lot of work that can be done on the "import time" side of things. The plots above show "import time + time to first X after import", which can be confusing (a quick way to see the split for yourself is sketched after the list) because:

  • pkgimages (and fixing invalidations) turn “time to first X after import” from many seconds to milliseconds
  • pkgimages do not change “import-only time” much
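
A quick way to see that split on your own machine (Plots is just an example; the numbers will vary):

@time using Plots                   # "import time": load plus validity checks, not much changed by pkgimages
@time display(plot(rand(100)))      # "time to first X after import": the part pkgimages shrink dramatically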

Nonetheless, this is an amazing step forward, with many other exciting ideas hinted at in offhand comments on GitHub.

2 Likes

This is a very cool graph. I think this subject should have its own blog post in addition to being mentioned in the highlights post and plots like this would be excellent to show the trajectory over time. I can add some interpretation here:

  • 0.7 versus 1.0: The 1.0 release deleted a lot of deprecation code, making it significantly leaner and faster than 0.7.
  • 1.0 through 1.5: The compiler got smarter and generated faster code, but at the expense of worsening compile times. Some compensatory latency reductions were done in this era, but they were outpaced by improved compiler output taking more time (and LLVM getting slower).
  • 1.6 through 1.8: Incremental improvements to compile time due to focus on reducing latency in both Julia and LLVM. We were always acutely aware of this, and people started shaming LLVM for its compile times; LLVM added compile time benchmarks and now tracks them and has been improving somewhat.
  • 1.9: Machine code caching is transformative at significantly reducing TTFP. Now that it works, more can be done here: packages can precompile more now that it’s more worthwhile to do so; fixing invalidations now has larger proportional effect; more effort can be made on reducing the remaining bottlenecks in load time.

Notably, the latter two phases retain and even build on the runtime performance improvements gained in the earlier era, so it's a win-win: much improved load latency and much improved runtime performance.

37 Likes

Oh, and we might see big improvements in load/compile time from the newly landed "weak dependencies" functionality. It can considerably reduce the number of dependencies packages carry by letting them convert some regular deps into weak deps, so that you don't pay for them all the time on the off chance that you might use them. That requires more of an ecosystem-wide movement, though.
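
For the curious, on 1.9 that mechanism looks roughly like this: the package lists the heavy dependency under [weakdeps] in its Project.toml, maps an extension module to it under [extensions], and ships that module in its ext/ directory. A hypothetical sketch (MyPkg, the Plots weak dep, and visualize are made-up names):

# ext/MyPkgPlotsExt.jl: loaded automatically only when the user loads both MyPkg and Plots
module MyPkgPlotsExt

using MyPkg, Plots

# Plotting support lives here, so users who never plot never pay to load Plots.
# (Assumes MyPkg itself declares `function visualize end`.)
function MyPkg.visualize(xs::AbstractVector)
    plot(xs; label = "data")
end

end # module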

20 Likes

I’d also add all the method-invalidation hunting that happened during this period, carried out in a concerted manner across the entire ecosystem, with new package versions that depend on work in Julia itself. The best compile-time improvements are compilations that don’t need to happen!

4 Likes

I think that’s quite interesting, thanks! (I should have told you to use only OrdinaryDiffEq.jl, which would have had way fewer dependencies, sorry about that!)

1 Like