Julia v1.9.0-beta2 is fast

Thanks. After this session is over (I’ll be traveling after lunch) I will open an issue trying to explain everything I’ve tried and the results. You should now be able to install GMT automatically (unless you are also bitten by the libnetcdf/libcurl fight).

1 Like

I just share this script with people; IMO it should be in the docs in some form.

using SnoopCompile
invalidations = @snoopr begin
    # Your code goes here!
    using OrdinaryDiffEq
    function lorenz(du, u, p, t)
        du[1] = 10.0(u[2] - u[1])
        du[2] = u[1] * (28.0 - u[3]) - u[2]
        du[3] = u[1] * u[2] - (8 / 3) * u[3]
    end
    u0 = [1.0; 0.0; 0.0]
    tspan = (0.0, 100.0)
    prob = ODEProblem{true,false}(lorenz, u0, tspan)
    alg = Rodas5()
    sol = solve(prob, alg)
end;
trees = SnoopCompile.invalidation_trees(invalidations);
@show length(SnoopCompile.uinvalidated(invalidations)) # show total invalidations
show(trees[end]) # show the method whose redefinition triggered the most invalidations
# Count number of children (number of invalidations per invalidated method)
n_invalidations = map(trees) do methinvs
    SnoopCompile.countchildren(methinvs)
end
import Plots
Plots.plot(
    1:length(trees),
    n_invalidations;
    markershape=:circle,
    xlabel="i-th method invalidation",
    label="Number of children per method invalidations"
)

Just replace the OrdinaryDiffEq code with the code you want analyzed and this will spit out an informative plot of the invalidations distribution plus what the biggest invalidating method is.
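
If you want to dig interactively into the worst offender from trees[end], SnoopCompile’s Cthulhu integration can walk up the chain of invalidated callers. A sketch of that documented workflow (it assumes the entry has backedges, which most do):

using Cthulhu  # provides the interactive ascend browser

methinvs = trees[end]           # the entry with the most invalidated children
root = methinvs.backedges[end]  # one of the roots of its invalidation tree
ascend(root)                    # interactively walk up through the invalidated callers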

14 Likes

3 posts were split to a new topic: Memory usage in Julia v1.9.0-beta2

The script errors when there are no invalidations:

julia> include("using_SC.jl")
length(SnoopCompile.uinvalidated(invalidations)) = 0
ERROR: LoadError: BoundsError: attempt to access 0-element Vector{SnoopCompile.MethodInvalidations} at index [0]

Use the approach in Julia v1.9.0-beta2 is fast - #17 by tim.holy instead. It shouldn’t error, and it will also make your TODO list much, much smaller :stuck_out_tongue: .

2 Likes

Getting an error when trying this.

Using an environment with julia-1.9.0-beta2 and only the three relevant packages installed.

julia> using SnoopCompileCore

julia> invs = @snoopr using ControlSystems;

julia> using SnoopCompile

julia> trees = invalidation_trees(invs)
ERROR: KeyError: key MethodInstance for LoopVectorization.avx_config_val(::Val{(false, 0, 0, 0, false, 0x0000000000000001, 1, true)}, ::Static.StaticInt{4}) not found
Stacktrace:
 [1] getindex(h::Dict{Union{Int32, Core.MethodInstance}, Union{Tuple{Any, Vector{Any}}, SnoopCompile.InstanceNode}}, key::Core.MethodInstance)
   @ Base ./dict.jl:484
 [2] invalidation_trees(list::Vector{Any}; exclude_corecompiler::Bool)
   @ SnoopCompile ~/.julia/packages/SnoopCompile/Q8zUg/src/invalidations.jl:403
 [3] invalidation_trees(list::Vector{Any})
   @ SnoopCompile ~/.julia/packages/SnoopCompile/Q8zUg/src/invalidations.jl:294
 [4] top-level scope
   @ REPL[4]:1

(testinv) pkg> st
Status `/tmp/testinv/Project.toml`
  [a6e380b2] ControlSystems v1.5.3
  [aa65fe97] SnoopCompile v2.9.5
  [e2b509da] SnoopCompileCore v2.9.0

Sorry about that. I don’t yet understand the error, but in the meantime there’s a real fix available: use SnoopCompile 2.9.6 or higher (this post originally suggested a hack-workaround via pkg> add SnoopCompile#teh/circumvent_err, but that is no longer needed). That should at least allow you to keep moving forward, with the understanding that there are no guarantees that the output is comprehensive (and if you see stuff that makes no sense, it would be worth suspecting a bug).

I’ll try to develop a clearer understanding in the next few days, but I wanted you to have something to play with. :heavy_check_mark:

11 Likes

The error has been fixed (correctly). You can get the fix with SnoopCompile 2.9.6 or higher. In case anybody is using the teh/circumvent_err branch, I’ll try to remember to leave it lying around for a week or so, but it’s better to upgrade to the latest release.
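
If your environment was tracking the branch, standard Pkg commands should move you back to the registered releases, along these lines:

pkg> free SnoopCompile   # stop tracking the teh/circumvent_err branch
pkg> up SnoopCompile     # upgrade to the latest registered release (2.9.6 or higher)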

8 Likes

Thought I’d leave this here as well: I wanted to get a feel for how the user experience of the “original” TTFX - i.e. time-to-first-plot using Plots.jl - has evolved over the years, so I wrote a little benchmark script that times starting Julia and running using Plots; display(scatter(rand(1_000))) for all Julia versions since 0.7.

Here’s what the mean and minimum times look like across versions:

[plot: mean and minimum TTFP across Julia versions]

Initially I had included 0.6.4 as well, but with such an old Julia version one also has to fall back on a very old Plots version, which errors out on the display call. An alternative script with savefig instead of display suggests that 0.6 had similar TTFP to 0.7-1.2.

I think the plot makes clear the scale of the achievement here. While I’m sure there’s more to come as the ecosystem starts to make best use of Julia’s new abilities, it seems clear that this is the first time we’ve seen a dramatic qualitative shift in the basic experience the modal user has, without resorting to tricks like PackageCompiler.

Some more boring details for those interested below:

Details

The benchmarks were run on an i7 Windows laptop, versioninfo() is:

Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores

As Plots.jl has dropped compatibility for older Julia versions over time, going back in Julia versions also means going back to older Plots versions, so the benchmarks don’t just capture differences in precompilation and other language features, but changes in Plots itself.

The following table shows the Plots version used for every Julia version, as well as the average, minimum, and standard deviation of the 20 samples taken:

 Row │ version  Plots_version  average  minimum  std_dev
     │ String   String         Float64  Float64  Float64
─────┼───────────────────────────────────────────────────
   1 │ 0.6.4    0.17.4            11.5     10.7      0.9
   2 │ 0.7.0    0.19.3            21.5     19.8      1.3
   3 │ 1.0.5    1.4.3             18.5     18.1      0.5
   4 │ 1.1.1    1.4.3             21.3     18.3      4.4
   5 │ 1.2.0    1.4.4             22.8     21.4      1.5
   6 │ 1.3.1    1.6.12            27.3     26.2      0.7
   7 │ 1.4.2    1.6.12            35.0     30.0      5.4
   8 │ 1.5.4    1.24.3            35.7     31.9      3.3
   9 │ 1.6.5    1.27.6            24.6     21.7      2.4
  10 │ 1.7.3    1.37.2            24.3     22.6      2.0
  11 │ 1.8.3    1.38.0            20.6     17.7      2.8
  12 │ 1.9.0b   1.38.0-dev         7.6      4.1      1.1

Note that 1.9.0b uses a branch of Plots with a fix from Kristoffer that addresses an issue with the usage of cached code on Windows.

The whole benchmark code run is:

using DataFrames, ProgressMeter

path = "path/to/Julias"

julias = Dict(
    "0.6.4" => "Julia-0.6.4/bin/julia.exe",
    "0.7.0" => "Julia-0.7.0/bin/julia.exe",
    "1.0.5" => "Julia-1.0.5/bin/julia.exe",
    "1.1.1" => "Julia-1.1.1/bin/julia.exe",
    "1.2.0" => "Programs/Julia-1.2.0/bin/julia.exe",
    "1.3.1" => "Julia-1.3.1/bin/julia.exe",
    "1.4.2" => "Programs/Julia/Julia-1.4.2/bin/julia.exe",
    "1.5.4" => "Programs/Julia 1.5.4/bin/julia.exe",    
    "1.6.5" => "Programs/Julia-1.6.5/bin/julia.exe",
    "1.7.3" => "Programs/Julia-1.7.3/bin/julia.exe",
    "1.8.3" => "Programs/Julia-1.8.3/bin/julia.exe",
    "1.9.0b" => "Programs/Julia-1.9.0-beta2/bin/julia.exe",
)

jdf = sort(DataFrame(version = collect(keys(julias)), location = collect(values(julias))), :version)

# Check versions
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    if r.version == "0.6.4"
        run(`$(normpath(path, r.location)) -e "println(Pkg.installed()[\"Plots\"])"`);    
    else
        run(`$(normpath(path, r.location)) -e "import Pkg; println(Pkg.installed()[\"Plots\"])"`);
    end
    println()
end

n = 20

jdf.times = [zeros(n) for _ ∈ 1:nrow(jdf)]

# Run code
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    pn = normpath(path, r.location)
    @showprogress for i ∈ 1:n
        if r.version == "0.6.4"
            r.times[i] = @elapsed run(`$pn -e "using Plots; scatter(rand(1_000))"`);    
        else
            r.times[i] = @elapsed run(`$pn -e "using Plots; display(scatter(rand(1_000)))"`);    
        end
    end
    println()
end

using Plots, Statistics

jdf.average = mean.(jdf.times)
jdf.standard_dev = std.(jdf.times)

bar(jdf.version[2:end], jdf.average[2:end], label = "",
    linewidth = 0.1, ylabel = "Time (seconds)", xlabel = "Julia version",
    yticks = 0:5:50, 
    title = "Mean/minimum time for 20 samples\nusing Plots; display(scatter(rand(1_000)))")
scatter!((1:nrow(jdf)-1) .- 0.5, minimum.(jdf.times)[2:end], label = "Minimum", markerstrokewidth = 0.1)

jdf.Plots_version = ["0.17.4", "0.19.3", "1.4.3", "1.4.3", "1.4.4", "1.6.12", 
    "1.6.12", "1.24.3", "1.27.6", "1.37.2", "1.38.0", "1.38.0-dev"]

select(jdf, :version, :average => (x -> round.(x, digits = 1)) => :average,
    :times => (x -> round.(minimum.(x), digits = 1)) => :minimum, 
    :times => (x -> round.(std.(x), digits = 1)) => :std_dev)

84 Likes

That’s really cool! If you have the time and willingness, would you be able to check DifferentialEquations?

Here’s a one-liner that I believe should work even on ancient versions:

using DifferentialEquations;  f(u,p,t) = 1.01*u; solve(ODEProblem(f,1/2,(0.0,1.0)), Tsit5(), abstol=1e-8)

4 Likes

Just wanted to say thanks to everybody who worked on this. In DFTK, with a simple SnoopPrecompile.@precompile_all_calls around a standard workflow in the package, TTFX went from 40s to 8s on my machine, and from 80s to 2s on somebody else’s (not sure why the discrepancy…), which is pretty amazing. Precompilation costs did go through the roof, which is unfortunate, but we added an environment variable so that precompilation can be disabled for developers.
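
For anyone wanting to try the same approach, the pattern looks roughly like the following (a sketch, not DFTK’s actual code; the environment-variable name is made up for illustration):

using SnoopPrecompile

# Let developers opt out, since precompiling the workload is expensive
# (hypothetical variable name, checked at precompile time).
if get(ENV, "MYPKG_PRECOMPILE", "true") == "true"
    @precompile_all_calls begin
        # a small but representative workflow; everything it compiles
        # gets cached in the package image on Julia 1.9+
        x = rand(100)
        sum(abs2, x)
    end
end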

18 Likes

Well that was quite the Odyssey… Best to do these kinds of experiments with packages that take less than 10 minutes to install and precompile (I think on 0.6.4 it took 20 minutes just to download all the dependencies). I had also forgotten how annoying precompilation was before parallel precompilation and progress bars - just staring at a terminal with nothing happening for minutes on end isn’t fun.

And after all that the results are… not exactly overwhelming :person_shrugging:

[plot: mean and minimum time-to-first-DiffEq-solve across Julia versions]

Note that I couldn’t get the package to precompile on 1.4 and 1.5, so I skipped those Julia versions.

Here we are covering 4 major DiffEq releases (from 4.5 to 7.6), so I assume we are comparing across some pretty major changes to the package itself, and during installation I noticed that later versions pull in a lot of heavy dependencies like LoopVectorization. On 1.9-beta2 it looks like virtually all the time is spent in using DifferentialEquations, so maybe there’s not a lot to be gained on the actual first solve - but I don’t really know anything about DifferentialEquations so I’m not the right person to make sense of these results!

Here are the package versions used and summary statistics of the 20 samples for each Julia version:

 Row │ version  DiffEq_version  average  minimum  std_dev
     │ String   String          Float64  Float64  Float64
─────┼────────────────────────────────────────────────────
   1 │ 0.6.4    4.5.0               7.1      6.8      0.4
   2 │ 0.7.0    5.2.1              14.3     11.1      1.8
   3 │ 1.0.5    6.10.1             25.9     20.5      2.4
   4 │ 1.1.1    6.10.1             31.2     29.1      2.0
   5 │ 1.2.0    6.11.0             32.1     24.5      4.7
   6 │ 1.3.1    6.18.0             32.3     26.7      4.7
   7 │ 1.6.5    7.6.0              12.8     11.4      1.4
   8 │ 1.7.3    7.6.0              10.4      9.5      0.7
   9 │ 1.8.3    7.6.0              17.7     17.3      0.4
  10 │ 1.9.0b   7.6.0              10.7      9.9      0.8

Full code

using DataFrames, ProgressMeter

path = "path/to/julias"

julias = Dict(
    "0.6.4" => "Julia-0.6.4/bin/julia.exe",
    "0.7.0" => "Julia-0.7.0/bin/julia.exe",
    "1.0.5" => "Julia-1.0.5/bin/julia.exe",
    "1.1.1" => "Julia-1.1.1/bin/julia.exe",
    "1.2.0" => "Programs/Julia-1.2.0/bin/julia.exe",
    "1.3.1" => "Julia-1.3.1/bin/julia.exe",
    #"1.4.2" => "Programs/Julia/Julia-1.4.2/bin/julia.exe",
    #"1.5.4" => "Programs/Julia 1.5.4/bin/julia.exe",    
    "1.6.5" => "Programs/Julia-1.6.5/bin/julia.exe",
    "1.7.3" => "Programs/Julia-1.7.3/bin/julia.exe",
    "1.8.3" => "Programs/Julia-1.8.3/bin/julia.exe",
    "1.9.0b" => "Programs/Julia-1.9.0-beta2/bin/julia.exe",
)

jdf = sort(DataFrame(version = collect(keys(julias)), location = collect(values(julias))), :version)

# Check versions
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    if r.version == "0.6.4"
        run(`$(normpath(path, r.location)) -e "println(Pkg.installed()[\"DifferentialEquations\"])"`);    
    else
        run(`$(normpath(path, r.location)) -e "import Pkg; println(Pkg.installed()[\"DifferentialEquations\"])"`);
    end
    println()
end

n = 20

jdf.times = [zeros(n) for _ ∈ 1:nrow(jdf)]

# Run code
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    pn = normpath(path, r.location)
    @showprogress for i ∈ 1:n
        r.times[i] = @elapsed run(`$pn -e "using DifferentialEquations;  f(u,p,t) = 1.01*u; solve(ODEProblem(f,1/2,(0.0,1.0)), Tsit5(), abstol=1e-8)"`);    
    end
    println()
end

using Plots, Statistics

jdf.average = mean.(jdf.times)
jdf.standard_dev = std.(jdf.times)

bar(jdf.version, jdf.average, label = "",
    linewidth = 0.1, ylabel = "Time (seconds)", xlabel = "Julia version",
    yticks = 0:5:50, 
    title = "Mean/minimum time for 20 samples\nTime-to-first-DiffEq solve")
scatter!((1:nrow(jdf)) .- 0.5, minimum.(jdf.times), label = "Minimum", markerstrokewidth = 0.1)

jdf.DiffEq_version = ["4.5.0", "5.2.1", "6.10.1", "6.10.1", "6.11.0", "6.18.0", 
    "7.6.0", "7.6.0", "7.6.0", "7.6.0"]

print(select(jdf, :version, :DiffEq_version, :average => (x -> round.(x, digits = 1)) => :average,
    :times => (x -> round.(minimum.(x), digits = 1)) => :minimum, 
    :times => (x -> round.(std.(x), digits = 1)) => :std_dev))

21 Likes

We should also keep in mind that the amazing new pkgimage caching of native code solves only “time to first X after things are already imported, when invalidations are fixed by package developers”. Developers are separately solving the invalidation issues (which had started to pay off even before pkgimages). However, there is still a lot of work that can be done on the “import time” side of things. The plots above show “import time + time to first X after import”, which can be confusing because (a quick way to see the split yourself is sketched after this list):

  • pkgimages (and fixing invalidations) turn “time to first X after import” from many seconds into milliseconds
  • pkgimages do not change “import-only time” much
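
To see the two components separately on your own machine, the same @time @eval pattern used later in this thread works (a sketch; any package and first call will do, Plots is just an example):

julia> @time @eval using Plots                    # "import-only time"

julia> @time @eval display(scatter(rand(1_000)))  # "time to first X after import"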

Nonetheless, this is an amazing step forward, with many other exciting ideas being hinted at in offhand comments on GitHub.

2 Likes

This is a very cool graph. I think this subject should have its own blog post in addition to being mentioned in the highlights post, and plots like this would be excellent for showing the trajectory over time. I can add some interpretation here:

  • 0.7 versus 1.0: The 1.0 release deleted a lot of deprecation code, making it significantly leaner and faster than 0.7.
  • 1.0 through 1.5: The compiler got smarter and generated faster code, but at the expense of worsening compile times. Some compensatory latency reductions were done in this era, but they were outpaced by improved compiler output taking more time (and LLVM getting slower).
  • 1.6 through 1.8: Incremental improvements to compile time due to focus on reducing latency in both Julia and LLVM. We were always acutely aware of this, and people started shaming LLVM for its compile times; LLVM added compile time benchmarks and now tracks them and has been improving somewhat.
  • 1.9: Machine code caching is transformative at significantly reducing TTFP. Now that it works, more can be done here: packages can precompile more now that it’s more worthwhile to do so; fixing invalidations now has larger proportional effect; more effort can be made on reducing the remaining bottlenecks in load time.

Notably, the latter two phases retain and even build on the runtime performance improvements gained in the earlier eras, so it’s a win-win: much improved load latency and much improved runtime performance.

37 Likes

Oh, and we might see big improvements in load/compile time from the newly landed “weak dependencies” functionality, which can considerably reduce the number of dependencies packages carry by letting them convert some regular deps into weak deps, so that you don’t pay all the time for functionality you might never use. That requires more of an ecosystem-wide movement, though.
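
Mechanically, this lives in a package’s Project.toml plus an extension module, roughly like so (a sketch; the names and UUID below are placeholders, with the extension code living in ext/MyPkgPlotExt.jl):

[weakdeps]
SomePlotPkg = "00000000-0000-0000-0000-000000000000"  # the real UUID comes from the registry

[extensions]
MyPkgPlotExt = "SomePlotPkg"

The extension module only loads when both MyPkg and SomePlotPkg are present in the same session, so users who never need that functionality never pay its load or compile cost.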

20 Likes

I’d also add all the method-invalidation hunting that happened during this period, carried out in a concerted manner across the entire ecosystem, with new package versions depending on work in Julia itself. The best compile-time improvements are compilations that don’t need to happen!

4 Likes

I think that’s quite interesting, thanks! (I should have told you to use only OrdinaryDiffEq.jl, which would have had far fewer dependencies, sorry about that!)

1 Like

I promise I’ll stop now, but one thing I was interested in was a simple “data sciency” task: reading a CSV file into a DataFrame. This is another classic where Julia’s performance has traditionally been around best-in-class (multithreaded CSV parsing is competitive with the best parsers in other languages, and DataFrames did very well on the h2o benchmarks when they were still around), but the benchmarks always felt a bit disingenuous: of course it’s great to be able to read a csv in 3 seconds when other languages might take 10, but that’s not much help if you add on 10+ seconds of TTFX, given that one frequently only wants to read a single csv file, once.

Happily, this problem is also mostly solved in 1.9:

[plot: mean and minimum CSV-read TTFX across Julia versions]

Here I’m reading in a 1,000,000 row, 6 column (one string, five float) csv file and turning it into a DataFrame (which is DataFrame(CSV.File("testcsv.csv")) for the most part, except for Julia 0.6 when CSV still had DataFrames as a dependency and the API was CSV.read("testcsv.csv")).

The file I’ve picked here is small enough that any improvements in parsing speed in CSV itself over time are unlikely to matter - on my machine the file is read in less than 0.5s using the latest CSV version (and 0.1s when starting Julia with more threads, although that seems to increase the TTFX by a second or so, so on a one-shot benchmark the gains are negated - the benchmarks are run single-threaded).

I haven’t tested using vs. executing the actual DataFrame(CSV.File()) call separately across versions, but on 1.9b it seems the <4s breaks down into a bit over 2s for using CSV, DataFrames and around 1.5s for reading the csv and creating the DataFrame, which again for all intents and purposes feels quite instantaneous for loading a csv file.
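
If one did want that split across versions, the benchmark loop could time each stage inside the child process and print the numbers for parsing, roughly like this (a sketch, untested on the older Julia versions; pn and testcsv as in the benchmark script below):

t_import, t_read = parse.(Float64, split(readchomp(`$pn -e "t1 = @elapsed @eval using CSV, DataFrames; t2 = @elapsed @eval DataFrame(CSV.File($testcsv)); println(t1, ',', t2)"`), ","))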

More details on package versions used and timings, as well as benchmark code run, in the section below.

Details

Detailed results by Julia version:

 Row │ version  CSV_version  DataFrames_version  average  minimum  std_dev
     │ String   String       String              Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────────────────────
   1 │ 0.6.4    0.2.5        0.11.7                 10.4     10.1      0.4
   2 │ 0.7.0    0.4.3        0.17.1                 10.8     10.5      0.4
   3 │ 1.0.5    0.8.5        1.3.6                  14.0     12.4      0.8
   4 │ 1.1.1    0.8.5        1.3.6                  14.8     13.7      0.9
   5 │ 1.2.0    0.8.5        1.3.6                  15.2     13.3      1.4
   6 │ 1.3.1    0.10.4       1.3.6                  20.5     18.6      2.1
   7 │ 1.4.2    0.10.4       1.3.6                  27.2     22.2      2.6
   8 │ 1.5.4    0.10.4       1.3.6                  22.0     19.8      0.7
   9 │ 1.6.5    0.10.9       1.3.3                  12.3     11.6      0.4
  10 │ 1.7.3    0.10.9       1.3.4                  13.7     12.3      0.6
  11 │ 1.8.3    0.10.9       1.4.4                  17.0     16.4      0.3
  12 │ 1.9.0b   0.10.9       1.4.4                   4.1      3.7      0.3

Benchmark code run:

using DataFrames, ProgressMeter

path = "path/to/julias"

julias = Dict(
    "0.6.4" => "Julia-0.6.4/bin/julia.exe",
    "0.7.0" => "Julia-0.7.0/bin/julia.exe",
    "1.0.5" => "Julia-1.0.5/bin/julia.exe",
    "1.1.1" => "Julia-1.1.1/bin/julia.exe",
    "1.2.0" => "Programs/Julia-1.2.0/bin/julia.exe",
    "1.3.1" => "Julia-1.3.1/bin/julia.exe",
    "1.4.2" => "Programs/Julia/Julia-1.4.2/bin/julia.exe",
    "1.5.4" => "Programs/Julia 1.5.4/bin/julia.exe",    
    "1.6.5" => "Programs/Julia-1.6.5/bin/julia.exe",
    "1.7.3" => "Programs/Julia-1.7.3/bin/julia.exe",
    "1.8.3" => "Programs/Julia-1.8.3/bin/julia.exe",
    "1.9.0b" => "Programs/Julia-1.9.0-beta2/bin/julia.exe",
)

jdf = sort(DataFrame(version = collect(keys(julias)), location = collect(values(julias))), :version)

# Check versions
jdf.CSV_version .= ""
jdf.DataFrames_version .= ""

for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    pn = normpath(path, r.location)
    if r.version == "0.6.4"
        r.CSV_version = readchomp(`$pn -e "println(Pkg.installed()[\"CSV\"])"`);
        r.DataFrames_version = readchomp(`$pn -e "println(Pkg.installed()[\"DataFrames\"])"`);    
    else
        r.CSV_version = readchomp(`$pn -e "import Pkg; println(Pkg.installed()[\"CSV\"])"`);
        r.DataFrames_version = readchomp(`$pn -e "import Pkg; println(Pkg.installed()[\"DataFrames\"])"`);
    end
    println()
end

n = 20

jdf.times = [zeros(n) for _ ∈ 1:nrow(jdf)]

testcsv = "\"/Documents/testcsv.csv\""

# Run code
for r ∈ eachrow(jdf)
    println("Julia version: ", r.version)
    pn = normpath(path, r.location)
    @showprogress for i ∈ 1:n
        if r.version == "0.6.4"
            r.times[i] = @elapsed run(`$pn -e "using DataFrames, CSV; CSV.read($testcsv);"`);
        else
            r.times[i] = @elapsed run(`$pn -e "using DataFrames, CSV; DataFrame(CSV.File($testcsv));"`);
        end
    end
    println()
end

using Plots, Statistics

jdf.average = mean.(jdf.times)
jdf.standard_dev = std.(jdf.times)

bar(jdf.version, jdf.average, label = "",
    linewidth = 0.1, ylabel = "Time (seconds)", xlabel = "Julia version",
    yticks = 0:5:50, 
    title = "Mean/minimum time for 20 samples\nusing CSV, DataFrames; DataFrame(CSV.File(...))",
    color = 5)
scatter!((1:nrow(jdf)) .- 0.5, minimum.(jdf.times), label = "Minimum", markerstrokewidth = 0.01,
    markersize = 6, color = 6)

print(select(jdf, :version, :CSV_version, :DataFrames_version, 
    :average => (x -> round.(x, digits = 1)) => :average,
    :times => (x -> round.(minimum.(x), digits = 1)) => :minimum, 
    :times => (x -> round.(std.(x), digits = 1)) => :std_dev))

37 Likes

I’m hoping this is the right place to share individual package improvements. There has been a long-standing issue at Bessels.jl about long compile times, which is a big disadvantage for basic math functions compared to wrapped C/Fortran libraries: TTFX was sometimes >10x slower than for the wrapped versions.

Luckily, all of this time was spent on LLVM code generation and there are no invalidations, so this should be a good test case.

Bessels load details
# Version 1.8.2 (2022-09-29)
julia> @time @eval using Bessels
  0.021906 seconds (44.90 k allocations: 4.828 MiB)

julia> @time @eval besselj(1.2, 8.2)
  0.422299 seconds (52.75 k allocations: 2.216 MiB, 99.93% compilation time)
0.21735942469289246

# Version 1.9.0-beta2 (2022-12-29)
julia> @time @eval using Bessels
  0.018595 seconds (22.22 k allocations: 2.496 MiB)

julia> @time @eval besselj(1.2, 8.2)
  0.000137 seconds (46 allocations: 2.266 KiB)
0.21735942469289246

Package load times are slightly better (a few samples suggest 5-10%) on 1.9.0-beta2, while TTFX is 3,000x faster (0.4s → 0.0001s). Of course, compared to use cases where TTFP can take tens of seconds the user doesn’t feel the effect as much, but this now makes the initial call of a native Julia function faster than the wrapped C/Fortran version (of course this isn’t exactly a fair comparison, as the latter could undoubtedly be improved too).

Special Functions comparison
# Version 1.9.0-beta2 (2022-12-29)
julia> @time @eval using SpecialFunctions
  0.102223 seconds (181.37 k allocations: 13.283 MiB, 1.77% compilation time)

julia> @time @eval besselj(1.2, 8.2)
  0.055451 seconds (85.39 k allocations: 5.645 MiB, 130.88% compilation time)
0.2173594246928925
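
For anyone wanting to verify the same two claims (no invalidations; time dominated by code generation rather than inference) for their own package, SnoopCompile’s documented entry points can separate them. A sketch:

julia> using SnoopCompileCore

julia> invs = @snoopr using Bessels;           # collect invalidations triggered by loading

julia> tinf = @snoopi_deep besselj(1.2, 8.2);  # profile type inference of the first call

julia> using SnoopCompile

julia> isempty(SnoopCompile.uinvalidated(invs))  # true here: nothing was invalidated

julia> flatten(tinf)  # per-method inference timings; first-call time not accounted for here is mostly LLVM codegen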

15 Likes

What does @eval actually do here? Is it necessary?