```
julia> @bprofile f()
BenchmarkTools.Trial: 8658 samples with 1 evaluation.
 Range (min … max):  461.459 μs …   1.201 ms  ┊ GC (min … max): 0.00% … 25.41%
 Time  (median):     560.666 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   574.060 μs ±  89.722 μs  ┊ GC (mean ± σ):  4.29% ±  8.08%

  ▁██▄▃       ▆▆▃▃
  ▁▂█████▇▃▂▂▃██████▄▃▄▄▄▄▄▃▃▂▂▂▂▃▄▄▄▃▃▃▂▂▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  461 μs          Histogram: frequency by time          886 μs <

 Memory estimate: 7.63 MiB, allocs estimate: 2.
```
1.8rc3:

```
julia> @bprofile f()
BenchmarkTools.Trial: 7688 samples with 1 evaluation.
 Range (min … max):  452.503 μs …   1.562 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     673.130 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   645.455 μs ± 143.323 μs  ┊ GC (mean ± σ):  4.75% ± 9.26%

    ▅█▄                   ▁▂▂
  ▁▂▆███▆▃▂▂▂▂▂▁▁▁▁▂▂▃▃▃▄▆████▄▃▂▂▂▂▂▂▂▂▂▃▃▃▃▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁ ▂
  453 μs          Histogram: frequency by time          1.04 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.
```
It does, however, suggest the GC sweeps are less frequent but take longer. That’s a reasonable tradeoff.
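(The `f()` being benchmarked isn't shown above; if you want to poke at this yourself, a hypothetical stand-in with a similar footprint, roughly 7.63 MiB in a handful of allocations, would be:)

```julia
using BenchmarkTools

# Hypothetical stand-in for f(): allocates one 10^6-element Float64 vector,
# i.e. ~7.63 MiB, so GC behavior dominates the timing differences.
f() = sum(rand(10^6))

@bprofile f()  # like @benchmark, but also collects a profile of the runs
```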
On the “benchmark” with 10^9 samples, the result depends on how many browser tabs you have open, whether you switched windows between profiling Julia and checking your Slack, whether you remembered to close the 1.7 window when you ran 1.8, and so on. Unless you really know what you’re doing and your intent is to profile, e.g., your OS’s swap behavior, the 10^9 case is 100%, entirely, completely, utterly useless. Everyone should please stop worrying about it.
For some less synthetic benchmarks, I have been tracking the benchmark results of some of my packages across Julia versions at QuantumClifford Benchmarks.
For that package, 1.9-nightly is better than 1.8-beta, which is better than 1.6.0 and 1.7.0. So at least one person is really happy with the improvements in the Julia runtime and compiler.
There have also been nice improvements in TTFX (time to first X), but they have been partly negated by some packages moving out of Base and needing to be loaded separately.
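(TTFX here is measured the usual way: in a fresh session, as load time plus first-call time. A sketch with placeholder names `SomePkg`/`somecall`:)

```julia
# Run in a fresh Julia session:
@time using SomePkg        # package load latency
@time SomePkg.somecall()   # first call: includes compilation
@time SomePkg.somecall()   # second call: runtime only, for comparison
```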
I’ve been working on that case for 2 straight weeks :-). https://github.com/JuliaLang/julia/pull/46010 almost fixes it. There’s still a 1s regression (EDIT: you can shave this to 0.5s if you use https://github.com/timholy/SnoopCompile.jl/tree/master/SnoopPrecompile) but it’s no longer due to any of the precompilation improvements, and in fact they make your life better. But lots of things change from version to version; LLVM has changed its performance profile too, and you’re one of the unlucky people for whom the net effect is worse. Sorry.
That’s because it only got submitted as a new package this morning, and it will be Monday before it’s actually released.
The package itself is really simple, but it’s designed to make precompilation less finicky. It only does 3 things: (1) run the block only when precompiling, (2) disable the interpreter when running the block (to ensure everything gets compiled), and (3) intercept runtime dispatch to force precompilation of calls to methods defined in other packages. Each of these is in itself just a few lines, and all the supporting infrastructure for this has existed for a while now (although I guess that depends on your perspective; most of it’s not in 1.7, but all of it will be in 1.8). The package just combines them in an easy-to-use wrapper that hopefully means people will be able to get high-quality precompilation without being experts.
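In case it helps to see it, usage in a package looks roughly like this (a sketch following the README in the linked repo; `MyPkg.process` and `data` are placeholders):

```julia
using SnoopPrecompile

# This block goes at top level inside your package's module.
@precompile_setup begin
    # Setup runs during precompilation but is not itself precompiled,
    # so cheap throwaway inputs can live here.
    data = rand(100)
    @precompile_all_calls begin
        # Everything called in this block is compiled with the interpreter
        # disabled, including runtime-dispatched calls into other packages.
        MyPkg.process(data)
    end
end
```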
There’s one exception: invalidation will remain a threat for the foreseeable future, and that still takes some expertise to diagnose and fix. I don’t yet have any great ideas about making this easier, but maybe someone else will come up with something.
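For anyone who wants to attempt that diagnosis today, one workflow uses SnoopCompile’s invalidation tooling (`SomePkg` below is a placeholder for whatever package you suspect):

```julia
using SnoopCompileCore
invalidations = @snoopr using SomePkg      # record invalidations triggered by loading
using SnoopCompile                         # load the analysis tools only afterwards
trees = invalidation_trees(invalidations)  # group them by the triggering method
```

`invalidation_trees` sorts the results so the most consequential invalidations appear last, which is usually where fixing effort pays off.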