Any benchmark of Julia v1.0 vs older versions

Hello.

Where can I find a comparison of the speed of Julia v1.0 vs other versions such as v0.7, v0.6, v0.5 or against other languages?

2 Likes

Please don't ask for it; we're happy with Julia as it is at the moment, and reaching 1.0 was itself a big achievement. Besides, benchmarks are never done right, and the old one is already flawed in some ways. Julia 1.0 was meant to achieve language API stability, and the focus now should be on getting the ecosystem to catch up. The 1.x releases were planned to focus more on compiler optimizations. I don't see a benefit in such a microbenchmark now, especially since Julia has turned out to be more efficient for large multi-file projects.

1 Like

You're right of course – the focus was on API stability, not optimizations, but...
Julia 1.0 and 0.7 (which, Juan, are essentially the same, except dep warnings in 0.7 are errors in 1.0) are definitely faster than 0.6. It's pretty common to see a free 10% improvement or more, but it varies.
There was also a thread a few months ago referencing an econ article that benchmarked a bunch of languages, including Julia 0.2.
Julia 0.6 did much better than Julia 0.2 relative to a few other languages. Here's the thread: A Comparison of Programming Languages in Economics - #4 by tkoolen

Julia 1.0 also starts much faster than Julia 0.6. Fantastic work all around.

2 Likes

I was trying to decide which version to use now.
I know it's a great achievement, and it will push people to use Julia.

Anyway, I'm not only interested in the speed of v1.0 but also in seeing the evolution of all versions up to now.

The microbenchmark results for 1.0 should be up on julialang.org within a few days. 0.7 was a little slower than 0.6, as measured by the geometric mean of the microbenchmarks. But keep in mind that's a not-necessarily-representative set of benchmarks, and that lots of optimization will surely occur for 1.x now that the syntax and Base functionality are solid.

2 Likes

On julialang.org I can see a plot with the results for several languages.
But there is just one result for Julia.
How can we find the results broken down by Julia's version?

1 Like

I've yet to find out what optimization or improvement we're affected by, but Optim seems to be quite solidly 1.5 to 2x faster between 0.6.4 and 1.0.

3 Likes

Unless you have mission-critical software that already runs smoothly on v0.6, I would recommend transitioning to v1.0 and trusting that the occasional performance regression will be fixed, especially if you are willing to help with an MWE.

That said, I find I get a 10-50% improvement "for free". Occasionally even more, but that comes from consciously using idioms in v0.6 which were suboptimal there but expected to work better from v0.7 onwards (small unions).
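
To make the small-unions point concrete, here is a minimal sketch of my own (not code from this thread): functions that return either a value or nothing produce a small Union type, which 0.7/1.0 optimizes far better than 0.6 did.

# Returns Union{Nothing,Int}: a "small union" that is cheap on 0.7/1.0
function first_positive(xs)
    for (i, x) in pairs(xs)
        x > 0 && return i   # found one: return its index
    end
    return nothing          # none found
end

first_positive([-1, 0, 3])  # -> 3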

8 Likes

I can probably retrieve and post comparative data for 0.4 through 1.0 here, probably by the end of the week.

4 Likes

https://github.com/JuliaCI/BaseBenchmarkReports/blob/133cf1583ed678ed16e49312d525af1f095f7e8d/00e8af1_vs_df1c1c9/report.md

is between 0.6 and a fairly late 0.7 version.

Also, see https://github.com/JuliaLang/julia/pull/27030.

2 Likes

Hearing that they showed a slight regression in 0.7/1.0 makes me more inclined to think those benchmarks aren't representative of code "in the wild" than to think there actually was a regression on average.
As a simple example, the benchmarks don't use @inbounds, so bounds checks prevent auto-vectorization when indexing into arrays. And Julia's (LLVM's?) block vectorizer got better. This isn't seen by the benchmarks. Constant propagation through function boundaries, improved handling of small unions, better inlining heuristics, etc. I doubt I know half the improvements.
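
A minimal sketch of the kind of loop I mean (my example, not one of the official benchmarks): the float reduction below only auto-vectorizes once bounds checks are removed with @inbounds (and @simd additionally lets the compiler reorder the sum).

function sum_vectorized(x)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)  # no bounds checks, reordering allowed
        s += x[i]
    end
    return s
end

function sum_checked(x)
    s = zero(eltype(x))
    for i in eachindex(x)                  # bounds checks can block vectorization
        s += x[i]
    end
    return s
end

# Compare, e.g. with BenchmarkTools:
# @btime sum_vectorized($(rand(10^6))); @btime sum_checked($(rand(10^6)))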

1 Like

As another data point: benchmarks for RigidBodyDynamics improved by 15-30 percent when switching from 0.6 to 0.7/1.0. That code was already optimized pretty well.

2 Likes

FWIW, here are microbenchmark results for julia-0.6.4, 0.7.0, and 1.0.0. There's some improvement in matrix_statistics and recursion_fibonacci and some degradation in parse_integers and print_to_file.

@kristoffer.carlsson has already found a factor-of-two improvement for the integer-parsing library code (https://github.com/JuliaLang/julia/pull/28661), which should get parse_integers back down to where it was or better. If there's a similar fix for printing integers, then the 1.0.x microbenchmarks will show a slight improvement over 0.6 overall. Of course, the microbenchmarks are in no way a representative sample of real-world code.

benchmark              0.6.4    0.7.0    1.0.0
iteration_pi_sum      27.37    27.67    27.66
matrix_multiply       70.24    70.22    70.32
matrix_statistics      8.513    7.286    7.323
parse_integers         0.132    0.221    0.218
print_to_file          6.860   10.833   10.870
recursion_fibonacci    0.0406   0.0302   0.0302
recursion_quicksort    0.248    0.261    0.259
userfunc_mandelbrot    0.0565   0.0527   0.0527
13 Likes

Nice.
I can see it's quite stable, except, strangely, for parse_integers and print_to_file, which now take about twice as long.

Wow. That's incredibly consistent, especially considering that the optimizer was completely rewritten.

Of course it would be very interesting to know what happened to parsing and printing.

https://github.com/JuliaLang/julia/pull/28670 should improve print_to_file as well.

2 Likes

That's exactly what I meant in the first comment in this thread: these microbenchmarks don't reflect the actual improvements that have been made. In my real-world large codes I see about a 20-50% improvement moving from 0.6.4 to 1.0. I'm still happy, though, because the most important test in my opinion, matrix_statistics, improved. That said, this specific benchmark tests looping performance more than matrix statistics; I had never seen anyone do statistics on a tiny 5-by-5 matrix before, and choosing a medium-sized, more practical matrix would be fairer.

1 Like

Is there any prediction of how fast Julia can be compared to C in the future?
I mean theoretical limits due to the way it manages data, garbage collection, and memory access.

What areas can be improved?
What areas are already state-of-the-art?

It's as fast as C if you don't trigger the garbage collector. It's common to pre-allocate memory, or just use stack memory, so that the GC doesn't get triggered in the most sensitive parts of your code. A really cool example I saw recently, for low-dimensional optimization problems:

Doesn't allocate any memory, and runs incredibly fast. For low-dimensional problems, you'd be hard pressed to find anything faster.
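
(The linked example isn't reproduced here, so below is a generic sketch of the same idea, with names I made up, using StaticArrays for stack allocation.)

using StaticArrays, BenchmarkTools

gradstep(x, g) = x .- 0.1 .* g                   # one gradient-descent step

xv, gv = [1.0, 2.0], [0.1, 0.2]                  # heap-allocated Vectors
xs, gs = SVector(1.0, 2.0), SVector(0.1, 0.2)    # stack-allocated SVectors

@btime gradstep($xv, $gv)   # allocates a fresh Vector on every call
@btime gradstep($xs, $gs)   # zero allocations, so the GC never gets involved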

I think Julia in practice will often be faster than C, because it is easier to specialize code for a given problem.
Given two libraries, one written in C and one in Julia, I wouldn't bet on the C library being faster.
As an example, here are two libraries by the same author (someone known for writing high-performance software: Steven G. Johnson - Wikipedia):
GitHub - JuliaMath/Cubature.jl: One- and multi-dimensional adaptive integration routines for the Julia language # Written in C, has the advantage that it also offers p-cubature
GitHub - JuliaMath/HCubature.jl: pure-Julia multidimensional h-adaptive integration # Written in pure Julia

# session started with -O3 --depwarn=no
julia> using HCubature, Cubature, StaticArrays, BenchmarkTools

julia> f(x) = exp(-x' * x/2)/2
f (generic function with 1 method)

julia> @btime HCubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), rtol=1e-8)
  2.517 ms (63938 allocations: 1.70 MiB)
(3.1415926534311005, 3.141588672705139e-8)

julia> @btime Cubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), reltol=1e-8)
  5.425 ms (193752 allocations: 8.28 MiB)
(3.1415926534311027, 3.141588673692509e-8)

julia> @btime HCubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), rtol=1e-12) # Julia
  62.269 ms (1448586 allocations: 36.86 MiB)
(3.1415926535897993, 3.1233157511412803e-12)

julia> @btime Cubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), reltol=1e-12) # C
  146.159 ms (4315742 allocations: 184.39 MiB)
(3.1415926535897976, 3.1411238232300217e-12)

Chris Rackauckas also explains the advantages Julia's late compilation provides here:

I also gave an example of writing optimized kernels using SIMD intrinsics in pure Julia here: matmul post. At the end, I compared multiplying two 200x200 matrices with that Julia code against OpenBLAS, which has kernels written in assembly: Skylake-X OpenBLAS kernel. Julia took 147.202 μs, OpenBLAS took 335.363 μs. (To be fair, some of that difference was overhead that I skipped in Julia by taking care of it at compile time, but that again shows an advantage of Julia's late compilation in practice.)
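
For readers who haven't seen explicit SIMD in pure Julia, here is a tiny illustration of my own using the SIMD.jl package (my choice of tooling, not necessarily what the matmul post uses):

using SIMD

a = Vec{4,Float64}((1.0, 2.0, 3.0, 4.0))
b = Vec{4,Float64}((5.0, 6.0, 7.0, 8.0))

c = a * b    # a single packed multiply across all four lanes
sum(c)       # horizontal reduction of the lanes -> 70.0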

It's possible to write slow code in any language, but it's also definitely possible to write some of the fastest code in Julia. More than that, you can write the fastest generic code for libraries aimed at end users who are going to try to do who-knows-what.

8 Likes

Do you think we will see OpenBLAS, MKL, and similar libraries written completely in Julia?