> Any benchmark of Julia v1.0 vs older versions?

It’s as fast as C if you don’t trigger the garbage collector. It’s common to pre-allocate memory, or to use stack memory, so that the GC isn’t triggered in the most performance-sensitive parts of your code. A really cool example I saw recently, for small-dimensional optimization problems:

It doesn’t allocate any memory, and runs incredibly fast. For small-dimensional problems, you’d be hard-pressed to find anything faster.
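To illustrate the idea (this is a hypothetical sketch, not the example referenced above): with StaticArrays, small fixed-size vectors and matrices live on the stack, so an inner loop like a gradient-descent step on a 2-D quadratic never touches the GC at all.

```julia
using StaticArrays

# Fixed 2x2 symmetric positive-definite matrix; f(x) = x'Ax/2, gradient A*x.
const A = @SMatrix [4.0 1.0; 1.0 3.0]

function descend(x0::SVector{2,Float64}; steps=100, step_size=0.1)
    x = x0
    for _ in 1:steps
        # All SVector/SMatrix operations are stack-only: zero heap allocations.
        x = x - step_size * (A * x)
    end
    x
end

descend(SVector(1.0, 1.0))
```

Running `@btime descend(SVector(1.0, 1.0))` reports 0 allocations, which is exactly why these small-dimensional kernels can match or beat C.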

I think Julia in practice will often be faster than C, because it is easier to specialize code for a given problem.
Given two libraries, one written in C, and one in Julia, I wouldn’t bet on the C library being faster.
As an example, here are two libraries by the same author (Steven G. Johnson, someone known for writing high-performance software):
JuliaMath/Cubature.jl: one- and multi-dimensional adaptive integration routines for Julia # Written in C; has the advantage that it also offers p-cubature
JuliaMath/HCubature.jl: pure-Julia multidimensional h-adaptive integration # Written in pure Julia

# session started with -O3 --depwarn=no
julia> using HCubature, Cubature, StaticArrays, BenchmarkTools

julia> f(x) = exp(-x' * x/2)/2
f (generic function with 1 method)

julia> @btime HCubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), rtol=1e-8)
  2.517 ms (63938 allocations: 1.70 MiB)
(3.1415926534311005, 3.141588672705139e-8)

julia> @btime Cubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), reltol=1e-8)
  5.425 ms (193752 allocations: 8.28 MiB)
(3.1415926534311027, 3.141588673692509e-8)

julia> @btime HCubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), rtol=1e-12) # Julia
  62.269 ms (1448586 allocations: 36.86 MiB)
(3.1415926535897993, 3.1233157511412803e-12)

julia> @btime Cubature.hcubature(f, SVector(-20.,-20.), SVector(20.,20.), reltol=1e-12) # C
  146.159 ms (4315742 allocations: 184.39 MiB)
(3.1415926535897976, 3.1411238232300217e-12)
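Part of why the pure-Julia version can win here is specialization: because the endpoints are `SVector`s, the dimension is a type parameter, so the compiler generates a method instance specialized for 2-D. A simplified sketch of that mechanism (this is not HCubature’s internals, just the general principle):

```julia
using StaticArrays

# N is part of the type, so for each dimension the compiler emits a
# specialized, fully unrolled method instance with no runtime size checks.
sumsq(x::SVector{N,T}) where {N,T} = sum(x .* x)

sumsq(SVector(1.0, 2.0))  # compiles a 2-D instance; no heap allocation
```

A C library, by contrast, typically receives the dimension as a runtime argument and pays for that generality on every call.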

Chris Rackauckas also explains the advantages Julia’s late compilation provides here:

I also gave an example of writing optimized kernels using SIMD intrinsics in pure Julia here: matmul post. At the end, I compared multiplying two 200x200 matrices with that Julia code against OpenBLAS, which has kernels written in assembly: the Skylake-X OpenBLAS kernel. Julia took 147.202 μs, OpenBLAS took 335.363 μs. (To be fair, some of that difference was overhead that I skipped in Julia by taking care of it at compile time – but that again presents an advantage of Julia’s late compilation in practice.)
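To give a flavor of explicit SIMD in pure Julia (a minimal sketch, not the matmul kernel from the post; it assumes the SIMD.jl package): you operate on packed `Vec` values, which lower to the same vector instructions you’d write by hand in C intrinsics or assembly.

```julia
using SIMD

# Sum a Float64 vector four lanes at a time, with a scalar tail loop.
function simd_sum(a::Vector{Float64})
    N = 4
    acc = Vec{N,Float64}(0.0)
    i = 1
    @inbounds while i + N - 1 <= length(a)
        acc += vload(Vec{N,Float64}, a, i)  # one packed load per iteration
        i += N
    end
    s = sum(acc)                            # horizontal reduction
    @inbounds while i <= length(a)          # leftover elements
        s += a[i]
        i += 1
    end
    s
end
```

The same approach scales up to matmul microkernels: keep an accumulator of `Vec`s in registers and stream packed loads through it.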

It’s possible to write slow code in any language, but it’s also definitely possible to write some of the fastest code around in Julia. More than that, Julia lets you write the fastest *generic* code – the kind needed for libraries aimed at end users who are going to try who-knows-what with them.
