Julia vs Fortran speed

A friend of mine wanted to check the speed of Julia against a compiled language like C++ or Fortran. So he wrote a simple routine in Julia to invert a matrix, then compiled a Fortran code to do the same. Compiler flags were all set for maximum optimization, and the Fortran function used the LAPACK routines for the matrix inverse. The Julia code still won by about a factor of 10. (I haven’t done this myself, so I can’t verify the results.) I’ve shared this with colleagues who are not yet convinced they should learn Julia, and inevitably they counter my story with the question: if Julia is fast precisely because it uses compiled code, how can it outperform a compiled language? That’s where I get stuck.
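For reference, since the actual codes aren’t shown here, a minimal Julia version of this kind of benchmark might look like the sketch below (the matrix size and the diagonal shift are my own assumptions, not the friend’s code); `inv` ends up calling LAPACK through whatever BLAS/LAPACK library Julia is linked against.

```julia
using LinearAlgebra, BenchmarkTools

# Hypothetical stand-in for the friend's test: time a dense matrix inverse.
# inv() calls LAPACK via the BLAS/LAPACK library Julia is linked to.
n = 1000
A = rand(n, n) + n * I        # shift the diagonal so the matrix is well-conditioned

@btime inv($A);
```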

Any thoughts?

1 Like

Could the Julia function be multithreaded?
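One quick way to check this on the Julia side (a sketch; both Julia-level threads and BLAS threads matter for a LAPACK call):

```julia
using LinearAlgebra

Threads.nthreads()        # Julia threads (set with `julia -t N`; default 1)
BLAS.get_num_threads()    # threads used by the OpenBLAS/LAPACK calls (often > 1 by default)

# To compare against a serial Fortran build, force single-threaded BLAS:
BLAS.set_num_threads(1)
```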

Was the Fortran function the same LAPACK routine, or was it hand-written?

Hard to tell with so little information. Similar codes should not be that different (in either direction).

PS: in this repository I have a benchmark vs. Fortran of a simple simulation (and a series of Julia features described in the notebook that I presented at the last FortranCon):

https://github.com/m3g/2021_FortranCon/tree/main/benchmark_vs_fortran

3 Likes

It can’t and shouldn’t (theoretically, after hiring enough Fortran devs), at least not in this kind of micro-benchmark, where the state-of-the-art solution is hand-tuned at the assembly level (à la OpenBLAS).

So either Julia is “cheating” (for example, a difference in multithreading settings), or the Fortran code is sub-optimally written.

(disclaimer, there’s actually semi-counter example: GitHub - JuliaLinearAlgebra/Octavian.jl: Multi-threaded BLAS-like library that provides pure Julia matrix multiplication . Which shows you the potential Julia has: you don’t have to walk very far to see something people just can’t optimize well by hand

One possibility: did you link the Fortran code against the reference BLAS? For a fairer comparison you’d want to use the same BLAS for both. Julia uses OpenBLAS by default, but you can change that (even at runtime). Of course, then this comparison becomes uninteresting, since both should have the same speed. What you really want is to write an algorithm where the time is spent in code you wrote yourself. We have a bunch of such examples in our suite of microbenchmarks: code here, results here.
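To see (or change) which BLAS Julia is actually using, something along these lines works on recent versions (`BLAS.get_config` needs Julia ≥ 1.7; MKL.jl is just one example of a swappable backend):

```julia
using LinearAlgebra

# Which BLAS/LAPACK library is loaded right now?
BLAS.get_config()

# Backends can be swapped at runtime through libblastrampoline, e.g.:
#   using MKL     # after `] add MKL`; subsequent BLAS/LAPACK calls go through MKL
```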

2 Likes

Being a compiled language is not a “magic go fast” switch: compilers behave differently, have different optimization passes and, most importantly, do not all produce equal code for the exact same source. On top of that, if the two codes were not semantically equivalent (e.g. the multithreading difference hinted at above, or using an optimized BLAS in one language but not the other), expecting the compiler to make up the difference seems… challenging.

Still, without seeing both codes, we’re all just speculating here.

1 Like

All of these focus on cases where it is “different code”, and that does seem the likely explanation in the matrix-inversion case you observed. However, to your more general question, there are cases where Julia can be faster than some compiled languages due to Julia’s specialization. Some examples:

  • C’s qsort is somewhat infamously slow; both Julia’s sort and C++'s std::sort beat it partly because they can inline the comparison function (that’s a language/compiler feature, not an implementation detail; see the first sketch below)
  • in compiled languages, array-focused algorithms are sometimes implemented in “fast” versions for 1 or 2 dimensions and then fall back on a generic but slow version for dimensions 3 and higher. In Julia, we can write a single implementation that works for an arbitrary number of dimensions, and Julia’s compiler will automatically specialize the code for each dimension as it gets used (see the second sketch below). I’ve occasionally seen 1000x speedups in Julia (comparing against “high-performance” compiled languages) because of this issue.
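On the first point, a small illustration (my own sketch, not from the post): a user-supplied comparison is just another argument to `sort`, and Julia compiles a specialized method for it, so the comparison can be inlined instead of being called through a function pointer as with C’s `qsort`:

```julia
v = randn(10_000)

# sort by absolute value; Julia specializes (and can inline) this `lt` closure
sort(v; lt = (a, b) -> abs(a) < abs(b))

# equivalent, via a key function
sort(v; by = abs)
```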
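On the second point, here is a hedged sketch of the dimension-generic style (the usual CartesianIndices idiom, not taken from any particular library): one method that sums each element with its neighbors, written once for any number of dimensions, with Julia generating a specialized version per dimensionality:

```julia
# Sum each element together with its neighbors (clamped at the array boundary).
# One generic method; Julia specializes it for each dimensionality N that is used.
function neighbor_sums(A::AbstractArray{T,N}) where {T,N}
    out = similar(A)
    R = CartesianIndices(A)
    Ifirst, Ilast = first(R), last(R)
    I1 = oneunit(Ifirst)
    for I in R
        s = zero(T)
        for J in max(Ifirst, I - I1):min(Ilast, I + I1)  # up to 3^N neighboring cells
            s += A[J]
        end
        out[I] = s
    end
    return out
end

neighbor_sums(rand(8))           # specialized for 1-D arrays
neighbor_sums(rand(8, 8, 8))     # same source code, separate specialization for 3-D
```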
10 Likes