Benchmark for latest julia?

The Julia benchmark does call the “generic” one, `*` … it’s just that the generic routine dispatches to the fast one. Whether `matmul` or `*` calls down to a fast BLAS is a library issue. With a decent Fortran system, `matmul` calls a fast BLAS, and any “official” posted benchmark numbers should reflect such a configuration.

Frankly, I find this particular benchmark pretty uninteresting — as far as I can tell, its only purpose is to highlight the fact that matrix multiplication via the standard library (if one exists) is basically the same speed in all languages, properly configured, because every standard library can be configured to call the same fast BLAS.
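To make that concrete (a Python/NumPy sketch, not part of the benchmark suite, assuming NumPy is linked against an optimized BLAS): the library operator and a naive triple loop compute the same product, and the entire speed difference comes from the BLAS backend the operator dispatches to.

```python
import numpy as np

def naive_matmul(A, B):
    """Textbook triple-loop matrix multiply; no BLAS involved."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.random((50, 50))
B = rng.random((50, 50))

# A @ B dispatches to whatever BLAS NumPy was built against;
# the answer matches the naive loop to floating-point tolerance.
assert np.allclose(A @ B, naive_matmul(A, B))
```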


By the way, for previous discussions on this topic, see e.g.:

(The usual thing that happens if you ask experts from other languages to work on the benchmarks is that they say “no one would write code this way … you need to vectorize / call optimized library X”, which misses the point of the benchmark.)


Sorry, by generic I meant the appropriate `_generic_matmatmul`, not the high-level `*`.

I agree with your sentiment, which is precisely why the Fortran benchmark should call BLAS.

Maybe some of the benchmarks should be renamed to reflect what they test as opposed to what their test case is. E.g. fib -> recursion.
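For example, the fib benchmark is the deliberately naive doubly-recursive definition, so it measures the cost of recursive calls rather than the fastest way to compute Fibonacci numbers. A Python sketch of that style of benchmark:

```python
def fib(n):
    # Deliberately naive doubly-recursive definition: the benchmark
    # measures function-call overhead, not Fibonacci arithmetic.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# fib(20) = 6765
assert fib(20) == 6765
```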


Fortran calling BLAS directly would be fine. We didn’t write the Fortran benchmark code; someone contributed it in 2012: We took the implementation at face value as a reasonable one – I, for one, am not a Fortran programmer. If you think there are improvements that should be made to the Fortran code for a fairer comparison, please do contribute them.

I posted this PR a few days back. It gets test/perf/micro running again on julia-0.7.0. It’s a few Makefile changes for libraries that have moved in the julia source tree since 0.4.0.

A few things/decisions remain before producing new benchmark data for publication:

  1. Getting Fortran to call BLAS. I tried @Ralph_Smith’s suggestion above but got utterly horrific `rand_mat_mul` results (~100 times slower). I need to double-check that I linked to the correct `-lblas`, or whether there was some problem in the interface, such as a wrong integer size.

  2. Getting R, Python, and Octave to call BLAS for rand_mat_mul. It looks like that’ll require recompiling and relinking these packages (instead of using my non-BLAS default packages on openSUSE). I’m not keen on doing this, and I’m not sure it’s fair.

  3. Investigate the slowness of mandel in C compared to Java, JavaScript, and Lua. I’d welcome help here.

  4. Rename the benchmarks. This idea has come up repeatedly as a way to clarify that we are testing recursion, not the optimal fib algorithm, for example. My proposed renaming:

| old | new |
| --- | --- |
| fib | recursion_fibonacci |
| quicksort | recursion_quicksort |
| pi_sum | iteration_pi_sum |
| mandel | iteration_mandelbrot |
| parse_int | parse_integers |
| rand_mat_mul | matrix_multiply |
| rand_mat_stat | matrix_statistics |
| printfd | print_decimals |

Do those look good?
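For item 3, the mandel kernel is the standard escape-time loop. A minimal Python sketch of that style of benchmark (the bound and iteration count here are illustrative assumptions, not copied from the repo):

```python
def mandel_iters(c, maxiter=80):
    # Count iterations until |z| exceeds 2 (escape), up to maxiter.
    z = 0j
    for i in range(maxiter):
        if abs(z) > 2:
            return i
        z = z * z + c
    return maxiter

# Points inside the set never escape; points far outside escape immediately.
assert mandel_iters(0 + 0j) == 80
assert mandel_iters(2 + 2j) == 1
```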


I like the renaming.


I believe LuaJIT was always used (or else you wouldn’t have seen that speed with the non-JIT “Lua” implementation).

Is replacing the “LuaJIT” text with “SciLua” in the PR warranted? Maybe instead say LuaJIT and SciLua on the following line.

I didn’t look too much into SciLua, but it seems to be the LuaJIT language/implementation with extras similar to NumPy? Does it make any difference, maybe only for BLAS-using code, if that?

At some point this plot result was changed to be sorted alphabetically, which I find to be unhelpful for understanding what it’s telling me about overall performance. I believe it was originally sorted by the geometric mean of benchmark times, which would be a good ordering, imo, although I think having Julia in the first column on the benchmarks is also justifiable.
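For reference, sorting by geometric mean is straightforward when every language has a time for every benchmark. A Python sketch with hypothetical, made-up numbers (the real data has gaps, as noted elsewhere in the thread):

```python
import math

# Hypothetical normalized times (relative to C) per language.
times = {
    "c":      [1.0, 1.0, 1.0],
    "julia":  [1.2, 0.9, 1.5],
    "python": [20.0, 35.0, 70.0],
}

def geomean(xs):
    # Geometric mean: exp of the arithmetic mean of the logs.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

order = sorted(times, key=lambda lang: geomean(times[lang]))
assert order == ["c", "julia", "python"]
```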


SciLua author here.

First of all, I fully agree on having LuaJIT + SciLua as naming, or LuaJIT using SciLua libraries, instead of just SciLua.
Without LuaJIT none of my work would have been possible :slight_smile:

Some clarifications:

  1. LuaJIT is a JIT implementation of the Lua language (5.1, plus cherry-picked additions from 5.2 and 5.3) by Mike Pall; version 2.1 brings optimizations and speed improvements over 2.0.
  2. The “sci” module is a collection of algorithms for general-purpose scientific computing (think GSL in C); it relies on OpenBLAS for some matrix operations but is otherwise 100% LuaJIT.
  3. The “scilua” executable adds language extensions to LuaJIT in the form of syntactic sugar to facilitate writing vector/matrix expressions; it requires the “sci” module of point 2.

With SciLua I refer to the framework composed by the module and the executable above, both of which use LuaJIT 2.1.

Please feel free to contact me if you have questions.


What makes it so fast?

Maybe there are tricks to learn.

The latest benchmarks, approaching completion.

Revisions since last time

  • got BLAS working for Fortran and Octave (thanks, @Ralph_Smith) but not for Python. Not sure if I’ll get Python working with BLAS; this seems fairly involved.

  • renamed the benchmarks. Which is better, `print_decimals` or `print_to_file`? The relevant C code is `fprintf(f, "%ld %ld\n", i, i);` where `f` is `/dev/null`.

  • reordered the plot, putting Julia next to C, with the other languages reordered by eye. It’s not straightforward to compute the geometric mean because not all languages implement all benchmarks (e.g. `print_to_file` is missing for Fortran).

  • investigated the unexpectedly good performance of some iteration_mandelbrot codes and found no obvious flaws. They all seem to be doing the same combinations of loops and computing the same numbers.

  • changed the colormap to a hand-tooled rainbow, since I found some of the default plotting colors hard to distinguish. It has a slightly more vibrant feel than the old colors; let me know if you dislike that.

  • reverted “SciLua” back to “LuaJIT + SciLua” in some places
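On the Python side, before rebuilding anything it may be worth checking which BLAS NumPy is actually linked against; NumPy can report its own build configuration. A quick sketch:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was built against;
# look for openblas/mkl versus a plain reference BLAS.
np.show_config()

# Regardless of backend, the result should be correct:
a = np.arange(4.0).reshape(2, 2)
assert np.allclose(a @ a, np.array([[2.0, 3.0], [6.0, 11.0]]))
```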

I’m about ready to call it quits and post a PR for the new benchmark table and plot to


Quick suggestion: consider adding faint minor gridlines between the powers of 10. It is not currently visually obvious that the y-axis has a logarithmic scale, but minor gridlines would make this very clear.

EDIT: The colors are great now, but could you fix the spacing issues in the legend before you publish? Also, the x-axis font is kind of ugly (IMHO).
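In matplotlib terms (a sketch, assuming the plot is generated there; adjust for whatever toolkit is actually used), faint minor gridlines on a log y-axis are just:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, renders without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 50, 2500])
ax.set_yscale("log")
# Faint minor gridlines between the powers of 10 make the log scale obvious.
ax.grid(True, which="major", axis="y", alpha=0.6)
ax.grid(True, which="minor", axis="y", alpha=0.2)
fig.savefig("bench.png", dpi=200)
assert ax.get_yscale() == "log"
```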


The fonts seem to lack anti-aliasing or something. A quick fix would be to export it at 4x the size and then resize it down.

That’s an artifact of conversion to PNG, which is necessary for posting to Discourse. The original SVG looks nice; that’s what will appear on the julialang website. Edit: @NickNack: the same applies to the axis labels.

There’s a julia convention of using distinguishable_colors() from the Colors package for separate series, which is repeated in many places. It may be what was originally used. I’d suggest going with that for consistency.
If you dislike it, there are also some good distinctive schemes in Colorbrewer.jl, e.g. Dark2 might suit your taste. I am personally no big fan of the rainbow :slight_smile:


Great Work!

For Python, why don’t you use Anaconda to have Intel MKL for BLAS out of the box?

Moreover, I think all (most?) languages are moving targets, so one should record the specific versions used (including the BLAS version) and the compilers.

Lua is amazingly fast.
Can we make Mike Pall interested in Julia :-)?


I saw the changes got merged. Thanks for your work!


Mike Pall has written fairly extensively about how he makes LuaJIT so fast. It’s brilliant work but not really applicable to Julia at all.

Really interesting. Is nothing shared between those two approaches?

I have little knowledge of either of them, but I’m very excited to see what talented people do on both projects.

Well, maybe one day he will contribute to Julia, even without any formal connection.