Benchmark for latest Julia?

While what you suggest should be done, this highlights a general issue with benchmarks like this: in order to get a fair assessment, we should ask experienced users of all of these languages to optimize their code.

The Computer Language Benchmarks Game has seen a lot of cycles that go like this:

  1. users of language A optimize their toy benchmark,
  2. language B is now behind, and someone with time on their hands optimizes that benchmark,
  3. the language C community is now miffed, to the point that someone actually tweaks their interpreter to deal better with the benchmark,
  4. …
2 Likes

When you benchmark, you have to decide what you are trying to measure.

This particular benchmark is not supposed to test the most optimized possible code for a given problem. For example, an optimized fib would just use a look-up table and would be totally uninteresting as a benchmark. The point of this benchmark is to test the performance of common language constructs (looping, recursion, and even matrix multiplication), all written in the most "ordinary" way, which is why the Fortran one calls the built-in matmul and not dgemm, and similarly Julia calls * and not A_mul_B! or BLAS.dgemm to avoid allocations.
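
For concreteness, here is a minimal illustrative sketch (not the actual benchmark code) of what "ordinary" means here, in Julia:

    # Recursion is exercised with the naive doubly recursive fib,
    # not a look-up table or a closed-form formula:
    fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

    # Matrix multiplication is written the way most users would write it,
    # with the built-in `*`, rather than an explicit in-place BLAS call:
    A = rand(100, 100); B = rand(100, 100)
    C = A * B
    # (An explicitly tuned call such as BLAS.gemm! is exactly the kind of
    #  optimization the benchmark deliberately avoids.)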

It is also useful to benchmark highly optimized code, of course, but that is a very different exercise and calls for very different sorts of problems.

2 Likes

I did not suggest changing the algorithm. My point was that if we optimize Julia code, an effort should be made to optimize the other code to the same extent (which, I recognize, is hard to quantify). Clearly, experienced Julia users are now working on this code, so to make a fair comparison, one would need to solicit help from experts in other languages (there are possibly many of them in the Julia community).

One can always argue that throwing @inbounds in front of something is an obvious optimization. But I am not an expert in the other languages, so I don't know what the similar low-hanging fruit is.
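
For readers less familiar with Julia, a small illustrative example of that kind of low-hanging fruit (this is not one of the benchmark kernels):

    # @inbounds tells the compiler to skip bounds checking in a hot loop.
    function mysum(x::Vector{Float64})
        s = 0.0
        @inbounds for i in eachindex(x)
            s += x[i]
        end
        return s
    end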

In that case we should remove the BLAS calls from the C code and leave Python, R, or Octave alone (pure language constructs, no BLAS).

Why not just qualify it? For Julia we do the basics as described here: https://docs.julialang.org/en/stable/manual/performance-tips/ . For Python, do the basics as described, for example, here: PythonSpeed/PerformanceTips - Python Wiki . Any of the basic "don't do this in Python" advice is pretty much the same (otherwise it wouldn't be basics everyone knows). It's hard to establish a baseline, but I think "code in the style of someone who read the first page that pops up on Google for 'xxxx performance tips'" would match most decently informed people's code, and it is essentially what the Julia benchmarks are trying to hit, with only a few changes (of course the algorithms have to be the same, so fib is the recursive algorithm, etc.).
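
As a hedged illustration of what "the basics" from the Julia performance-tips page look like in practice (the names below are made up):

    # Put performance-critical work inside a function and pass data as an
    # argument, rather than looping over an untyped global variable.
    data = rand(10^6)      # a non-constant global: its type is not inferable in loops

    function total(v)      # the basic fix: take the data as an argument
        s = 0.0
        for x in v
            s += x
        end
        return s
    end

    total(data)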

The Computer Language Benchmarks Game is hitting an entirely different audience where bitshift tricks, declaration of non-aliasing scopes, and other bizarre optimizations are fair game. That's just completely different.

In any case, we should clarify what our criteria are and stick to them.

2 Likes

Although much of BLAS/LAPACK has been wrapped into nice high-level Julia functions, not all of the packages' functions have been, and it is easier to call them in Fortran than to create Julia wrappers. To me, benchmarking Fortran without calling BLAS/LAPACK is unfair. (I'm not saying there aren't any library issues to resolve!)

On the other hand, if we're benchmarking genericity, such as Fortran calling the generic matmul, then why doesn't the Julia benchmark call the generic one too? Performance aside, this is where Julia and its beautiful types have a linguistic advantage.

On how much to optimize, there's a choice between

  1. Pure language constructs and dead-simple algorithms, e.g. do matrix multiplication with simple for loops, no BLAS anywhere (see the sketch after this list).
  2. Typical usage: what a person with modest familiarity with the language might code with standard tools, e.g. BLAS in Fortran & C or anywhere else it's typical and easy with standard distributions of compilers and libraries, @inbounds etc. allowed in Julia, @autojit for Python. But don't change fib to a look-up table or recompile Python.
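
A minimal sketch of what the two levels could look like in Julia (illustrative only, not the benchmark code):

    # Level 1: pure language constructs, dead-simple algorithm.
    function matmul_loops(A, B)
        m, k = size(A)
        n = size(B, 2)
        C = zeros(m, n)
        for j in 1:n, i in 1:m
            s = 0.0
            for p in 1:k
                s += A[i, p] * B[p, j]
            end
            C[i, j] = s
        end
        return C
    end

    # Level 2: typical usage with standard tools: the built-in `*`,
    # which dispatches to an optimized BLAS for Float64 matrices.
    matmul_typical(A, B) = A * B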

Both are meaningful, and I would like to see results for both. The current micro-benchmarks are closer to option 1, but not entirely consistently so. It would be easier to push toward consistency with option 1.

I suspect we're rehashing discussions from the early days of test/perf/micro, and maybe we should either dig those discussions up or let those involved weigh in.

1 Like

The Julia benchmark does call the "generic" one: * … it's just that the generic routine dispatches to the fast one. Whether matmul or * calls down to a fast BLAS is a library issue. With a decent Fortran system, matmul calls a fast BLAS, and any "official" posted benchmark numbers should reflect such a configuration.
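
To make the dispatch point concrete, a small illustrative example:

    using LinearAlgebra

    A = rand(200, 200); B = rand(200, 200)
    C1 = A * B        # dense Float64 matrices: `*` ends up in the fast BLAS gemm

    Ab = big.(A); Bb = big.(B)
    C2 = Ab * Bb      # BigFloat elements: `*` falls back to the generic
                      # pure-Julia routine, which is far slower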

Frankly, I find this particular benchmark pretty uninteresting; as far as I can tell, its only purpose is to highlight the fact that matrix multiplication via the standard library (if one exists) is basically the same speed in all languages when properly configured, because every standard library can be configured to call the same fast BLAS.

3 Likes

By the way, for previous discussions on this topic, see e.g.:

https://github.com/JuliaLang/julia/pull/1084
https://github.com/JuliaLang/julia/issues/2412
https://github.com/JuliaLang/julia/issues/5128
https://github.com/JuliaLang/julia/issues/13042

(The usual thing that happens if you ask experts from other languages to work on the benchmarks is that they say "no one would write code this way … you need to vectorize / call optimized library X", which misses the point of the benchmark.)

3 Likes

Sorry, by generic I meant the appropriate _generic_matmatmul, not the high-level *.

I agree with your sentiment, which is precisely why the Fortran benchmark should call BLAS.

Maybe some of the benchmarks should be renamed to reflect what they test as opposed to what their test case is. E.g. fib → recursion.

6 Likes

Fortran calling BLAS directly would be fine. We didn't write the Fortran benchmark code; someone contributed it in 2012: https://github.com/JuliaLang/julia/pull/917. We took the implementation at face value as a reasonable one; I, for one, am not a Fortran programmer. If you think there are improvements that should be made to the Fortran code for a fairer comparison, please do contribute them.

I posted this PR a few days back. It gets test/perf/micro running again on julia-0.7.0. It's a few Makefile changes for libraries that have moved in the julia source tree since 0.4.0.
https://github.com/JuliaLang/julia/pull/23922

A few things/decisions remain before producing new benchmark data for publication:

  1. Getting Fortran to call BLAS. I tried @Ralph_Smith's suggestion above but got utterly horrific rand_mat_mul results (~100 times slower). I need to double-check that I linked to the correct -lblas, or that there wasn't some problem in the interface with wrong integer size.

  2. Getting R, Python, and Octave to call BLAS for rand_mat_mul. It looks like that'll require recompiling and relinking these packages (instead of using my non-BLAS default packages on openSUSE). I'm not keen on doing this, and I'm not sure it's fair.

  3. Investigate the slowness of mandel in C compared to Java, JavaScript, and Lua. I'd welcome help here.

  4. Rename the benchmarks. This idea has come up repeatedly as a way to clarify that we are testing, for example, recursion rather than the optimal fib algorithm. My proposed renaming:

  old             new
  fib             recursion_fibonacci
  quicksort       recursion_quicksort
  pi_sum          iteration_pi_sum
  mandel          iteration_mandelbrot
  parse_int       parse_integers
  rand_mat_mul    matrix_multiply
  rand_mat_stat   matrix_statistics
  printfd         print_decimals
Do those look good?

17 Likes

I like the renaming.

2 Likes

I believe LuaJIT was always used (or else you wouldn't have seen that speed with the non-JIT "Lua" implementation).

Is replacing the "LuaJIT" text with "SciLua" in the PR warranted? Maybe instead say LuaJIT and SciLua on the following line.

I didn't look too much into SciLua, but it seems to be the "LuaJIT" language/implementation with extras similar to NumPy? Does it make any difference, maybe only for BLAS-using code, if that?

At some point this plot was changed to be sorted alphabetically, which I find unhelpful for understanding what it's telling me about overall performance. I believe it was originally sorted by the geometric mean of benchmark times, which would be a good ordering, IMO, although I think having Julia in the first column on the JuliaLang.org benchmarks is also justifiable.
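
For reference, sorting by the geometric mean is straightforward; a hedged sketch with made-up numbers:

    # Benchmark times normalized to C (the numbers are illustrative only).
    times = Dict(
        "C"      => [1.0, 1.0, 1.0],
        "Julia"  => [1.1, 1.6, 1.0],
        "Python" => [17.0, 70.0, 15.0],
    )
    geomean(v) = exp(sum(log, v) / length(v))
    order = sort(collect(keys(times)); by = lang -> geomean(times[lang]))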

1 Like

SciLua author here.

First of all, I fully agree on naming it LuaJIT + SciLua, or LuaJIT using SciLua libraries, instead of just SciLua.
Without LuaJIT none of my work would have been possible :slight_smile:

Some clarifications:

  1. LuaJIT is a JIT implementation of the Lua language (5.1 plus cherry-picked additions from 5.2 and 5.3) by Mike Pall; version 2.1 brings optimizations and speed improvements over 2.0.
  2. The "sci" module is a collection of algorithms for general-purpose scientific computing (think GSL in C); it relies on OpenBLAS for some matrix operations but is otherwise 100% LuaJIT.
  3. The "scilua" executable adds language syntax extensions to LuaJIT in the form of syntactic sugar to facilitate writing vector/matrix expressions; it requires the "sci" module from point 2.

With SciLua I refer to the framework composed of the module and the executable above, both of which use LuaJIT 2.1.

Please feel free to contact me if you have questions.

11 Likes

What makes it so fast?

Maybe there are tricks to learn.

The latest benchmarks, approaching completion.

Revisions since last time

  • got BLAS working for Fortran and Octave (thanks, @Ralph_Smith) but not for Python. Not sure if I'll get Python working with BLAS; this seems fairly involved.

  • renamed the benchmarks. Which is better, print_decimals or print_to_file? The relevant C code is fprintf(f, "%ld %ld\n", i, i); where f is /dev/null.

  • reordered the languages, putting Julia next to C and ordering the others by eye. It's not straightforward to compute the geometric mean because not all languages implement all benchmarks (e.g. print_to_file is missing for Fortran); one possible workaround is sketched after this list.

  • investigated the unexpectedly good performance of some iteration_mandelbrot codes and found no obvious flaws. They all seem to be doing the same combinations of loops and computing the same numbers.

  • changed the colormap to a hand-tooled rainbow, since I found some of the default plotting colors hard to distinguish. It has a slightly more vibrant feel than the old colors; let me know if you dislike that.

  • reverted "scilua" back to "luaJIT + scilua" in some places
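
On the ordering point above: one possible workaround for the missing-benchmark problem is to take the geometric mean only over the benchmarks every language implements. A hedged sketch (benchmark names and numbers are illustrative):

    times = Dict(
        "C"       => Dict("recursion_fibonacci" => 1.0, "print_to_file" => 1.0),
        "Fortran" => Dict("recursion_fibonacci" => 0.9),  # no print benchmark
        "Julia"   => Dict("recursion_fibonacci" => 1.7, "print_to_file" => 1.1),
    )
    common  = reduce(intersect, (keys(d) for d in values(times)))
    geomean(v) = exp(sum(log, v) / length(v))
    ranking = sort(collect(keys(times));
                   by = lang -> geomean([times[lang][b] for b in common]))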

I'm about ready to call it quits and post a PR for the new benchmark table and plot to julialang.github.com.

19 Likes

Quick suggestion: consider adding faint minor gridlines between the powers of 10. It is not currently visually obvious that the y-axis has a logarithmic scale, but minor gridlines would make this very clear.
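
In case it helps, a minimal sketch of what that might look like with Plots.jl (attribute names from memory, and this may not be the toolchain used for the actual figure):

    using Plots

    langs = ["C", "Julia", "Python"]   # illustrative data
    vals  = [1.0, 1.2, 25.0]
    scatter(langs, vals;
            yscale = :log10,
            minorgrid = true,          # faint lines between powers of 10
            legend = false)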

EDIT: The colors are great now, but could you fix the spacing issues in the legend before you publish? Also, the x-axis font is kind of ugly (IMHO).

2 Likes