While what you suggest should be done, this highlights a general issue with benchmarks like this: in order to get a fair assessment, we should ask experienced users of all of these languages to optimize their code.
When you benchmark, you have to decide what you are trying to measure.
This particular benchmark is not supposed to test the most optimized possible code for a given problem; an optimized fib, for example, would just use a look-up table and would be totally uninteresting as a benchmark. The point of this benchmark is to test the performance of common language constructs (looping, recursion, and even matrix multiplication) all written in the most "ordinary" way, which is why the Fortran one calls the built-in matmul and not dgemm, and similarly Julia calls * rather than A_mul_B! or BLAS.dgemm, which would avoid allocations.
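To make that distinction concrete, here is a minimal Julia sketch (sizes are arbitrary; on 0.6 the in-place call is A_mul_B!, on 0.7+ it is mul!):

```julia
using LinearAlgebra  # 0.7+; on 0.6 these live in Base

A, B = rand(1000, 1000), rand(1000, 1000)
C = similar(A)

C = A * B                                # the "ordinary" style the benchmark times (allocates)
mul!(C, A, B)                            # in-place multiply, avoids the allocation
BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)  # explicit BLAS call, also allocation-free
```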
It is also useful to benchmark highly optimized code, of course, but that is a very different exercise and calls for very different sorts of problems.
I did not suggest changing the algorithm. My point was that if we optimize Julia code, an effort should be made to optimize the other code to the same extent (which, I recognize, is hard to quantify). Clearly, experienced Julia users are now working on this code, so to make a fair comparison, one would need to solicit help from experts in other languages (there are possibly many of them in the Julia community).
One can always argue that throwing @inbounds in front of something is an obvious optimization. But I am not an expert in the other languages, so I don't know what the comparable low-hanging fruit is there.
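For reference, a minimal sketch of what that @inbounds-style change looks like in Julia (function names here are hypothetical, not from the benchmark suite):

```julia
function sumsq(x)
    s = 0.0
    for i in eachindex(x)
        s += x[i]^2              # bounds-checked access
    end
    return s
end

function sumsq_inbounds(x)
    s = 0.0
    @inbounds for i in eachindex(x)
        s += x[i]^2              # bounds checks elided; valid indices are the caller's problem
    end
    return s
end
```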
Why not just qualify it? For Julia we do the basics as described here: https://docs.julialang.org/en/stable/manual/performance-tips/ . For Python, do the basics as described, for example, here: PythonSpeed/PerformanceTips - Python Wiki. Any of the basic "don't do this in Python" advice is pretty much the same (otherwise it wouldn't be the basics everyone knows). It's hard to establish a baseline, but I think "code in the style of someone who looked at the first page that pops up on Google for 'xxxx performance tips'" is something that would match most decently informed people's code, and it is essentially what the Julia benchmarks are trying to hit, with only a few changes (but of course the algorithms have to be the same, so fib is the recursive algorithm, etc.).
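As a concrete example of that baseline, the first tip on the Julia page is essentially "avoid non-constant globals and put hot code in a function"; a minimal sketch (hypothetical names, not benchmark code):

```julia
x = rand(10_000)             # non-constant global: slow if summed in a top-level loop

function total(v)            # putting the loop in a function lets the compiler specialize
    s = zero(eltype(v))
    for a in v
        s += a
    end
    return s
end

total(x)
```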
The Computer Language Benchmarks Game is hitting an entirely different audience, where bit-shift tricks, declarations of non-aliasing scopes, and other bizarre optimizations are fair game. That's just completely different.
In any case, we should better clarify what our criteria are and stick to them.
Although much of BLAS/LAPACK has been wrapped into nice high-level Julia functions, not all of those packages' functions have been, and it is easier to call them in Fortran than to create Julia wrappers. To me, benchmarking Fortran without calling BLAS/LAPACK is unfair. (I'm not saying there aren't any library issues to resolve!)
On the other hand, if we're benchmarking genericity, such as Fortran calling the generic matmul, then why doesn't the Julia benchmark call the generic one too? Performance aside, this is where Julia and its beautiful types have a linguistic advantage.
On how much to optimize, there's a choice between:
1. Pure language constructs and dead-simple algorithms, e.g. do matrix multiplication with simple for loops, no BLAS anywhere.
2. Typical usage, what a person with modest familiarity with the language might code with standard tools, e.g. BLAS in Fortran and C (or anywhere else it's typical and easy with standard distributions of compilers and libraries), @inbounds etc. allowed in Julia, @autojit for Python. But don't change fib to a look-up table or recompile Python.
Both are meaningful, and I would like to see results for both. The current micro benchmarks are closer to 1 but not entirely consistent. It would be easier to push toward consistency with 1.
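A sketch of what the two options would mean for the matrix-multiplication benchmark in Julia (function names are hypothetical, just to contrast the styles):

```julia
# Option 1: pure language constructs, dead-simple algorithm, no BLAS.
function matmul_loops(A, B)
    m, k = size(A)
    n = size(B, 2)
    C = zeros(eltype(A), m, n)
    for j in 1:n, l in 1:k, i in 1:m
        C[i, j] += A[i, l] * B[l, j]
    end
    return C
end

# Option 2: typical usage with standard tools; * dispatches to BLAS for Float64 matrices.
matmul_typical(A, B) = A * B
```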
I suspect we're rehashing discussions from the early days of test/perf/micro, and maybe we should either dig those discussions up or let those involved weigh in.
The Julia benchmark does call the "generic" one, namely *; it's just that the generic routine dispatches to the fast one. Whether matmul or * calls down to a fast BLAS is a library issue. With a decent Fortran system, matmul calls a fast BLAS, and any "official" posted benchmark numbers should reflect such a configuration.
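To illustrate the dispatch point, a small sketch (sizes are arbitrary): the same generic * either hits BLAS or the generic fallback depending on the element type.

```julia
using LinearAlgebra

A = rand(500, 500)
B = big.(rand(50, 50))   # BigFloat elements

A * A   # Float64 matrices: the generic * dispatches down to the fast BLAS gemm kernel
B * B   # no BLAS kernel for BigFloat, so the generic pure-Julia fallback runs
```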
Frankly, I find this particular benchmark pretty uninteresting: as far as I can tell, its only purpose is to highlight the fact that matrix multiplication via the standard library (if one exists) is basically the same speed in all languages when properly configured, because every standard library can be configured to call the same fast BLAS.
(The usual thing that happens if you ask experts from other languages to work on the benchmarks is that they say "no one would write code this way ... you need to vectorize / call optimized library X", which misses the point of the benchmark.)
Fortran calling BLAS directly would be fine. We didn't write the Fortran benchmark code; someone contributed it in 2012: https://github.com/JuliaLang/julia/pull/917. We took the implementation at face value as a reasonable one; I, for one, am not a Fortran programmer. If you think there are improvements that should be made to the Fortran code for a fairer comparison, please do contribute them.
I posted this PR a few days back. It gets test/perf/micro running again on julia-0.7.0. It's a few Makefile changes for libraries that have moved in the julia source tree since 0.4.0. https://github.com/JuliaLang/julia/pull/23922
A few things/decisions remain before producing new benchmark data for publication:
Getting Fortran to call BLAS. I tried @Ralph_Smith's suggestion above but got utterly horrific rand_mat_mul results (~100 times slower). I need to double-check that I linked to the correct -lblas and that there wasn't some problem in the interface, such as a wrong integer size.
Getting R, Python, and Octave to call BLAS for rand_mat_mul. It looks like that'll require recompiling and relinking these packages (instead of using my non-BLAS default packages on openSUSE). I'm not keen on doing this, and I'm not sure it's fair.
Investigate the slowness of mandel in C compared to Java, JavaScript, and Lua. I'd welcome help here.
Rename the benchmarks. This idea has come up repeatedly as a way to clarify that we are testing recursion, for example, not the optimal fib algorithm. My proposed renaming
I believe LuaJIT was always used; otherwise you wouldn't have seen that speed with the non-JIT "Lua" implementation.
Is replacing the "LuaJIT" text with "SciLua" in the PR warranted? Maybe rather say "LuaJIT and SciLua" on the following line.
I didn't look too much into SciLua, but it seems to be the "LuaJIT" language/implementation with extras similar to NumPy? Does it make any difference, perhaps only for BLAS-using code, if that?
At some point this plot result was changed to be sorted alphabetically, which I find to be unhelpful for understanding what it's telling me about overall performance. I believe it was originally sorted by the geometric mean of benchmark times, which would be a good ordering, IMO, although I think having Julia in the first column on the JuliaLang.org benchmarks is also justifiable.
First of all, I fully agree on having LuaJIT + SciLua as naming, or LuaJIT using SciLua libraries, instead of just SciLua.
Without LuaJIT none of my work would have been possible.
Some clarifications:
1. LuaJIT is a JIT implementation of the Lua language (5.1, plus cherry-picked additions from 5.2 and 5.3) by Mike Pall; version 2.1 brings optimizations and speed improvements over 2.0.
2. The "sci" module is a collection of algorithms for general-purpose scientific computing (think GSL in C); it relies on OpenBLAS for some matrix operations but is otherwise 100% LuaJIT.
3. The "scilua" executable adds language syntax extensions to LuaJIT in the form of syntactic sugar to facilitate writing vector/matrix expressions; it requires the "sci" module of point 2.
By SciLua I refer to the framework composed of the module and the executable above, both of which use LuaJIT 2.1.
Please feel free to contact me if you have questions.
got BLAS working for Fortran and Octave (thanks, @Ralph_Smith) but not for Python. Not sure if I'll get Python working with BLAS; this seems fairly involved.
renamed the benchmarks. Which is better, print_decimals or print_to_file? The relevant C code is fprintf(f, "%ld %ld\n", i, i); where f is /dev/null.
reordered, putting Julia next to C, with the other languages reordered by eye. It's not straightforward to compute the geometric mean because not all languages implement all benchmarks (e.g. print_to_file is missing for Fortran); one possible workaround is sketched after this list.
investigated the unexpectedly good performance of some iteration_mandelbrot codes and found no obvious flaws. They all seem to be doing the same combinations of loops and computing the same numbers.
changed the colormap to a hand-tooled rainbow, since I found some of the default plotting colors hard to distinguish. It has a slightly more vibrant feel than the old colors; let me know if you dislike that.
reverted "scilua" back to "LuaJIT + scilua" in some places
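Regarding the geometric-mean ordering mentioned above, one hypothetical workaround is to take the geometric mean over whichever benchmarks each language actually implements, though comparing means over different subsets is itself a bit questionable. A sketch (names and data layout are my own invention, not the actual plotting script):

```julia
# times: language => vector of timings normalized to C, with NaN for missing benchmarks.
geomean(v) = exp(sum(log, v) / length(v))

function order_by_geomean(times::Dict{String, Vector{Float64}})
    score = Dict(lang => geomean(filter(!isnan, v)) for (lang, v) in times)
    return sort(collect(keys(score)); by = lang -> score[lang])
end
```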
I'm about ready to call it quits and post a PR for the new benchmark table and plot to julialang.github.com.
Quick suggestion: consider adding faint minor gridlines between the powers of 10. It is not currently visually obvious that the y-axis has a logarithmic scale, but minor gridlines would make this very clear.
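If the plot is made with Plots.jl (an assumption on my part; other toolkits have similar options), minor gridlines on a log axis are just an attribute, e.g.:

```julia
using Plots

times = [1.0, 3.2, 10.5, 47.0, 220.0]   # dummy data standing in for benchmark timings

plot(times;
     yscale = :log10,     # logarithmic y-axis
     minorgrid = true,    # faint gridlines between the powers of 10
     legend = false)
```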
EDIT: The colors are great now, but could you fix the spacing issues in the legend before you publish? Also, the x-axis font is kind of ugly (IMHO).