Julia programs now shown on benchmarks game website

The timings for Julia are totally unfair, no optimization options are passed or bounds-checking switches whereas other languages enable all possible optimizations. As an example, the nbody program on my computer times:

4.44 sec : Julia (9% off the fastest language)
4.05 sec : ifort

on the website, that’s another story. Look at this:

/opt/src/intel/bin/ifort -O3 -ipo -static -xHost nbody.ifc-6.f90 -o nbody.ifc-6.ifc_run
rm nbody.ifc-6.f90


/opt/src/julia-1.0.2/bin/julia -- nbody.julia 50000000

Can we fix this?

1 Like

Knowing Julia community, these ranking might change soon :wink:


The Java implementations of these benchmarks have been optimized to death. The Julia versions are mostly naive high-level versions. There’s a lot of room for improvement there.

1 Like

While java programs have been looked over more often, we shouldn’t forget the constraints of each individual program - they’ve all got certain restrictions, e.g. not implementing a strictly better algorithm.

Admittedly, I’ve only looked at the Julia code and not the java code - but I think Julia can still improve by a large amount :joy:

structs in Binary tree don’t need to be mutable. Just changing that improves perf by 30%. Read only data typically leads to better cache performance.

1 Like

I started something here if people are interested in contributing:

Not everything works yet.


Of course.

Also, more than one program can be shown for the same task, so — a simple Julia program, a low-memory Julia program, a fastest elapsed time Julia program, a low cpu time Julia program.

For example, I think the Chapel programs are aimed at showing how well simple “readable” Chapel programs perform, rather than doing low-level stuff.

Yes, I think that the Julia versions should aim for 1-2x C with decent readability. That’s always been our sweet spot; eeking out the last drop of performance at the cost of readability is rarely worth it.


In my case, I managed to have a SGP4 propagator with similar performance of that of FORTRAN 77 (10% slower) with 10000000x more readability :sweat_smile:


If we want to improve the performance, should we PR to your repo? Moreover, can we use packages like StaticArrays?

I think we should avoid using packages. PRs accepted. The testing and benchmarking script needs love as well. I’ll get to it in a while though if no one else beats me to it :slight_smile:

1 Like

I think I can play with this in my spare time. Furthermore, this seems a very good reason to merge StaticArrays into core language don’t you think :sweat_smile:


Another way is to represent collections of binary trees as matrices, my package has all the operations needed to operate on collections of binary trees as matrix operations, nested objects / fields are not required for trees.

Not that it should be used or the test, just saying that matrices can be used instead of nested objects, and I can do operations on entire collections of binary trees at once with a single matrix operation.


Each benchmark has a description for the game. The binary tree game is intended to be stress test of the default garbage collector for a language.
Lucky for C/C++ they don’t have a default GC, so they get to pick the fastest memory pool they can find. Which would violate the rules for any other language.
Anyways, with multithreading and a few performance tweaks the Julia implementation should be in the ballpark of a C/C++ implementation that uses malloc/free



Other languages (such as Java) have several entries on the benchmarks.
We could also have several versions: plain Julia, Julia with some package, Optimized Julia…


I like the idea of using packages. Especially StaticArrays because it is always repeated that packages are not second grade citizens in the julia community.


Also, the vast majority who look at the source code of Julia programs on the benchmarks game website will be unfamiliar with the language — so inline comments which explain language elements particular to Julia can be very helpful.


While I think it would be nice to show how easy it is to re-implement the limited functionality needed for these benchmarks without using external packages, there is precedent for using non-standard packages: Java uses it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap in the k-nucleotide benchmark: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/knucleotide-java-1.html.


By the way, BioJulia people, step up your game. Java looks beatable for k-nucleotide in terms of both readability and performance, currently way behind in the latter category. :slight_smile:


This community is amazing! I cloned the repo today to try to improve some tests and I found lots of commits with lots of improvements just 3 days after! Amazing!

@kristoffer.carlsson any idea about what is left to do?

1 Like