Julia programs now shown on benchmarks game website

Other languages (such as Java) have several entries on the benchmarks.
We could also have several versions: plain Julia, Julia with some package, Optimized Julia…

3 Likes

I like the idea of using packages. Especially StaticArrays because it is always repeated that packages are not second grade citizens in the julia community.

7 Likes

Also, the vast majority who look at the source code of Julia programs on the benchmarks game website will be unfamiliar with the language — so inline comments which explain language elements particular to Julia can be very helpful.

3 Likes

While I think it would be nice to show how easy it is to re-implement the limited functionality needed for these benchmarks without using external packages, there is precedent for using non-standard packages: Java uses it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap in the k-nucleotide benchmark: k-nucleotide Java program (Benchmarks Game).

2 Likes

By the way, BioJulia people, step up your game. Java looks beatable for k-nucleotide in terms of both readability and performance, currently way behind in the latter category. :slight_smile:

5 Likes

This community is amazing! I cloned the repo today to try to improve some tests and I found lots of commits with lots of improvements just 3 days after! Amazing!

@kristoffer.carlsson any idea about what is left to do?

1 Like

Mandelbrot, binary trees nbodies, fannkuchredux should be in relatively good shape now… If i didn’t miss anything, the rest should still leave quite a bit of room for improvement!

6 Likes

The bottleneck in fannkuch-redux is the function count_flips which makes lots of array accesses. I guess the fastest implementations use loop-unrolling techniques to utilize out-of-order execution stuff more, specifically on the loop for k = 1:15 whose maximum number of iterations is (made) guaranteed at compile-time. I tried @nexprs but could not get an improvement maybe someone can take a look at there.

fyi Programs with manual loop unrolling are generally not accepted — that can become a grey area when there are vector operations.

fyi Programs that use fastmath are not accepted.

1 Like

What is meant by manual loop unrolling, typing what each iteration should run one-by-one? Is letting a built-in macro (if any) do this kind of optimization accepted as manual loop unrolling, given that it is not done for a specific input?

Some compilers unrolls loops at compile time which I think in the same way such a macro would do and some implementations explicitly or implicitly use this feature.

Does it show what a clever programmer could do, or does it show what the language implementation does?

It’s kind of both, isn’t it? Please take a look at this C implementation. As noted in the code, low_index<16 is put there violating the algorithm purposefully to utilize loop-unrolling (since otherwise the compiler cannot unroll a loop that can run an unknown number of times), which is, I would argue, something that a clever programmer could do. Other implementations also make this 16 trick.

I can also argue that an implementation in Julia using macros can show what the language implementation does.

1 Like

An implementation in Julia doesn’t need that. It can just use StaticArrays which should do that.

3 Likes

Another point to merge StaticArrays into stdlib :slight_smile:

13 Likes

The fastest C implementation makes use of this trick distance = _mm_cvtps_pd(_mm_rsqrt_ps(_mm_cvtpd_ps(dsquared))) to drop the precision to Float32 so that the SSE instruction for approximate reciprocal square root can be used, which doesn’t exist for Float64 until the AVX-512 extension I gather looking at Intel intrinsics guide.

1 Like

This is interesting. If julia unrolls loops with StaticArrays, should not it also unroll loops like for i = 1:16? Perhaps, it already is unrolling.

Also, it was said we avoid using packages.

Afaik @fastmath won’t emit this; see here if you want to use that trick. I wouldn’t call this idiomatic julia, though.

Is this an official rule or the benchmarks game?

Honestly, I have no idea, but people so far avoided it. I referred to this post above.

I don’t think these are unrolled by default, but Unrolled.jl has a macro which does this: Unrolled.@unroll_loop for i=1:16 ... will work, and in some quick tests can be faster. (It’s a one-line macro, btw, so would be trivial to include without having to call the package.)

Edit: then I read the rest of the thread, sorry! Does this count as manual or automated unrolling?