Julia programs now shown on benchmarks game website

Juan · November 20, 2018, 10:24am

Other languages (such as Java) have several entries on the benchmarks.
We could also have several versions: plain Julia, Julia with some package, Optimized Julia…

Tero_Frondelius · November 20, 2018, 10:38am

I like the idea of using packages. Especially StaticArrays because it is always repeated that packages are not second grade citizens in the julia community.

igouy · November 22, 2018, 12:06am

Also, the vast majority who look at the source code of Julia programs on the benchmarks game website will be unfamiliar with the language — so inline comments which explain language elements particular to Julia can be very helpful.

tkoolen · November 22, 2018, 2:58am

While I think it would be nice to show how easy it is to re-implement the limited functionality needed for these benchmarks without using external packages, there is precedent for using non-standard packages: Java uses it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap in the k-nucleotide benchmark: k-nucleotide Java program (Benchmarks Game).

tkoolen · November 22, 2018, 4:51am

By the way, BioJulia people, step up your game. Java looks beatable for k-nucleotide in terms of both readability and performance, currently way behind in the latter category.

Ronis_BR · November 22, 2018, 2:59pm

This community is amazing! I cloned the repo today to try to improve some tests and I found lots of commits with lots of improvements just 3 days after! Amazing!

@kristoffer.carlsson any idea about what is left to do?

sdanisch · November 22, 2018, 4:29pm

Mandelbrot, binary trees nbodies, fannkuchredux should be in relatively good shape now… If i didn’t miss anything, the rest should still leave quite a bit of room for improvement!

hckr · November 22, 2018, 5:29pm

The bottleneck in fannkuch-redux is the function count_flips which makes lots of array accesses. I guess the fastest implementations use loop-unrolling techniques to utilize out-of-order execution stuff more, specifically on the loop for k = 1:15 whose maximum number of iterations is (made) guaranteed at compile-time. I tried @nexprs but could not get an improvement maybe someone can take a look at there.

igouy · November 22, 2018, 6:02pm

fyi Programs with manual loop unrolling are generally not accepted — that can become a grey area when there are vector operations.

fyi Programs that use fastmath are not accepted.

hckr · November 22, 2018, 6:41pm

What is meant by manual loop unrolling, typing what each iteration should run one-by-one? Is letting a built-in macro (if any) do this kind of optimization accepted as manual loop unrolling, given that it is not done for a specific input?

Some compilers unrolls loops at compile time which I think in the same way such a macro would do and some implementations explicitly or implicitly use this feature.

igouy · November 22, 2018, 7:29pm

Does it show what a clever programmer could do, or does it show what the language implementation does?

hckr · November 22, 2018, 8:33pm

It’s kind of both, isn’t it? Please take a look at this C implementation. As noted in the code, low_index<16 is put there violating the algorithm purposefully to utilize loop-unrolling (since otherwise the compiler cannot unroll a loop that can run an unknown number of times), which is, I would argue, something that a clever programmer could do. Other implementations also make this 16 trick.

I can also argue that an implementation in Julia using macros can show what the language implementation does.

ChrisRackauckas · November 22, 2018, 8:37pm

An implementation in Julia doesn’t need that. It can just use StaticArrays which should do that.

Ronis_BR · November 22, 2018, 8:47pm

Another point to merge StaticArrays into stdlib

Daniel_Berge · November 23, 2018, 3:08am

The fastest C implementation makes use of this trick distance = _mm_cvtps_pd(_mm_rsqrt_ps(_mm_cvtpd_ps(dsquared))) to drop the precision to Float32 so that the SSE instruction for approximate reciprocal square root can be used, which doesn’t exist for Float64 until the AVX-512 extension I gather looking at Intel intrinsics guide.

hckr · November 23, 2018, 9:24am

This is interesting. If julia unrolls loops with StaticArrays, should not it also unroll loops like for i = 1:16? Perhaps, it already is unrolling.

Also, it was said we avoid using packages.

foobar_lv2 · November 23, 2018, 11:25am

Afaik @fastmath won’t emit this; see here if you want to use that trick. I wouldn’t call this idiomatic julia, though.

Tamas_Papp · November 23, 2018, 11:28am

Is this an official rule or the benchmarks game?

hckr · November 23, 2018, 11:42am

Honestly, I have no idea, but people so far avoided it. I referred to this post above.

improbable22 · November 23, 2018, 12:03pm

I don’t think these are unrolled by default, but Unrolled.jl has a macro which does this: Unrolled.@unroll_loop for i=1:16 ... will work, and in some quick tests can be faster. (It’s a one-line macro, btw, so would be trivial to include without having to call the package.)

Edit: then I read the rest of the thread, sorry! Does this count as manual or automated unrolling?

Topic		Replies	Views
Benchmarks game Performance	20	3780	May 13, 2020
Yet another language benchmark Performance benchmark	9	900	June 15, 2025
Help with binary trees benchmark games example General Usage benchmarks	53	3661	May 6, 2021
Benchmark game challenge and some optimization questions Performance	29	2779	January 13, 2024
knucleotide benchmark improvement for Julia and hashing Community announcement	8	843	February 1, 2019

Julia programs now shown on benchmarks game website

Related topics