Julia programs now shown on benchmarks game website

kristoffer.carlsson · November 23, 2018, 2:43pm

With regard to using packages, the benchmarks are so simple there isn’t much point to packages imo, and there is value in having completely self-contained examples that can be copy-pasted.

Of course one could use e.g. StaticArrays.jl for the nbody benchmark, but showing how one can create their own static vector in a few lines shows off Julia more than hiding it behind a package imo.
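To make the "own static vector in a few lines" point concrete, here is a minimal sketch. The names `Vec3` and `dot3` are made up for illustration and are not taken from any submitted benchmark program or from StaticArrays.jl.

```julia
# Hypothetical 3-component static vector (illustrative names only).
# An immutable struct of three Float64 fields can live on the stack or in registers.
struct Vec3
    x::Float64
    y::Float64
    z::Float64
end

# Element-wise arithmetic, which is all an n-body style update needs.
Base.:+(a::Vec3, b::Vec3) = Vec3(a.x + b.x, a.y + b.y, a.z + b.z)
Base.:-(a::Vec3, b::Vec3) = Vec3(a.x - b.x, a.y - b.y, a.z - b.z)
Base.:*(s::Real, a::Vec3) = Vec3(s * a.x, s * a.y, s * a.z)

# Dot product, e.g. for squared distances.
dot3(a::Vec3, b::Vec3) = a.x * b.x + a.y * b.y + a.z * b.z

p = Vec3(1.0, 2.0, 3.0)
v = Vec3(0.5, 0.0, -1.0)
p_next = p + 0.1 * v   # one explicit Euler position step
```

Because the struct is immutable and has a fixed size, expressions like `p + 0.1 * v` should not need heap allocation, which is the main thing a static-array package would provide here.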

Palli · November 23, 2018, 4:51pm

See also my topic, Julia needs work at "Benchmark Game": numbers seem off up to 100x slower, maybe "Julia AOT" entries needed?

Palli · December 6, 2018, 11:30am

Aren’t the programs in Kristoffer’s repo in good shape? I believe some or all of them should be submitted. I guess I could (but should I)? I don’t want to take anyone’s credit.

I decided to look into one of the programs that didn’t have a Julia implementation (though I now see it’s in Kristoffer’s repo), and to translate this fastest [C++] non-multi-threaded version:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/spectralnorm-gpp-2.html

It uses the Eigen library, and since that’s allowed, I guess we could use similar [Julia] libraries, or even the Julia Eigen wrapper.

However, I just started with translating the commented-out “EQUIVALENT MATLAB IMPLEMENTATION”. It’s trivial to do, but I opted to try the MATLAB to Julia translator anyway. I ended up with this, which is fast on my very old laptop but wrong (maybe it just isn’t “EQUIVALENT”; anyone know what’s wrong?):

julia> function approximate(n)
       A = zeros(n,n)
         for i=1:n
           for j=1:n
             A[i,j] = 1.0/((i+j)*(i+j+1)/2.0 + i+1.0)
           end
         end
         u = ones(n,1)
         v = zeros(n,1)
         for i=1:10
           v = A'*(A*u)
           u = A'*(A*v)
         end
         sqrt((u'*v)/(v'*v))
       end
approximate (generic function with 1 method)

julia> @time approximate(5500)
  5.461645 seconds (9.13 k allocations: 232.940 MB, 0.72% gc time)
1×1 Array{Float64,2}:
 0.361967

The translator got this line wrong (anyone know why? and I should file a bug):

sqrt((u"*v)/(v"*v))

and I changed integers to floats here: A[i,j] = 1.0/((i+j)*(i+j+1)/2.0 + i+1.0)

As expected, I got the same result, and the speed seems the same (I was just checking).

I opened up my MATLAB clone, Octave, just to make sure, and I get the same number eventually, probably after minutes, ten[s?]. At least Julia is way faster; can anyone compare real MATLAB to Julia for me?
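One possible cause of the wrong number, offered as an assumption rather than a confirmed diagnosis: the benchmark defines the matrix entry 1/((i+j)(i+j+1)/2 + i + 1) for indices starting at 0, while the loops above start at 1. A sketch with the formula shifted for 1-based indices (the name `approximate_fixed` is made up here) could look like this:

```julia
using LinearAlgebra  # provides dot

# Sketch: same dense-matrix approach as the post, with the entry formula
# shifted for 1-based i and j. Assumes the task defines A with 0-based indices.
function approximate_fixed(n)
    A = zeros(n, n)
    for j in 1:n, i in 1:n
        # With i0 = i - 1 and j0 = j - 1:
        # (i0 + j0) * (i0 + j0 + 1) / 2 + i0 + 1 == (i + j - 2) * (i + j - 1) / 2 + i
        A[i, j] = 1.0 / ((i + j - 2) * (i + j - 1) / 2 + i)
    end
    u = ones(n)    # 1-D vectors, so the final result is a scalar
    v = zeros(n)
    for _ in 1:10
        v = A' * (A * u)
        u = A' * (A * v)
    end
    sqrt(dot(u, v) / dot(v, v))
end

println(approximate_fixed(100))
```

For n = 100 this should print a value close to 1.27422, which is what the benchmark's reference output for that input starts with. Using `ones(n)` instead of `ones(n,1)` also avoids the 1×1 `Array` result seen above.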

baggepinnen · December 6, 2018, 12:36pm

You access the matrix in a suboptimal pattern. Try switching the order of the for loops.
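For context on the loop-order remark: Julia arrays are stored column-major, so the first index should vary fastest. The sketch below uses made-up function names and a placeholder entry formula to contrast the two orders; both produce the same matrix and differ only in memory access pattern.

```julia
# Inner loop over j: consecutive writes are a whole column apart in memory.
function fill_rows!(A)
    n, m = size(A)
    for i in 1:n, j in 1:m
        A[i, j] = 1.0 / (i + j)   # placeholder formula, not the benchmark's
    end
    return A
end

# Inner loop over i: consecutive writes are adjacent in memory (column-major).
function fill_cols!(A)
    n, m = size(A)
    for j in 1:m, i in 1:n
        A[i, j] = 1.0 / (i + j)
    end
    return A
end

A = fill_rows!(zeros(1000, 1000))
B = fill_cols!(zeros(1000, 1000))
println(A == B)   # same contents either way
```

In a program like the one above, the fill is only a small part of the work next to the matrix-vector products, so the overall timing may barely change.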

Palli · December 6, 2018, 4:02pm

Thanks, I assumed they had good MATLAB code, and I didn’t look into row- vs. column-major order (I assumed they did it differently, despite stating “EQUIVALENT”).

Still, of course, I get the same wrong result.

This however doesn’t change the timing for Julia at least. And for Octave I gave up waiting.

For a smaller, quicker input (550), Octave takes about 6 to 7 seconds either way (sometimes the “better” version is faster, sometimes the other).

While in Julia I get:

julia> @time approximate(550)
  0.049661 seconds (70 allocations: 2.491 MB)
1×1 Array{Float64,2}:
 0.361967

Juan · December 12, 2018, 12:56am

There is also a “Recursive Fibonacci Benchmark using top languages on Github”.

It would be nice to do it better.

igouy · December 12, 2018, 9:14pm

From HN — "ok, owner of the repo here. So this project was purely to show the macro differences between interpreted ruby and compiled crystal to beginner ruby devs at a meet up.
…
I didn’t expect to get a memoized version of every language and really don’t think comparing them from a performance benchmark makes much sense."

FYI, the same comparison was removed from the benchmarks game and replaced with tasks that were still toy but more than a dozen lines.

Juan · December 18, 2018, 11:46am

Maybe this can be used for further improvement of the binary-trees benchmark.

It uses a TypedArena allocator and shows it’s fourfold faster than the default allocator.

sdanisch · December 18, 2018, 1:26pm

Probably not - the current solution is 2 times slower than the fastest solution, while the linked version is ~5x slower.

igouy · December 18, 2018, 4:58pm

Please don’t implement your own custom “arena” or “memory pool” or “free list” - they will not be accepted.

Orbots · December 18, 2018, 10:55pm

If such a typed pool was part of the general registry we could use it. The fastest C++ versions use either a Boost memory pool or some pool from Apache. You won’t beat the @sdanisch version, but you’ll also have something that could be used for something other than filling memory with tree nodes and then deallocating all of them as efficiently as possible.

Orbots · December 18, 2018, 11:31pm

I took a crack at TypedPool implementation for this. I needed TypedPools for something else I was doing.

The immutable StructPool solution was marginally slower than the @sdanisch solution when freeing the pools in one go, but about 2x slower when deleting the nodes individually.
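To illustrate what "freeing the pool in one go" can mean, here is a minimal index-based pool sketch. It is a hypothetical illustration, not the TypedPool/StructPool code from the gist and not the MemoryArena API, and per the rule quoted above a hand-rolled pool like this would not be accepted as a benchmarks game entry.

```julia
# Hypothetical index-based node pool (illustrative only).
struct PoolNode
    left::Int32    # index of the left child in the pool, 0 for none
    right::Int32   # index of the right child in the pool, 0 for none
end

# Build a complete binary tree of the given depth; returns the root's index.
function make!(pool::Vector{PoolNode}, depth::Int)
    if depth == 0
        push!(pool, PoolNode(0, 0))
    else
        l = make!(pool, depth - 1)
        r = make!(pool, depth - 1)
        push!(pool, PoolNode(l, r))
    end
    return Int32(length(pool))
end

# Count the nodes reachable from index i.
function check(pool::Vector{PoolNode}, i::Integer)
    i == 0 && return 0
    node = pool[i]
    return 1 + check(pool, node.left) + check(pool, node.right)
end

pool = PoolNode[]
root = make!(pool, 6)
println(check(pool, root))   # 127 nodes for depth 6
empty!(pool)                 # drop every node at once
```

All nodes sit in one `Vector`, so releasing the tree is a single `empty!` rather than one deallocation per node.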

Orbots · December 19, 2018, 12:35am

I see MemoryArena is registered.

Updated the gist with a MemoryArena version that is as close as possible to the current solution (using threading). I’m seeing pretty good timings.

env JULIA_NUM_THREADS=4 julia binarytrees-MemoryPool.jl
stretch tree of depth 6 check: 127
32 trees of depth 4 check: 992
long lived tree of depth 5 check: 63
0.198965 seconds (540.90 k allocations: 27.119 MiB)
stretch tree of depth 22 check: 8388607
2097152 trees of depth 4 check: 65011712
524288 trees of depth 6 check: 66584576
131072 trees of depth 8 check: 66977792
32768 trees of depth 10 check: 67076096
8192 trees of depth 12 check: 67100672
2048 trees of depth 14 check: 67106816
512 trees of depth 16 check: 67108352
128 trees of depth 18 check: 67108736
32 trees of depth 20 check: 67108832
long lived tree of depth 21 check: 4194303
2.291580 seconds (106 allocations: 5.297 KiB)

=========

env JULIA_NUM_THREADS=4 julia binarytrees-current.jl
Thread count: 4
stretch tree of depth 6 check: 127
32 trees of depth 4 check: 992
long lived tree of depth 5 check: 63
0.306586 seconds (1.34 M allocations: 67.345 MiB, 4.36% gc time)
stretch tree of depth 22 check: 8388607
2097152 trees of depth 4 check: 65011712
524288 trees of depth 6 check: 66584576
131072 trees of depth 8 check: 66977792
32768 trees of depth 10 check: 67076096
8192 trees of depth 12 check: 67100672
2048 trees of depth 14 check: 67106816
512 trees of depth 16 check: 67108352
128 trees of depth 18 check: 67108736
32 trees of depth 20 check: 67108832
long lived tree of depth 21 check: 4194303
1.923280 seconds (219 allocations: 153.674 MiB, 3.44% gc time)

Olof_Salberger · January 9, 2019, 9:55am

So we have implementations that are much faster than the ones currently on the benchmark game website. Do we know when it’ll be updated with the faster solutions?

sdanisch · January 9, 2019, 11:08am

Whenever someone takes the initiative. I meant to kick things off, but higher priority items kept rolling in…
@kristoffer.carlsson might have a plan about it as well… I’m guessing he’s having the same problem as me!
It also seems like the bar is pretty high for updating the code, so I guess everyone is a bit intimidated, or doesn’t want to waste time without getting anywhere.

kristoffer.carlsson · January 9, 2019, 12:08pm

I just updated the spectralnorm benchmarks. It wasn’t that hard. https://salsa.debian.org/benchmarksgame-team/benchmarksgame/issues/90.

However, I then noticed that someone had already updated the spectralnorm with a version that was very similar to the one I contributed! So now we have two benchmarks with very similar running times: spectral-norm (Benchmarks Game).

We are of course losing against the compiled languages due to overhead in startup and function compilation.
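The compilation part of that overhead is easy to observe in a session. A rough sketch with a made-up toy function `sumsq` follows; it only times the first and second call and does not capture process startup, which needs an external timer.

```julia
# Hypothetical toy workload; the function itself is not part of any benchmark.
sumsq(n) = sum(i -> i * i, 1:n)

t_first  = @elapsed sumsq(10_000)   # first call includes JIT compilation
t_second = @elapsed sumsq(10_000)   # second call reuses the compiled code

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The benchmarks game measures whole-process time, so both the startup and the first-call cost count against every Julia entry.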

ImreSamu · January 9, 2019, 2:43pm

All “julia” issues ( open + closed )

Issues · The Computer Language Benchmarks Game / benchmarksgame · GitLab

igouy · January 9, 2019, 2:58pm

If that’s thought to be a problem, someone authoritative can ask for one to be removed.

tomaklutfu · January 9, 2019, 4:13pm

It was me. For the fasta problem I did not use ccall at the end. Obviously, I was translating from C, and the ccall version was a little faster. However, I would like to help if I catch something. I think the binarytree problem can be improved by using Union{Node, Nothing} instead of the type-unstable version. The fast version at https://github.com/KristofferC/BenchmarksGame.jl/blob/master/binarytrees/binarytree-fast.jl does not count as fair because it implements a pool instead of directly using Julia’s GC.
Ref.

Please don’t implement your own custom “arena” or “memory pool” or “free list” - they will not be accepted
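A minimal sketch of the `Union{Node, Nothing}` idea, with nodes allocated by Julia's GC and no custom pool. The function names `make` and `check` are chosen here for illustration and this is not the code from either repository.

```julia
# Tree node whose children are either another Node or `nothing`.
# A small Union field type like this is something the compiler handles well.
struct Node
    left::Union{Node, Nothing}
    right::Union{Node, Nothing}
end

# Build a complete binary tree of the given depth.
function make(depth::Int)
    if depth == 0
        return Node(nothing, nothing)
    end
    return Node(make(depth - 1), make(depth - 1))
end

# Count nodes; dispatch on Nothing ends the recursion.
check(::Nothing) = 0
check(n::Node) = 1 + check(n.left) + check(n.right)

println(check(make(6)))   # 127, matching the "stretch tree of depth 6" line
```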

kristoffer.carlsson · January 9, 2019, 4:18pm

I had “add faster binarytree · JuliaPerf/BenchmarksGame.jl@dab054a · GitHub” before that one, which used the Union.
