Julia programs now shown on benchmarks game website

With regards to using packages, the benchmarks are so simple there isn’t much point to packages imo, and there is value in having completely self contained examples that can be copy pasted.

Of course one could use e.g StaticArrays.jl for the nobody benchmark but showing how one can create their own static vector in a few lines shows off julia more than hiding it behind a package imo.

12 Likes

See also my Julia needs work at "Benchmark Game": numbers seem off up to 100x slower, maybe "Julia AOT" entries needed?

Aren’t the programs in Kristofer’s repo in good shape? I believe some or all of them should be submitted. I guess I could (but should I)? I don’t want to take anyone’s credit.

I decided to look into one of the programs that didn’t have a Julia implementation (I see however now it’s in Kristofer’s repo), and to translate this fastest [C++] non-multi-threaded version:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/spectralnorm-gpp-2.html

It uses Eigen library, and since that’s allowed, I guess we could use similar [Julia] libraries, or even the Julia Eigen wrapper.

However, I just started with translating the commented out “EQUIVALENT MATLAB IMPLEMENTATION”. It’s trivial to do, but I opted to try the MATLAB to Julia translator | MATLAB to Julia converter translator anyway. I ended up with this (maybe it just isn’t “EQUIVALENT”, anyone know what’s wrong?) that’s fast on my very old laptop but wrong:

julia> function approximate(n)
       A = zeros(n,n)
         for i=1:n
           for j=1:n
             A[i,j] = 1.0/((i+j)*(i+j+1)/2.0 + i+1.0)
           end
         end
         u = ones(n,1)
         v = zeros(n,1)
         for i=1:10
           v = A'*(A*u)
           u = A'*(A*v)
         end
         sqrt((u'*v)/(v'*v))
       end
approximate (generic function with 1 method)

julia> @time approximate(5500)
  5.461645 seconds (9.13 k allocations: 232.940 MB, 0.72% gc time)
1×1 Array{Float64,2}:
 0.361967

The translator got this line wrong (anyone know why? and I should file a bug):

sqrt((u"*v)/(v"*v))

and I changed integers to floats here: A[i,j] = 1.0/((i+j)*(i+j+1)/2.0 + i+1.0)

I got the same result as expected and seems same speed (was just checking).

I opened um my MATLAB clone, Octave to just make sure, and I get the same number eventually. Probably after minutes, ten[s?]. At least Julia is way faster, can anyone compare real MATLAB to Julia for me?

You access the matrix in a suboptimal pattern. Try switching the order of the for loops

Thanks, I assumed they had good MATLAB code, and I didn’t look into row vs. column major (I assumed they did differently, though stating “EQUIVALENT”).

Still of course I get the same wrong result.

This however doesn’t change the timing for Julia at least. And for Octave I gave up waiting.

For lower number/quicker I get about 6 to 7 seconds either way for 550 (sometimes the “better” version is faster sometimes the other).

While in Julia I get:

julia> @time approximate(550)
  0.049661 seconds (70 allocations: 2.491 MB)
1×1 Array{Float64,2}:
 0.361967

There is also a “Recursive Fibonacci Benchmark using top languages on Github”.

It would be nice to do it better.

From HN"ok, owner of the repo here. So this project was purely to show the macro differences between interpreted ruby and compiled crystal to beginner ruby devs at a meet up.

I didn’t expect to get a memoized version of every language and really don’t think comparing them from a performance benchmark makes much sense."

fyi The same comparison was removed from the benchmarks game and replaced with tasks that were still toy but more than a dozen lines.

Maybe this can be used for further improvement of the binary-trees benchmark.

It uses a TypedArena allocator and shows it’s fourfold faster than the default allocator.

Probably not - the current solution is 2 times slower than the fastest solution, while the linked version is ~5x slower :wink:

Please don’t implement your own custom “arena” or “memory pool” or “free list” - they will not be accepted.

If such a typed pool was part of the general registry we could use it. The fastest c++ versions use either a boost memory pool or some pool from apache. You won’t beat the @sdanisch version, but you’ll also have something that could be used for something other than filling memory with tree nodes and then deallocating all of them as efficiently as possible.

I took a crack at TypedPool implementation for this. I needed TypedPools for something else I was doing.

the immutable StructPool solution was marginally slower than the @sdanisch solution when freeing the pools in one go, but about 2x slower when deleting the nodes individually.

I see MemoryArena is registered.

Updated the gist with a MemoryArena version that is as close to possible as the current solution ( using threading ). I’m seeing pretty good timings.

env JULIA_NUM_THREADS=4 julia binarytrees-MemoryPool.jl
stretch tree of depth 6 check: 127
32 trees of depth 4 check: 992
long lived tree of depth 5 check: 63
0.198965 seconds (540.90 k allocations: 27.119 MiB)
stretch tree of depth 22 check: 8388607
2097152 trees of depth 4 check: 65011712
524288 trees of depth 6 check: 66584576
131072 trees of depth 8 check: 66977792
32768 trees of depth 10 check: 67076096
8192 trees of depth 12 check: 67100672
2048 trees of depth 14 check: 67106816
512 trees of depth 16 check: 67108352
128 trees of depth 18 check: 67108736
32 trees of depth 20 check: 67108832
long lived tree of depth 21 check: 4194303
2.291580 seconds (106 allocations: 5.297 KiB)

=========

env JULIA_NUM_THREADS=4 julia binarytrees-current.jl
Thread count: 4
stretch tree of depth 6 check: 127
32 trees of depth 4 check: 992
long lived tree of depth 5 check: 63
0.306586 seconds (1.34 M allocations: 67.345 MiB, 4.36% gc time)
stretch tree of depth 22 check: 8388607
2097152 trees of depth 4 check: 65011712
524288 trees of depth 6 check: 66584576
131072 trees of depth 8 check: 66977792
32768 trees of depth 10 check: 67076096
8192 trees of depth 12 check: 67100672
2048 trees of depth 14 check: 67106816
512 trees of depth 16 check: 67108352
128 trees of depth 18 check: 67108736
32 trees of depth 20 check: 67108832
long lived tree of depth 21 check: 4194303
1.923280 seconds (219 allocations: 153.674 MiB, 3.44% gc time)

So we have implementations that are much faster than the ones currently on the benchmark game website. Do we know when it’ll be updated with the faster solutions?

Whenever someone takes the initiative :slight_smile: I meant to kick things off, but higher priority items kept rolling in…
@kristoffer.carlsson might have a plan about it as well… I’m guessing he’s having the same problem as me!
Seems also like the bar is pretty high for updating the code, so I guess everyone is a bit intimidated - or doesn’t want to waste time without going anywhere :wink:

4 Likes

I just updated the spectralnorm benchmarks. It wasn’t that hard. https://salsa.debian.org/benchmarksgame-team/benchmarksgame/issues/90.

However, I then noticed that someone had already updated the spectralnorm with a version that was very similar to the one I contributed! So now we have two benchmarks with very similar running times spectral-norm (Benchmarks Game).

We are of course losing against the compiled languages due to overhead in startup and function compilation.

5 Likes

All “julia” issues ( open + closed )

1 Like

If that’s thought to be a problem, someone authorative can ask for one to be removed.

It was me. For the fasta problem I did not use ccall at the end. Obviously, I was translating from C and ccall one was a little faster. However, I would like to help if I catch something. I think binarytree problem can be improved by using Union{Node, nothing} instead of type unstable version. Fast version in the https://github.com/KristofferC/BenchmarksGame.jl/blob/master/binarytrees/binarytree-fast.jl does not count fair because it implements a pool instead of the directly using julia’s GC.
Ref.

Please don’t implement your own custom “arena” or “memory pool” or “free list” - they will not be accepted

I had add faster binarytree · JuliaPerf/BenchmarksGame.jl@dab054a · GitHub before that one which used the Union.