Julia programs (currently just those from BaseBenchmarks.jl) are now shown on the benchmarks game website.
Everyone is welcome to contribute improved programs to the project, following these instructions —
Julia programs (currently just those from BaseBenchmarks.jl) are now shown on the benchmarks game website.
Everyone is welcome to contribute improved programs to the project, following these instructions —
That’s really cool. But what’s up with the k-Nucleotide problem k-nucleotide (Benchmarks Game) ? Julia is much slower than python3 here. I didn’t investigate further it just struck me as odd.
I’m surprised Julia is getting smoked so badly by Java in so many benchmarks.
I remember people saying that they tried not to squeeze out all the performance they could from Julia in the benchmarks repository so they could compare naïve idiomatic Julia code to naïve idiomatic code in other languages. Is this what’s going on here?
Julia does not make use of multithreading/multiprocessing while implementations in other languages do. Is there a specific reason for that?
K-nucleotide:
function count(data::AbstractString, n::Int)
counts = Dict{AbstractString, Int}()
...
Abstractly typed container: Big no, almost never. Should be
function count(data::T, n::Int) where {T<:AbstractString}
counts = Dict{T, Int}()
...
Binary tree:
abstract type BTree end
mutable struct Empty <: BTree
end
mutable struct Node <: BTree
left::BTree
right::BTree
end
We know the type hierarchy at compile time:
mutable struct Node
left::Union{Nothing, Node}
right::Union{Nothing, Node}
end
FWIW, there hasn’t been much time making these fast, they have just been used to regress test the performance of julia itself.
The timings for Julia are totally unfair, no optimization options are passed or bounds-checking switches whereas other languages enable all possible optimizations. As an example, the nbody program on my computer times:
4.44 sec : Julia (9% off the fastest language)
4.05 sec : ifort
on the website, that’s another story. Look at this:
MAKE:
/opt/src/intel/bin/ifort -O3 -ipo -static -xHost nbody.ifc-6.f90 -o nbody.ifc-6.ifc_run
rm nbody.ifc-6.f90
vs.
COMMAND LINE:
/opt/src/julia-1.0.2/bin/julia -- nbody.julia 50000000
Can we fix this?
Knowing Julia community, these ranking might change soon
The Java implementations of these benchmarks have been optimized to death. The Julia versions are mostly naive high-level versions. There’s a lot of room for improvement there.
While java programs have been looked over more often, we shouldn’t forget the constraints of each individual program - they’ve all got certain restrictions, e.g. not implementing a strictly better algorithm.
Admittedly, I’ve only looked at the Julia code and not the java code - but I think Julia can still improve by a large amount
structs in Binary tree don’t need to be mutable. Just changing that improves perf by 30%. Read only data typically leads to better cache performance.
I started something here if people are interested in contributing:
Not everything works yet.
Of course.
Also, more than one program can be shown for the same task, so — a simple Julia program, a low-memory Julia program, a fastest elapsed time Julia program, a low cpu time Julia program.
For example, I think the Chapel programs are aimed at showing how well simple “readable” Chapel programs perform, rather than doing low-level stuff.
Yes, I think that the Julia versions should aim for 1-2x C with decent readability. That’s always been our sweet spot; eeking out the last drop of performance at the cost of readability is rarely worth it.
In my case, I managed to have a SGP4 propagator with similar performance of that of FORTRAN 77 (10% slower) with 10000000x more readability
If we want to improve the performance, should we PR to your repo? Moreover, can we use packages like StaticArrays?
I think we should avoid using packages. PRs accepted. The testing and benchmarking script needs love as well. I’ll get to it in a while though if no one else beats me to it
I think I can play with this in my spare time. Furthermore, this seems a very good reason to merge StaticArrays into core language don’t you think
Another way is to represent collections of binary trees as matrices, my package has all the operations needed to operate on collections of binary trees as matrix operations, nested objects / fields are not required for trees.
Not that it should be used or the test, just saying that matrices can be used instead of nested objects, and I can do operations on entire collections of binary trees at once with a single matrix operation.
Each benchmark has a description for the game. The binary tree game is intended to be stress test of the default garbage collector for a language.
Lucky for C/C++ they don’t have a default GC, so they get to pick the fastest memory pool they can find. Which would violate the rules for any other language.
Anyways, with multithreading and a few performance tweaks the Julia implementation should be in the ballpark of a C/C++ implementation that uses malloc/free