Julia programs now shown on benchmarks game website

igouy · January 24, 2019, 7:49pm

Hopefully, now that Julia 1.1.0 has landed, some more programs from BenchmarksGame.jl will make it to the benchmarks game website.

ninjaaron · January 25, 2019, 10:52am

Doesn’t look like it.

igouy · January 25, 2019, 6:34pm

I expect the program authors are busy doing other stuff, and will eventually contribute their programs to the benchmarks game.

Olof_Salberger · May 17, 2019, 5:14pm

Benchmarks game website added comparisons to Fortran and Chapel.

I wrote my own benchmark for reverse complement, since it is currently the slowest one up there, and managed to bring memory use down to a level competitive with the best from other languages. Haven’t submitted anything though.

github.com

saolof/languageshootout.jl/blob/master/reverse_complement.jl

# Implementation by Olof Salberger.
#
# Mostly written to cut down on memory usage & unnecessary copies 
# without sacrificing readability, while also being an order of magnitude
# speedup over current benchmarks game implementation.

#                                         ABCDEFGHIJKLMNOPQRSTUVWXYZ      abcdefghijklmnopqrstuvwxyz
const complement_hasharr = Vector{UInt8}("TVGH  CD  M KN   YSAABW R       TVGH  CD  M KN   YSAABW R")
complement(charbyte::UInt8)  = @inbounds complement_hasharr[charbyte - 0x40]

function reversemap!(f,v::AbstractVector{UInt8}, s=first(LinearIndices(v)), n=last(LinearIndices(v)))
    r = n
    i = s
    @inbounds while true
        while v[i] < 0x41   # Breaks utility as a generic function,
           i+=1             # but makes it skip non-alphabetic
        end                 # characters without introducing
        while v[r] < 0x41   # unnecessary copies or extra passes
           r-=1             # and without hardcoded line widths.
        end                 #

This file has been truncated.

Still not fully optimized and mostly written for readability (hopefully). Feel free to suggest improvements.
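
Since the file above is cut off, here is a minimal, self-contained sketch of the same idea: a lookup table indexed by `byte - 0x40` plus an in-place two-pointer reversal that skips bytes below `'A'` (such as newlines). It is not the truncated original; the names `COMPLEMENT_TABLE`, `complement_byte`, and `reverse_complement!` are made up for illustration, and the bounds checks that the original disables with `@inbounds` are left on.

```julia
# Same table layout as in the post: index = byte - 0x40, so 'A' (0x41) maps to index 1.
const COMPLEMENT_TABLE = Vector{UInt8}("TVGH  CD  M KN   YSAABW R       TVGH  CD  M KN   YSAABW R")

# Hypothetical helper: complement of one nucleotide byte ('A'..'Y' or 'a'..'y').
complement_byte(b::UInt8) = COMPLEMENT_TABLE[b - 0x40]

# Reverse-complement the letters of `v` in place, leaving non-alphabetic
# bytes (anything below 'A', e.g. '\n') at their original positions.
function reverse_complement!(v::AbstractVector{UInt8})
    i, r = firstindex(v), lastindex(v)
    while i < r
        if v[i] < 0x41          # skip a non-letter on the left
            i += 1
        elseif v[r] < 0x41      # skip a non-letter on the right
            r -= 1
        else                    # swap the pair, complementing both bytes
            v[i], v[r] = complement_byte(v[r]), complement_byte(v[i])
            i += 1
            r -= 1
        end
    end
    # With an odd number of letters, the middle one still needs complementing.
    if i == r && v[i] >= 0x41
        v[i] = complement_byte(v[i])
    end
    return v
end

println(String(reverse_complement!(Vector{UInt8}("GATT\nACA"))))
```

Because newlines stay where they are, the line width of the input is preserved without hardcoding it, which seems to be the point of the skipping loops in the original.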

Juan · May 18, 2019, 2:06am

Unfortunately the benchmarks game site doesn’t update the results.

Olof_Salberger · May 18, 2019, 8:32pm

Made some further changes. Just submitted my benchmark as a PR on their repo. I don’t expect too much from it.

I think that having a decent showing on the benchmarks game is a relatively important thing to do from a PR perspective if we want to tell people that the language is fast. The language has been post-1.0 for almost a year now.

Karajan · May 19, 2019, 4:57pm

Nice!
Your code looks, however, like you are not doing a “read line-by-line”: body = readuntil(instream, UInt8('>'))? This is the part where I broke my neck trying to reach Kristoffer’s speeds, because you can’t really allocate memory for the whole string until you know how long it’s going to be.

I got down to maybe 40% of the current time, while Kristoffer & Crew are somewhere around 15% on my computer
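
To make the line-by-line constraint concrete, here is a rough sketch that reads a FASTA-style stream one line at a time and appends each line to a growable byte buffer, so the total length does not need to be known up front. This is only an illustration under my own assumptions, not any submitted program; `collect_records` is a hypothetical name, and it makes no attempt to match the speed of the faster entries.

```julia
# Read FASTA-style input line by line. Each header line (starting with '>')
# opens a new record; subsequent lines are appended to that record's buffer.
function collect_records(io::IO)
    records = Vector{Pair{String,Vector{UInt8}}}()
    for line in eachline(io)              # newlines are stripped by eachline
        if startswith(line, ">")
            push!(records, line => UInt8[])
        elseif !isempty(records)
            # The buffer grows as needed; no length is known in advance.
            append!(last(records).second, codeunits(line))
        end
    end
    return records
end

records = collect_records(IOBuffer(">ONE\nACGT\nTT\n>TWO\nGG\n"))
for (header, body) in records
    println(header, " has ", length(body), " bases")
end
```

The repeated `append!` calls are where the extra allocations come from, compared with a single `readuntil` that grabs a whole record at once.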

igouy · May 20, 2019, 7:45pm

Yeah.

Olof_Salberger · May 20, 2019, 8:16pm

Ugh. That’s an absolutely stupid way to read a big block of text into memory, or even to parse a formatted stream.

Okay, that can still be made fast with BufferedStreams.jl at the cost of extra memory use. I’ll write a version based on that, and rewrite a basic implementation of InputStreams myself if I get complaints over dependencies. The Golang solution gets to use bufio. The big question is whether or not you are allowed to use anchors.

foobar_lv2 · May 20, 2019, 8:39pm

I think your code is fine and the point is simply that it must work interactively: Whenever you get sent enough input that there is a newline, you need to process and flush output and are disallowed from waiting (blocking) for more input. It is not permissible to wait until EOF with the processing.

dlakelan · June 3, 2019, 11:34am

FASTA is one line per gene, if I remember correctly, so each line here could be, say, 5-40KB, and the whole file could be a couple of gigs. So you can’t just slurp the whole file into RAM; well, these days you can, but 20 or more years ago, when the file format was invented, you couldn’t.

In any case, it’s not reading 80 chars at a time.

Karajan · June 5, 2019, 6:48am

You are right, it’s 60 chars at a time (from a 1GB file).

And at least for this benchmark you pretty much have to read it all into memory (look at the memory use), because you have to reverse the whole thing and you can’t really do that before you have read in the end.

Olof_Salberger · August 19, 2019, 11:26pm

Update: over the past few months, it looks like a number of people did some amazing work submitting benchmarks, and Julia now has a very respectable showing, ahead of Swift and Go:

I also think that there’s still quite a bit of additional performance that can be squeezed out. But I think this is a great showing that will help steer more people towards the language.

xiaodai · August 20, 2019, 3:36am

Kind of happy to see Pascal. One of my first languages

Palli · August 20, 2019, 10:09am

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/fasta-julia-4.html

The best time I get on MY decade-old laptop (i.e. NOT with 4 or 2 threads) is:

time ~/julia-1.1.0/bin/julia  -O3  -- fasta.jl 25000000 >/dev/null

real	0m6,066s
user	0m4,928s
sys	0m0,232s

While I regularly get under 5 sec (for “user”) on 1.1.0, I’m unable to get below 5.1 sec on julia-1.3.0-alpha, with whatever optimization level or number of threads.

At least I get slower with (for both below) export JULIA_NUM_THREADS=4

time ~/julia-1.1.0/bin/julia  -O2  -- fasta.jl 25000000 >/dev/null

real	0m6,385s
user	0m4,984s
sys	0m0,212s

time ~/julia-1.1.0/bin/julia  -O3  -- fasta.jl 25000000 >/dev/null

real	0m6,479s
user	0m5,008s
sys	0m0,196s

MY best on Julia-1.3.0-alpha (best combination, i.e. neither -O3 nor more threads is faster):

export JULIA_NUM_THREADS=1
time julia -O2  -- fasta.jl 25000000 >/dev/null

real	0m6,372s
user	0m5,212s
sys	0m0,164s

Please consider these settings on your machines and when submitting benchmarks: check whether LOWER optimization levels (or older Julia versions) are faster, and whether disabling threading helps or which number of threads is fastest (this may say more about my Core Duo laptop, or do threads have startup overhead?). Best-case startup for me is “real 0m0,313s” on Julia-1.3.0-alpha (it can easily be “real 0m0,390s”), and 1.1 is just slightly slower at best, “real 0m0,330s”, but I’ve seen “real 0m0,958s”.

Could it be that -O0 disables threads? On MY machine in the test below, 1 thread is better for -O3 (and -O0).

I found -O3 to be 46% SLOWER than -O0 on “real” (when both are CONFIGURED for 4 threads), and 42% slower with the best settings for both (2,572s vs. 1,813s for “real” time). It is even worse on “user” time: 61% slower (1,992s vs. 1,236s). This was all from WHEN I was first testing (for the test below, not the one above) with the shorter test file, not the longer one actually used in the benchmark to make it long-running; it is still useful to know how it affects speed:

export JULIA_NUM_THREADS=1
time ~/julia-1.3.0-alpha/bin/julia -O0  -- kn.jl 0 < ~/Downloads/knucleotide-input.txt

real	0m1,813s
user	0m1,296s
sys	0m0,204s

also got:

real	0m2,047s
user	0m1,236s
sys	0m0,272s

export JULIA_NUM_THREADS=4
time ~/julia-1.3.0-alpha/bin/julia -O3  -- kn.jl 0 < ~/Downloads/knucleotide-input.txt

real	0m2,887s
user	0m2,144s
sys	0m0,188s

export JULIA_NUM_THREADS=4
time ~/julia-1.3.0-alpha/bin/julia -O0  -- kn.jl 0 < ~/Downloads/knucleotide-input.txt

real	0m2,010s
user	0m1,392s
sys	0m0,184s

export JULIA_NUM_THREADS=1
time ~/julia-1.1.0/bin/julia -O3  -- kn.jl 0 < ~/Downloads/knucleotide-input.txt

real	0m2,572s
user	0m2,048s
sys	0m0,180s

[As with here, I’m always going for lowest “user” and have seen lower “real”, but then “user” higher.]

time ~/julia-1.3.0-alpha/bin/julia -O3  -- kn.jl 0 < ~/Downloads/knucleotide-input.txt

real	0m2,829s
user	0m1,992s
sys	0m0,196s

time julia --compile=min  -- kn.jl 0 < ~/Downloads/knucleotide-input.txt

real	0m6,511s
user	0m4,820s
sys	0m0,232s
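
For anyone who wants to recheck the percentages, here is a small sketch that parses the pasted `time` values (note the comma as decimal separator) and computes the relative slowdown. The helper names `parse_time` and `slowdown_percent` are made up for this example. From the runs pasted above, the 42% and 61% figures come out as stated; the 4-thread pair shown (2,887s vs. 2,010s) comes out at about 44%, so the 46% figure presumably refers to a run that was not pasted.

```julia
# Parse a duration printed by `time`, e.g. "0m2,572s", into seconds.
# Accepts either ',' or '.' as the decimal separator.
function parse_time(s::AbstractString)
    m = match(r"^(\d+)m(\d+)[,.](\d+)s$", s)
    m === nothing && throw(ArgumentError("unrecognized duration: $s"))
    minutes = parse(Int, m.captures[1])
    seconds = parse(Float64, m.captures[2] * "." * m.captures[3])
    return 60 * minutes + seconds
end

# How much slower `slow` is than `fast`, in percent, rounded to an integer.
slowdown_percent(slow, fast) =
    round(Int, 100 * (parse_time(slow) / parse_time(fast) - 1))

println(slowdown_percent("0m2,572s", "0m1,813s"))  # "real", best settings for both
println(slowdown_percent("0m1,992s", "0m1,236s"))  # "user"
println(slowdown_percent("0m2,887s", "0m2,010s"))  # "real", both with 4 threads
```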

Karajan · August 20, 2019, 4:18pm

“Update: over the past few months, it looks like a number of people did some amazing work submitting benchmarks, and Julia now has a very respectable showing, ahead of Swift and Go:”

Yes, a bunch of the work has been done, mostly by the amazing @non-Jedi (currently 4/10 top Julia programs).

“I also think that there’s still quite a bit of additional performance that can be squeezed out.”

Quite possibly, yes.

pidigits: all GMP calls anyways, so not too much hope here
revcomp: I’m currently trying to get a buffered version to be accepted. With 1.3 I’ll try to do some multithreading.
fasta: 1.3 multithreading will help for sure
nbody: … Adam is currently fighting with this one, maybe some SIMD wizards can help out. Not sure how Rust manages to be this much faster on pure number crunching.
knuc: maybe multithreading helps, maybe some more hacks regarding the hash function… not quite sure.
binarytrees: because Julia provides GC, there is not much that can be done here, I think
With the other ones I’m not sure because I haven’t tried them. Many of them are quite fast already though.
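
As one example of the kind of hash-related hack that could apply to knuc, here is a sketch that packs each k-mer into a `UInt64` (2 bits per base) and uses that integer as the `Dict` key, so hashing works on a machine word instead of a string. This is a common trick in general, and I am not claiming that any of the submitted Julia programs does exactly this; `encode2bit`, `kmer_key`, and `count_kmers` are names invented for the sketch, and it assumes the input contains only A, C, G, T (either case).

```julia
# Map a nucleotide byte to 2 bits: A/a -> 0, C/c -> 1, T/t -> 2, G/g -> 3.
encode2bit(b::UInt8) = UInt64((b >> 1) & 0x03)

# Pack a short k-mer (at most 32 bases) into a single UInt64 key.
function kmer_key(kmer::AbstractVector{UInt8})
    key = UInt64(0)
    for b in kmer
        key = (key << 2) | encode2bit(b)
    end
    return key
end

# Count all overlapping k-mers in `seq` with a rolling 2-bit encoded key.
function count_kmers(seq::AbstractVector{UInt8}, k::Int)
    1 <= k <= 32 || throw(ArgumentError("k must be in 1:32"))
    counts = Dict{UInt64,Int}()
    mask = k == 32 ? typemax(UInt64) : (UInt64(1) << (2 * k)) - UInt64(1)
    key = UInt64(0)
    for (n, b) in enumerate(seq)
        key = ((key << 2) | encode2bit(b)) & mask   # slide the window by one base
        if n >= k                                   # window is full from position k on
            counts[key] = get(counts, key, 0) + 1
        end
    end
    return counts
end

counts = count_kmers(codeunits("ACACA"), 2)
println(counts[kmer_key(codeunits("AC"))], " ", counts[kmer_key(codeunits("CA"))])
```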

non-Jedi · August 20, 2019, 4:48pm

I do wonder why this one runs so much slower multi-threaded than it does with multiple processes. Is there something about heap allocation and garbage collection that’s inherently not amenable to multi-threaded environments? If we could use multi-threading instead of multiple processes, that would take a rather large chunk off the run time.

To expand on the list off the top of my head:

regex-redux: could become faster once 1.3 lands with new threading runtime (and thread-safe regex) since execution time is dominated by a strictly non-parallelizable task that could be started on a single thread ahead of other work.
mandelbrot: isn’t currently using all cpu cores effectively compared to other implementations. I haven’t identified why yet.
revcomp: in addition to the buffered read @Karajan is working on, this could also benefit from the new threading runtime to start working on a specific sequence while still reading input. It also may be possible to speed up the reversal of each sequence using multi-threading if you divide it into “chunks” instead of just using a naive Threads.@threads loop over the array; this requires removing newlines from the array being reversed, which is slightly different from @Karajan’s current fastest implementation.
knuc: julia’s hashmap in general seems slower than some other implementations; not sure why. There’s an opportunity for better usage of cpu cores by implementing parallelism within the counting of each frame instead of around it.
fasta: obvious opportunity to parallelize, but I haven’t taken the time to grok what the benchmark is actually doing yet.
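
The “chunks” idea for revcomp could look roughly like the sketch below: the first half of a newline-free buffer is split into contiguous chunks, and each task swaps its chunk with the mirrored positions in the second half, so no two tasks touch the same index. This is only my reading of the idea under the stated assumption that newlines were already removed; `threaded_reverse!` and `acgt_complement` are hypothetical names, and I have not measured whether it actually beats a serial loop.

```julia
# Toy complement for the four plain bases; other bytes pass through unchanged.
function acgt_complement(b::UInt8)
    b == UInt8('A') ? UInt8('T') :
    b == UInt8('T') ? UInt8('A') :
    b == UInt8('C') ? UInt8('G') :
    b == UInt8('G') ? UInt8('C') : b
end

# Reverse `v` in place while applying `f` to every byte.
# Chunk k owns indices lo:hi in the first half plus their mirror positions,
# so the chunks are disjoint and can run on separate threads.
function threaded_reverse!(f, v::Vector{UInt8}; nchunks::Int = Threads.nthreads())
    n = length(v)
    half = n ÷ 2
    chunk = cld(half, max(nchunks, 1))
    Threads.@threads for k in 1:nchunks
        lo = (k - 1) * chunk + 1
        hi = min(k * chunk, half)
        for i in lo:hi
            j = n - i + 1
            v[i], v[j] = f(v[j]), f(v[i])
        end
    end
    # An odd-length buffer has a middle byte that no chunk touched.
    isodd(n) && (v[half + 1] = f(v[half + 1]))
    return v
end

println(String(threaded_reverse!(acgt_complement, Vector{UInt8}("GATTACA"))))
```

Julia has to be started with more than one thread (for example via JULIA_NUM_THREADS) for the loop to run in parallel; with a single thread it falls back to a plain serial pass.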

jebej · August 20, 2019, 5:03pm

The one I don’t understand is nbody: it takes 3.2 sec to run on my computer, and on the website it says that the benchmark takes 22 sec. I don’t think my computer should be that much faster…

EDIT: The code here: https://github.com/KristofferC/BenchmarksGame.jl/blob/master/nbody/nbody-fast.jl is even faster (and simpler to understand) at 2.8sec.

Palli · August 20, 2019, 5:07pm

Which benchmark? You probably made the same mistake I did, running one with a shorter test file. As I did for the second benchmark in my comment above.

jebej · August 20, 2019, 5:17pm

Here: n-body Julia #3 program (Benchmarks Game)

I get the same output so I think I’m running the right thing.
