Julia 1.6.0-rc2 (and 1.7 and rc1) with threads is slower - considerations for the Benchmarks Game

First the good news: when threading isn’t used, the release candidate takes about one second, roughly 1/5, off the run time (4.10 s vs. 5.20 s on 1.5.1).

But while both the new and the old version are faster with threading, as expected, the gain is smaller on the release candidate, which makes it slower than 1.5.1 in that case at least (3.05 s vs. 2.81 s with -t4):

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/fasta-julia-7.html

$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.6.0-rc1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      3.052 s ±  0.133 s    [User: 5.675 s, System: 1.187 s]
  Range (min … max):    2.897 s …  3.321 s    10 runs
 
$ hyperfine '~/julia-1.5.1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.5.1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      2.807 s ±  0.070 s    [User: 6.691 s, System: 1.003 s]
  Range (min … max):    2.743 s …  2.972 s    10 runs

 
$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.6.0-rc1/bin/julia --startup-file=no -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      4.099 s ±  0.043 s    [User: 3.655 s, System: 0.920 s]
  Range (min … max):    4.044 s …  4.167 s    10 runs

$ hyperfine '~/julia-1.5.1/bin/julia --startup-file=no -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.5.1/bin/julia --startup-file=no -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      5.204 s ±  0.135 s    [User: 4.812 s, System: 0.883 s]
  Range (min … max):    4.975 s …  5.411 s    10 runs
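
To put numbers on the comparison, the relative changes can be worked out from the mean times reported above. This is only a minimal sketch: the helper names `pct_change` and `speedup` are made up for illustration, and the inputs are the means copied from the hyperfine output.

```shell
# pct_change OLD CUR: percent change in run time going from OLD to CUR seconds
# (negative means CUR is faster). Hypothetical helper, not part of hyperfine.
pct_change() {
  awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - old) / old * 100 }'
}

# speedup BASE PAR: how many times faster the threaded run (PAR) is than BASE.
speedup() {
  awk -v base="$1" -v par="$2" 'BEGIN { printf "%.2f\n", base / par }'
}

pct_change 5.204 4.099   # no threads, 1.5.1 -> 1.6.0-rc1
pct_change 2.807 3.052   # -t4,        1.5.1 -> 1.6.0-rc1
speedup 5.204 2.807      # 1.5.1:     no threads vs. -t4
speedup 4.099 3.052      # 1.6.0-rc1: no threads vs. -t4
```

On these means the release candidate needs about 21% less time without threads but about 9% more time with -t4; threading gives roughly a 1.85x speedup on 1.5.1 versus 1.34x on 1.6.0-rc1.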

Intriguingly, the two fastest fasta programs, Rust and C++, are parallel, but if I read the stats right they effectively use only two cores. They are still over twice as fast as the Julia program, which uses all four cores more evenly.

It would be good to speed up some of the other Julia programs there (more effectively); some of the competing programs are parallel where the Julia code isn’t. E.g.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/pidigits-julia-3.html

and it’s a bit slower than the PHP program, which isn’t parallel either (only the fastest program is).

binary-trees is 9.3x slower than the best program and may need to be rewritten, and my own timings also look bad:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-julia-4.html

$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -p4 -O3 bt.jl 21'
Benchmark #1: ~/julia-1.6.0-rc1/bin/julia --startup-file=no -p4 -O3 bt.jl 21
 ⠼ Current estimate: 19.621 s 

$ hyperfine '~/julia-1.5.1/bin/julia --startup-file=no -p4 -O3 bt.jl 21'
Benchmark #1: ~/julia-1.5.1/bin/julia --startup-file=no -p4 -O3 bt.jl 21
 ⠇ Current estimate: 12.856 s

Can anyone confirm? Maybe it’s because my machine is heavily loaded, but that’s also a bad sign if load hinders Julia.

Something seems very off: on 1.6.0-rc1, -p4 simply makes it slower (unlike on 1.5.1). Here is the run without -p4:

$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -O3 bt.jl 21'
Benchmark #1: ~/julia-1.6.0-rc1/bin/julia --startup-file=no -O3 bt.jl 21
 ⠼ Current estimate: 12.845 s
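
For anyone who wants to confirm this, the four binary-trees runs can be put side by side in a single hyperfine invocation instead of comparing interrupted "Current estimate" readings. A minimal sketch, assuming the same install paths and `bt.jl` file name as above; the helper `bt_cmd` and the `RUN_BENCH` switch are made up for illustration. Note that `-p4` starts worker processes, whereas the `-t4` used for fasta starts threads, so the two flags exercise different kinds of parallelism.

```shell
# bt_cmd VERSION [FLAG]: print the benchmark command line for one Julia version.
# FLAG is an optional extra switch such as -p4 or -t4 (hypothetical helper).
bt_cmd() {
  printf '~/julia-%s/bin/julia --startup-file=no %s-O3 bt.jl 21\n' "$1" "${2:+$2 }"
}

# Only benchmark when explicitly requested and hyperfine is installed.
if [ "${RUN_BENCH:-0}" = 1 ] && command -v hyperfine >/dev/null 2>&1; then
  hyperfine --warmup 1 --runs 10 \
    "$(bt_cmd 1.5.1)"     "$(bt_cmd 1.5.1 -p4)" \
    "$(bt_cmd 1.6.0-rc1)" "$(bt_cmd 1.6.0-rc1 -p4)"
fi
```

Running it as `RUN_BENCH=1 sh bench.sh` (the script name is arbitrary) on an otherwise idle machine should show whether the -p4 slowdown is specific to 1.6.0-rc1 or an effect of load.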

I filed an issue: "Two types of parallelism slower in Julia 1.6.0-rc1" (Issue #39598, JuliaLang/julia on GitHub).
