Julia faster on Mac OSX than in linux?

v-i-s-h · March 20, 2018, 4:00pm

I use Julia for my research. My lab has both an iMac (i5 @ 2.7 GHz, 8 GB, High Sierra) and an HP desktop machine (i7 @ 3.6 GHz, 16 GB, running elementaryOS). I ran a speed test using peakflops() on both machines; the results are below.

# For Linux desktop
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> peakflops()
3.588384754696235e10

julia> peakflops()
3.393129118818704e10

julia> peakflops(parallel=true)
3.667023267872277e10

julia> peakflops(parallel=true)
3.668789077963552e10

# For iMac
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-4570R CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> peakflops()
1.3308120765839432e11

julia> peakflops()
9.118221999595766e10

julia> peakflops(parallel=true)
1.1292449978046065e11

julia> peakflops(parallel=true)
1.2288056560449782e11

Even though the Linux machine has better hardware, Julia runs faster on the iMac.
Any comments?

Balinus · March 20, 2018, 4:16pm

peakflops() seems rather unstable in terms of return values.

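using Plots

# Collect repeated peakflops() readings so their spread can be inspected.
# (On Julia 0.7 and later, peakflops is provided by the LinearAlgebra standard library.)
function benchpeak(niter = 500)
    T = fill(NaN, niter)
    for n = 1:niter
        T[n] = peakflops()
    end
    return T
end

T = benchpeak()
histogram(T)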

In other words, try more iterations to see if your results still hold.
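
If a plot is not convenient, the same check can be done with a few summary numbers. This is only a sketch: `summarize_flops` is a made-up helper name, it assumes Julia 0.7 or later (for `Statistics` and named tuples), and the sample values are the three readings v-i-s-h posts further down the thread after rebuilding OpenBLAS.

```julia
using Statistics  # standard library on Julia 0.7+; on 0.6 `median` is in Base

# Hypothetical helper: summarize repeated flop-rate samples (in flop/s).
function summarize_flops(samples::AbstractVector{<:Real})
    lo, hi = extrema(samples)
    med = median(samples)
    # Relative spread: gap between the best and worst run, divided by the median.
    return (min = lo, median = med, max = hi, spread = (hi - lo) / med)
end

# Three readings copied from later in this thread (rebuilt OpenBLAS, Linux box).
samples = [1.7415419592915222e11, 1.2685746785691425e11, 1.9630557568812375e11]
s = summarize_flops(samples)
println("median = ", s.median / 1e9, " Gflop/s, spread = ", round(100 * s.spread), " %")
```

A large spread relative to the median suggests that a single peakflops() call should not be compared across machines on its own.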

ExpandingMan · March 20, 2018, 4:33pm

Besides that, it almost has to be the case that peakflops() just calls some POSIX binding or something, doesn’t it? If that’s indeed the case, then the return value doesn’t have anything to do with Julia in particular.

Seif_Shebl · March 20, 2018, 5:05pm

You are testing on two different computers, so it is not surprising that one may be more performant than another. Also, specs tell you little about the real life experience.

kristoffer.carlsson · March 20, 2018, 5:57pm

"""
    peakflops(n::Integer=2000; parallel::Bool=false)
`peakflops` computes the peak flop rate of the computer by using double precision
[`gemm!`](@ref LinearAlgebra.BLAS.gemm!). By default, if no arguments are specified, it
multiplies a matrix of size `n x n`, where `n = 2000`. If the underlying BLAS is using
multiple threads, higher flop rates are realized. The number of BLAS threads can be set with
[`BLAS.set_num_threads(n)`](@ref).
If the keyword argument `parallel` is set to `true`, `peakflops` is run in parallel on all
the worker processors. The flop rate of the entire parallel computer is returned. When
running in parallel, only 1 BLAS thread is used. The argument `n` still refers to the size
of the problem that is solved on each processor.
"""

mpastell · March 20, 2018, 7:04pm

Try a bigger n, e.g. peakflops(10_000). I get 1.6e11 on Linux with a Core i7-4770, and your processor should be a bit faster.
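
To see why n matters, it may help to write the measurement out by hand. The docstring quoted above says peakflops times a double-precision matrix multiply; the sketch below does the same with the usual 2n^3 operation count for an n×n product. `gemm_flops` is a made-up name, and the code assumes Julia 0.7 or later (on 0.6 the `using LinearAlgebra` line has to be dropped, since everything needed is in Base).

```julia
using LinearAlgebra  # Julia 0.7+; on 0.6 matrix multiplication is available from Base

# Hypothetical helper: estimate the flop rate from one n×n Float64 matrix multiply.
function gemm_flops(n::Integer = 2000)
    w = ones(Float64, 8, 8)
    w * w                        # tiny untimed multiply so compilation is not measured
    a = ones(Float64, n, n)
    t = @elapsed a * a           # dense Float64 multiply is handed to the BLAS library
    return 2 * Float64(n)^3 / t  # roughly 2n^3 floating-point operations per multiply
end

println(gemm_flops(500) / 1e9, " Gflop/s")
```

With a small n the fixed overhead of a call is a larger share of the elapsed time, which is one plausible reason the larger problem reports a higher rate.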

Elrod · March 20, 2018, 8:39pm

Seems like it has to warm up a little:

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libimf
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> peakflops()
7.868601282943965e10

julia> peakflops()
1.1988692385229176e11

julia> peakflops(10_000)
2.057506433249601e11

julia> peakflops()
1.8486195064111972e11

edit: OpenBLAS:

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> peakflops()
1.5887918519922025e11

julia> peakflops(10_000)
1.6348315789326385e11

julia> peakflops()
1.6243814292065182e11

Your (v-i-s-h) CPU is newer than mine; I’d expect it to be faster.
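
The warm-up effect and the dependence on n can be separated with a small sweep. This is a sketch under assumptions: `sweep_peakflops` is a made-up helper, the sizes are arbitrary, and it targets Julia 0.7 or later, where peakflops lives in LinearAlgebra.

```julia
using LinearAlgebra  # provides peakflops on Julia 0.7+; on 0.6 it is in Base

# Hypothetical helper: best-of-`reps` peakflops reading for each problem size.
function sweep_peakflops(sizes; reps::Integer = 3)
    LinearAlgebra.peakflops(200)  # discarded warm-up call
    return [(n, maximum(LinearAlgebra.peakflops(n) for _ in 1:reps)) for n in sizes]
end

for (n, rate) in sweep_peakflops([500, 1000, 2000])
    println("n = ", n, ": ", rate / 1e9, " Gflop/s")
end
```

Taking the maximum over a few repetitions is a design choice that favours the best achievable rate; the median would be the better pick for a typical rate.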

samoconnor · March 20, 2018, 11:43pm

… one may be more performant than another …

A bit off topic:
I’ve been seeing this word “performant” a bit recently. What is it intended to mean?
Is it different to “one may be faster than another” ? or “one may be more efficient than another”?

ScottPJones · March 21, 2018, 12:02am

Adjective. performant (comparative more performant, superlative most performant) (jargon, chiefly computing) Capable of or characterized by an adequate or excellent level of performance or efficiency.

samoconnor · March 21, 2018, 12:18am

Interesting, thanks for indulging my off-topic sidetrack.

Is there some context where saying “more performant” is clearer than saying “more efficient” or “faster”? … or in the context above just “one may perform better than another”?

I sound like I remember my grandmother sounding when she used to pick on my grammar, I must be getting old

Sometimes it’s important to learn the new technical jargon. Sometimes the new technical jargon is only there to make things seem more magical and complex than they really are.

ScottPJones · March 21, 2018, 12:21am

Well, not that I can think of, and I’ve kind of caught the habit of using the term myself over the last few years!

On second thought, one could say that something is more performant, if it uses fewer resources, to get the same result in a similar time, I think, and “faster” would not do, in that case - so maybe more of a combination of faster and/or efficient, maybe implying better scalability.

Jargon, in other words!

pasha · March 21, 2018, 12:25am

I can’t think of an aspect of performance that people are referring to other than speed of execution. Allocations? Heat? DB access? Qubit flip rate? I don’t think they are being that precise. I expect it’s a plot to make old programmers look uneducated and out of touch.

samoconnor · March 21, 2018, 12:33am

Allocations? Heat? DB access? Qubit flip rate?

I can’t imagine any non-speed parameter that wouldn’t be covered by “efficient” (the relationship between the consumed things you care about and the produced things you care about).

ScottPJones · March 21, 2018, 12:36am

Haha, being one of those old programmers myself!
I think the real reason is simply the penchant in English for shortening things as much as possible,
“isn’t performant” is shorter/faster (more performant? ) than saying “doesn’t perform very well”, and
“is performant” than “performs adequately”.

Ralph_Smith · March 21, 2018, 2:18am

(back on topic)

This probably means that OpenBLAS does not correctly identify your new processor, so it is not using the best SIMD instructions. (Prescott is the fallback architecture.)

You could build Julia to use MKL for BLAS, or you can try to force OpenBLAS to use the Haswell target. (See the README at the main Julia repo for instructions.)
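
The kernel name is the last word inside the parentheses of the BLAS line that versioninfo() prints on Julia 0.6, so the check can be scripted. A minimal sketch, assuming that line format; `blas_core` and `uses_fallback` are made-up names, and the sample line is copied from the first post.

```julia
# Hypothetical helper: extract the kernel name from a `BLAS:` line as printed by
# versioninfo() on Julia 0.6. Returns `nothing` if the line has no parenthesized part.
function blas_core(line::AbstractString)
    m = match(r"\(([^)]*)\)", line)
    m === nothing && return nothing
    tokens = split(m.captures[1])
    return isempty(tokens) ? nothing : String(last(tokens))
end

# Hypothetical helper: true if the line reports the Prescott fallback kernel.
function uses_fallback(line::AbstractString)
    core = blas_core(line)
    return core !== nothing && lowercase(core) == "prescott"
end

line = "  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)"
println(blas_core(line), " -> fallback? ", uses_fallback(line))
```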

v-i-s-h · March 21, 2018, 8:44am

Maybe Ralph’s comment (#15 in this thread) is the key. I should check this!

v-i-s-h · March 21, 2018, 8:45am

Yeah… I’ll check this.

v-i-s-h · March 21, 2018, 9:07am

I tried to install JuliaPro. My desktop is running elementaryOS. For those who are new to elementaryOS, it is derived from Ubuntu, and anything that works on Ubuntu should also work on elementaryOS. But I’m getting this error when trying to install:

vish@eliteone:~/Downloads$ ./JuliaPro-0.6.2.2_mkl_build-17.sh /opt/julialang/pro/
JuliaPro installation has started, please wait until all the files are extracted

OS Detected: elementary OS 0.4.1 Loki
Unsupported Linux Distribution: elementary

v-i-s-h · March 22, 2018, 10:46am

Thanks Ralph, this worked. After forcing OpenBLAS to build with the HASWELL target, I get:

vish@eliteone:/opt/julialang/src/julia$ ./julia 
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.2 (2017-12-13 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40* (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY HASWELL)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> peakflops()
1.7415419592915222e11

julia> peakflops()
1.2685746785691425e11

julia> peakflops()
1.9630557568812375e11

But now I’m more curious. Why did the official Julia build fall back to the Prescott arch?

Ralph_Smith · March 22, 2018, 1:11pm

The OpenBLAS shipped with binary Julia distributions has a “dynamic” target which means that it picks an instruction set based on a runtime CPU check. Unfortunately the OpenBLAS team had not yet added checks for the newest models when the version used for Julia distribution was packaged. (I don’t know if this is corrected for your model in newer OpenBLAS.)
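
As a toy model of that behaviour (not OpenBLAS's actual detection code): a lookup table that knows some CPUs and falls back to a generic kernel for everything else. The CPU names and kernel labels are the ones reported in this thread; `KNOWN_KERNELS` and `pick_kernel` are made-up names.

```julia
# Toy model only: entries are taken from the versioninfo() outputs in this thread,
# not from OpenBLAS's real CPU detection tables.
const KNOWN_KERNELS = Dict(
    "i5-4570R" => "Haswell",
    "i7-4790"  => "Haswell",
)

# Hypothetical helper: CPUs missing from the table get the generic fallback kernel.
pick_kernel(cpu::AbstractString) = get(KNOWN_KERNELS, cpu, "Prescott")

println(pick_kernel("i7-4790"))  # prints "Haswell"
println(pick_kernel("i7-7700"))  # prints "Prescott": not in the table, so fallback
```

A newer CPU can therefore end up with slower kernels than an older one, simply because the table predates it.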
