Julia faster on Mac OS X than on Linux?

I use Julia for my research. My lab has both an iMac (i5 @ 2.7 GHz, 8 GB, High Sierra) and an HP desktop machine (i7 @ 3.6 GHz, 16 GB, elementary OS). I ran a speed test using `peakflops()` on both machines, and the results are below:

```julia
# For Linux desktop
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> peakflops()
3.588384754696235e10

julia> peakflops()
3.393129118818704e10

julia> peakflops(parallel=true)
3.667023267872277e10

julia> peakflops(parallel=true)
3.668789077963552e10
```

```julia
# For iMac
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-4570R CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> peakflops()
1.3308120765839432e11

julia> peakflops()
9.118221999595766e10

julia> peakflops(parallel=true)
1.1292449978046065e11

julia> peakflops(parallel=true)
1.2288056560449782e11
```

Even though the Linux machine has better hardware, Julia is faster on the iMac.
Any comments?
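For scale, here is a back-of-envelope estimate of each CPU's theoretical double-precision peak (cores × clock × FLOPs per cycle). It is a sketch under stated assumptions: 16 DP FLOPs per cycle per core (AVX2 with two FMA units, which both Haswell and Kaby Lake have), 4 cores on each machine, and base clocks only (no Turbo Boost). `theoretical_peak` is a made-up helper, not part of Julia.

```julia
# Back-of-envelope theoretical double-precision peak:
#   peak = cores * clock (Hz) * DP FLOPs per cycle per core
# Assumption: 16 DP FLOPs/cycle/core = 4 doubles per 256-bit vector
# * 2 FLOPs per FMA * 2 FMA units (Haswell and Kaby Lake both have this).
theoretical_peak(cores, ghz; flops_per_cycle = 16) = cores * ghz * 1e9 * flops_per_cycle

imac  = theoretical_peak(4, 2.7)   # i5-4570R: 4 cores at 2.7 GHz base clock
linux = theoretical_peak(4, 3.6)   # i7-7700: 4 cores at 3.6 GHz base clock

println("iMac  peak ≈ ", imac / 1e9, " GFLOP/s")
println("Linux peak ≈ ", linux / 1e9, " GFLOP/s")
```

Under these assumptions the i7-7700 should come out ahead (roughly 230 vs 173 GFLOP/s), so a measured ~3.6e10 on the Linux box points to a software bottleneck rather than the hardware.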

The values returned by `peakflops()` seem rather unstable from call to call:


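```julia
using Plots                     # only needed for the histogram

# Call peakflops() many times and collect the results.
# (On Julia 0.6 peakflops is in Base; on Julia 1.x use LinearAlgebra.peakflops.)
function benchpeak(niter = 500)
    T = fill(NaN, niter)        # preallocate; NaN marks entries not yet filled
    for n = 1:niter
        T[n] = peakflops()
    end
    return T
end

T = benchpeak()
histogram(T)                    # distribution of the measured FLOP/s values
```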

In other words, try more iterations to see whether your results still hold.
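If you just want numbers rather than a plot, a short summary of the samples is enough to compare machines. A minimal sketch, assuming Julia 1.x (where `median` lives in the `Statistics` standard library); `summarize_flops` is a hypothetical helper name:

```julia
using Statistics  # median

# Summarize repeated peakflops() samples instead of trusting a single call.
# `samples` is any vector of FLOP/s values, e.g. the `T` returned by benchpeak().
function summarize_flops(samples::AbstractVector{<:Real})
    return (n = length(samples),
            min = minimum(samples),
            median = median(samples),
            max = maximum(samples))
end
```

For example, `summarize_flops(benchpeak())` on each machine; the median is less sensitive to the occasional slow run than a single call.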

Besides that, it almost has to be the case that `peakflops()` just calls some POSIX binding or something, doesn’t it? If that’s indeed the case, then the return value doesn’t have anything to do with Julia in particular.

You are testing on two different computers, so it is not surprising that one may be more performant than another. Also, specs tell you little about real-life performance.

```julia
"""
    peakflops(n::Integer=2000; parallel::Bool=false)
`peakflops` computes the peak flop rate of the computer by using double precision
[`gemm!`](@ref LinearAlgebra.BLAS.gemm!). By default, if no arguments are specified, it
multiplies a matrix of size `n x n`, where `n = 2000`. If the underlying BLAS is using
multiple threads, higher flop rates are realized. The number of BLAS threads can be set with
[`BLAS.set_num_threads(n)`](@ref).
If the keyword argument `parallel` is set to `true`, `peakflops` is run in parallel on all
the worker processors. The flop rate of the entire parallel computer is returned. When
running in parallel, only 1 BLAS thread is used. The argument `n` still refers to the size
of the problem that is solved on each processor.
"""
```
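To make the docstring concrete, here is a minimal sketch of a `gemm!`-based flop-rate measurement. It is not Julia's actual `peakflops` source: it assumes Julia 1.x (`using LinearAlgebra`), does one untimed warm-up call, and keeps the fastest of `ntrials` timings. `gemm_flops` is a hypothetical name.

```julia
using LinearAlgebra  # BLAS.gemm!, BLAS.set_num_threads

# Minimal sketch of a gemm!-based flop-rate measurement (not Julia's own
# peakflops): time C = A*B for n x n Float64 matrices and divide the
# 2n^3 floating-point operations by the elapsed time.
function gemm_flops(n::Integer = 2000; ntrials::Integer = 3)
    A = rand(n, n); B = rand(n, n); C = zeros(n, n)
    BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)   # warm-up call, not timed
    best = Inf
    for _ in 1:ntrials
        t = @elapsed BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)
        best = min(best, t)                     # keep the fastest trial
    end
    return 2 * Float64(n)^3 / best
end
```

For example, `BLAS.set_num_threads(4); gemm_flops(4000)` (4 threads is an assumption about the machine). Keeping the fastest trial is deliberate: interference from other processes only ever makes a run slower.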

Try using a bigger `n`, e.g. `peakflops(10_000)`. I get 1.6e11 on Linux with a Core i7-4770, and your processor should be a bit faster.
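Following the bigger-`n` suggestion, a small sweep shows how the measured rate changes with matrix size. A sketch assuming Julia 1.x, where `peakflops` is `LinearAlgebra.peakflops` (on 0.6 it is plain `peakflops` in Base); `flops_by_size` is a hypothetical helper:

```julia
using LinearAlgebra  # on Julia 1.x peakflops lives here (not exported)

# Measure peakflops for several matrix sizes. The multiply does ~2n^3 work,
# so fixed per-call costs (thread start-up, timing noise) matter less as n
# grows, and larger n usually lands closer to the hardware peak.
function flops_by_size(sizes = (1000, 2000, 5000))
    return [n => LinearAlgebra.peakflops(n) for n in sizes]
end
```

For example, `flops_by_size((2000, 10_000))`. Each 10_000 × 10_000 Float64 matrix is 800 MB, so make sure there is enough memory before going that large.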

Seems like it has to warm up a little:

```julia
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libimf
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> peakflops()
7.868601282943965e10

julia> peakflops()
1.1988692385229176e11

julia> peakflops(10_000)
2.057506433249601e11

julia> peakflops()
1.8486195064111972e11
```

Edit: with OpenBLAS:

```julia
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> peakflops()
1.5887918519922025e11

julia> peakflops(10_000)
1.6348315789326385e11

julia> peakflops()
1.6243814292065182e11
```
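One way to account for the warm-up effect is to discard the first few calls and report the best of the rest. This is a sketch: `warmed_best` is a hypothetical helper, the default counts are arbitrary, and the commented usage assumes Julia 1.x.

```julia
# Hypothetical helper: call a benchmark `f` a few times and throw the results
# away (plausible warm-up causes: BLAS threads starting, the CPU clock ramping
# up; not verified here), then report the best of `trials` further calls.
function warmed_best(f; warmup::Integer = 2, trials::Integer = 5)
    for _ in 1:warmup
        f()                      # results deliberately discarded
    end
    return maximum(f() for _ in 1:trials)
end

# Usage (Julia 1.x):
#   using LinearAlgebra
#   warmed_best(() -> LinearAlgebra.peakflops(4000))
```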

Your (v-i-s-h) CPU is newer than mine; I’d expect it to be faster.

> … one may be more performant than another …

A bit off topic:
I’ve been seeing this word “performant” a bit recently. What is it intended to mean?
Is it different to “one may be faster than another” ? or “one may be more efficient than another”?

> Adjective. performant (comparative more performant, superlative most performant) (jargon, chiefly computing) Capable of or characterized by an adequate or excellent level of performance or efficiency.

Interesting, thanks for indulging my off-topic side track.

Is there some context where saying “more performant” is clearer than saying “more efficient” or “faster”? … or in the context above just “one may perform better than another”?

I sound like I remember my grandmother sounding when she used to pick on my grammar, I must be getting old :slight_smile:

Sometimes it’s important to learn the new technical jargon. Sometimes the new technical jargon is only there to make things seem more magical and complex than they really are.


Well, not that I can think of, and I’ve kind of caught the habit of using the term myself over the last few years! :wink:

On second thought, I think one could say that something is more performant if it uses fewer resources to get the same result in a similar time; “faster” would not do in that case. So maybe it is more of a combination of faster and/or more efficient, perhaps implying better scalability.

Jargon, in other words!

I can’t think of an aspect of performance that people are referring to other than speed of execution. Allocations? Heat? DB access? Qubit flip rate? I don’t think they are being that precise. I expect it’s a plot to make old programmers look uneducated and out of touch.


> Allocations? Heat? DB access? Qubit flip rate?

I can’t imagine any non-speed parameter that wouldn’t be covered by “efficient” (the relationship between the consumed things you care about and the produced things you care about).

Haha, being one of those old programmers myself!
I think the real reason is simply the penchant in English for shortening things as much as possible: “isn’t performant” is shorter/faster (more performant? :wink: ) than saying “doesn’t perform very well”, and “is performant” is shorter than “performs adequately”.

(back on topic)

This probably means that OpenBLAS does not correctly identify your new processor, so it is not using the best SIMD instructions. (Prescott is the fallback architecture.)

You could build Julia to use MKL for BLAS, or you can try to force OpenBLAS to use the Haswell target. (See the README at the main Julia repo for instructions.)
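To check whether an install hit the Prescott fallback, look at the last word inside the parentheses of the `BLAS:` line of `versioninfo()`. The sketch below automates that on pasted `versioninfo()` text like the transcripts in this thread. `openblas_core` and `is_prescott_fallback` are hypothetical helpers, the list of build flags to skip is an assumption, and the code uses Julia 1.x syntax. Newer Julia versions may not print a `BLAS:` line at all, in which case it returns `nothing`.

```julia
# Hypothetical helper: pull the OpenBLAS core name out of versioninfo() text,
# e.g. "BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)".
function openblas_core(versioninfo_text::AbstractString)
    m = match(r"BLAS: \S+ \(([^)]*)\)", versioninfo_text)
    m === nothing && return nothing       # no OpenBLAS config line (e.g. MKL)
    tokens = split(m.captures[1])
    # Assumed list of build flags to skip; KEY=VALUE tokens are skipped too.
    flags = ("USE64BITINT", "DYNAMIC_ARCH", "NO_AFFINITY", "NO_LAPACK", "USE_OPENMP")
    cores = filter(t -> !(t in flags) && !occursin('=', t), tokens)
    return isempty(cores) ? nothing : String(cores[end])
end

# True when the selected kernels are the generic Prescott fallback.
is_prescott_fallback(text) = lowercase(something(openblas_core(text), "")) == "prescott"
```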


Maybe Ralph’s comment is the key: Julia faster on Mac OSX than in linux? - #15 by Ralph_Smith. I should check this!

Yeah… I’ll check this.

I tried to install JuliaPro. My desktop is running elementary OS. For those who are new to it, elementary OS is derived from Ubuntu, and anything that works in Ubuntu should also work in elementary OS. But I’m getting this error when trying to install:

```
vish@eliteone:~/Downloads$ ./JuliaPro-0.6.2.2_mkl_build-17.sh /opt/julialang/pro/
JuliaPro installation has started, please wait until all the files are extracted

OS Detected: elementary OS 0.4.1 Loki
Unsupported Linux Distribution: elementary
```

Thanks Ralph, this worked. After forcing OpenBLAS to build with the HASWELL target, I get:

```
vish@eliteone:/opt/julialang/src/julia$ ./julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.2 (2017-12-13 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-linux-gnu

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40* (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY HASWELL)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

julia> peakflops()
1.7415419592915222e11

julia> peakflops()
1.2685746785691425e11

julia> peakflops()
1.9630557568812375e11
```

:slight_smile:

But now I’m more curious: why did the official Julia build fall back to the Prescott architecture?

The OpenBLAS shipped with binary Julia distributions has a “dynamic” target which means that it picks an instruction set based on a runtime CPU check. Unfortunately the OpenBLAS team had not yet added checks for the newest models when the version used for Julia distribution was packaged. (I don’t know if this is corrected for your model in newer OpenBLAS.)
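The failure mode can be illustrated with a toy lookup: a table of CPU families the packaged OpenBLAS knows about, plus a fallback for everything else. Everything here (`KNOWN_KERNELS`, `select_kernel`, the table contents) is hypothetical and far simpler than OpenBLAS's real runtime detection, which inspects CPUID information.

```julia
# Toy model of a DYNAMIC_ARCH runtime choice (hypothetical table, not
# OpenBLAS's real detection code): known CPU families map to tuned kernels,
# and anything the table does not recognise gets the generic fallback.
const KNOWN_KERNELS = Dict(
    "Haswell"      => "HASWELL",
    "Broadwell"    => "HASWELL",     # assumed: reuses the Haswell kernels
    "Sandy Bridge" => "SANDYBRIDGE",
)
const FALLBACK_KERNEL = "PRESCOTT"

select_kernel(cpu_family::AbstractString) = get(KNOWN_KERNELS, cpu_family, FALLBACK_KERNEL)
```

A CPU missing from the table (for example "Kaby Lake", the i7-7700's family, assumed missing here) silently gets the slow generic kernels, which matches the Prescott entry in the first Linux `versioninfo()`. Pinning the build target, as was done with HASWELL, bypasses the lookup entirely.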