Confused by @time result for simple addition of arrays

Hello, I am new to Julia. I just installed the latest Julia 1.3.1 and then ran several small timings.

First I define

a=[1,2,3]

Then I time it for the first time

@time a+a

it outputs
0.073568 seconds (231.55 k allocations: 11.170 MiB)
This is unreasonably slow.

Running @time a+a a second time gives
0.000004 seconds (5 allocations: 272 bytes)

The “Performance Tips” section of the manual does mention that

On the first call ( @time sum_global() ) the function gets compiled. (If you’ve not yet used @time in this session, it will also compile functions needed for timing.)

But I don’t understand. Does something as simple as a+a also need to go through a JIT compilation step the first time? If so, how can Julia be fast, given that 0.07 s is absurd for [1,2,3]+[1,2,3]? And I am not sure whether the second timing is just a cached result or not.

I never encountered this kind of weird timing in Python. For example, using NumPy

import numpy as np

a = np.array([1,2,3])
%time a+a

gives

CPU times: user 29 µs, sys: 3 µs, total: 32 µs
Wall time: 38.1 µs

and the pure Python interpreter is even faster for this small-scale problem.

a=[1,2,3]
%time [i+i for i in a]

gives

CPU times: user 8 µs, sys: 1 µs, total: 9 µs
Wall time: 12.9 µs

So how should I understand my Julia timing result? How can Julia be fast?

1 Like

You shouldn’t use @time for benchmarking.

Instead, use BenchmarkTools.@btime from the BenchmarkTools.jl package.

3 Likes
julia> import BenchmarkTools

julia> a = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> BenchmarkTools.@btime $a + $a
  55.277 ns (1 allocation: 112 bytes)
3-element Array{Int64,1}:
 2
 4
 6

julia> BenchmarkTools.@btime b + b setup=(b = [rand(Int), rand(Int), rand(Int)])
  53.097 ns (1 allocation: 112 bytes)
3-element Array{Int64,1}:
  941756760120547774
 2851675340207919818
 5201617382927345148
1 Like

This isn’t weird; Python isn’t a compiled language. When you use @time for the first time, you are also measuring compilation time, so you should disregard the first run of @time.
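As a rough sketch (timings will vary by machine), you can trigger compilation with one untimed call before measuring:

julia> a = [1, 2, 3];

julia> a + a;        # first call: compiles + for Vector{Int}, don’t time this

julia> @time a + a;  # compilation is now cached, so this mostly measures execution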

As @dilumaluthge correctly points out, you should use @btime from BenchmarkTools.jl. The $ sign is the correct way to interpolate values into benchmark expressions; see below:

https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions
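For example (a minimal sketch; absolute timings will differ on your machine):

julia> using BenchmarkTools

julia> x = rand(1000);

julia> @btime x + x;    # x is a non-constant global, so the benchmark includes dispatch overhead

julia> @btime $x + $x;  # $x interpolates the value, so only the addition itself is measured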

2 Likes

Of course, + is just another function.

5 Likes

@dilumaluthge @jling bashonubuntu Thank you so much for the replies.

I have never seen automatic compilation of a plus operation before. I am familiar with Python/NumPy and Mathematica. In Mathematica, there is also automatic compilation for compound functions that act on packed arrays, but Mathematica doesn’t compile a simple arithmetic function like plus at runtime. Instead, NumPy and Mathematica directly call a math library to do vectorized plus and trigonometric operations on arrays.

But as jling says, Julia needs to compile even the simple + operation the first time. When Julia meets a+a for the first time, it compiles; then when it meets a+a+a, it needs another compilation, and so on. The only benefit I can think of for this kind of “diligent” compilation is maybe eliminating temporary arrays during chained arithmetic, since we know NumPy will generate temporary arrays for a+a+a. Am I right?
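For example, in Julia the dot-broadcast syntax fuses a whole chain into a single loop; a minimal sketch of what I mean:

a = rand(1000)
b = a + a + a     # like NumPy: builds a temporary array before the final result
c = @. a + a + a  # fused broadcast: one loop and a single output allocation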

But the question is: does Julia really perform better than calling a math library directly, the way NumPy does?

I did a test on the simple expression 3*a+sin(a)+sqrt(a).

For Julia:

using BenchmarkTools

a = rand(10000000)
@btime @. 3*a+sin(a)+sqrt(a);

takes 101.989 ms (10 allocations: 76.29 MiB)

For NumPy:

import numpy as np
import mkl

mkl.set_num_threads(1)  # constrain to 1 thread for comparison with Julia
a = np.random.rand(10000000)  # same-size random array as in the Julia test above
%timeit 3*a+np.sin(a)+np.sqrt(a)

takes 90.7 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Julia is actually slower than NumPy on this short arithmetic, not to mention that Julia has to compile the first time, which wastes more time.

But maybe the slowness is because my Julia is not built with MKL? My NumPy ships with Anaconda and is thus linked against MKL; in the above case it uses Intel’s VML library.

What is more, Python has Numba and NumExpr providing further acceleration (and, of course, requiring compilation like Julia). Let us see.

For Numba, which uses Intel’s SVML library:

import numba

@numba.njit
def numba_test(a):
    return 3*a + np.sin(a) + np.sqrt(a)

numba_test(a)  # call once so compilation happens before timing
%timeit numba_test(a)

takes
53.9 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

And for NumExpr, which uses Intel’s VML:

import numexpr as ne
ne.set_num_threads(1)
%timeit ne.evaluate('3*a+sin(a)+sqrt(a)')

takes 60.7 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Since I failed to install Julia with MKL (due to an unstable network), I do not know how Julia with MKL does compared to Numba and NumExpr.

You shouldn’t expect any speedup from Julia compared to calling NumPy / MKL / VML for workloads that these libraries are designed and optimized for. That would require either magic or that the people behind those libraries are incompetent, which they obviously aren’t.

One major selling point of Julia is the case where your problem isn’t easily formulated in a manner where you can just call out to an existing vectorized implementation. You can then code it yourself (using loops, etc.) and get great performance.
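For instance (a minimal sketch; f! is just a name I picked), the expression from earlier in the thread can be written as an explicit loop that allocates nothing inside the kernel:

# In-place kernel for 3a + sin(a) + sqrt(a), writing into a preallocated output array.
function f!(out, a)
    @inbounds for i in eachindex(out, a)
        out[i] = 3 * a[i] + sin(a[i]) + sqrt(a[i])
    end
    return out
end

a = rand(10^7)
out = similar(a)
f!(out, a)  # the loop itself performs no allocations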

4 Likes

@kristoffer.carlsson Thank you so much for the reply. That clears up some of my confusion.

Whether it is “simple” or not is not particularly relevant: Julia does come with some of its operations pre-compiled, but in general whatever you do will likely involve the compiler in one way or another. Note that you’re not just compiling the addition of two vectors: you’re compiling the @time macro itself.

The compilation is cached. The actual math is not. Adding [1, 2, 3] to itself is actually so fast that @time does not have the precision to give you a useful result. That, among other reasons, is why you want to use @btime from BenchmarkTools.jl, which will run the function several times to compute a meaningful time.

Right: Python (without something like PyPy) doesn’t compile your code to native code. That’s why Python ends up being ~250 times slower than Julia every time you run the operation, with the exception of the first call in Julia.

The important observation is that actual code typically involves doing the same kinds of operation over and over, so the cost of compiling the code is usually worthwhile.

1 Like

So how should I understand my Julia timing result? How can Julia be fast?

Julia is not (yet?) fast for simple scripts like this; it is not a scripting language. C++ and Fortran are not fast either if you include compilation times.
Julia is very fast for heavy computations.