Simple For Loops and Speed

question
performance

#1

version 0.7.0-dev. not sure where this question belongs. it is both a first-step and performance question.

julia> rng= MersenneTwister(1234)
MersenneTwister(...

julia> MC= 10000000
10000000

julia> global x  ## faster??
 
julia> @time x=randexp(rng, Float32, MC);
 0.188982 seconds (38.29 k allocations: 40.368 MiB, 5.30% gc time)

julia> @time for i in 1:MC ; x=randexp(rng, Float32); end;
  1.550223 seconds (40.00 M allocations: 762.924 MiB, 3.21% gc time)

julia> @time x=randexp(rng, Float32, MC);
 0.106505 seconds (6 allocations: 38.147 MiB, 9.52% gc time)

julia> @time for i in 1:MC ; x=randexp(rng, Float32); end;
  1.551266 seconds (40.00 M allocations: 762.924 MiB, 3.40% gc time)

now, I am guessing that randexp with the third argument is doing a loop in C. I am surprised that this is 10 times faster. I also could deduce that this is not only overhead from randexp:

julia> @time for i in 1:MC ; end;
0.917997 seconds (30.00 M allocations: 610.336 MiB, 2.20% gc time)

so more than half is just loop overhead and has nothing to do with crossing the julia/C layer into randexp.

am I making an elementary mistake?


#2

Before doing any benchmarking, please read the Performance Tips in the Julia manual. In particular, global constants should be written e.g.

const MC=10^7

Wrap everything in a function, and then use BenchmarkTools.jl for benchmarking.

randexp is written in pure Julia:

@edit randexp()

#3

I would do something like:


julia> using BenchmarkTools

julia> const RNG = MersenneTwister(1234);

julia> const N = 10^6
1000000

julia> @btime randexp($RNG, Float32, $N);
  7.229 ms (2 allocations: 3.81 MiB)

julia> function bench(N)
           [randexp(RNG, Float32) for i in 1:N]
       end
bench (generic function with 1 method)

julia> @benchmark bench($N)
BenchmarkTools.Trial:
  memory estimate:  3.81 MiB
  allocs estimate:  2
  --------------
  minimum time:     7.186 ms (0.00% GC)
  median time:      8.426 ms (0.00% GC)
  mean time:        9.196 ms (6.59% GC)
  maximum time:     49.813 ms (83.62% GC)
  --------------
  samples:          543
  evals/sample:     1

#4

Putting things in a function and timing with just @time:

function trandexp(MC)
  rng = MersenneTwister(1234)
  @time x = randexp(rng, Float32, MC);
  @time begin 
    y = Array{Float32}(MC)
    for i in 1:MC 
      @inbounds y[i] = randexp(rng, Float32)
    end
  end
end
trandexp(10^7) 

You get similar results:

0.082385 seconds (2 allocations: 38.147 MiB, 7.19% gc time)
0.080481 seconds (2 allocations: 38.147 MiB, 0.21% gc time)

#5

thank you. I did read the guide. now, some of the rules in the guide make obvious sense to me (don’t use abstract types or type-changers), others are too far off for me to encounter.

the beginning of a new language is always overwhelming, and I am not clear what sins I have committed that have slowed down the code. right now, I need to understand simple rules. why was my seemingly simple and straightforward loop so terribly slow? I need to understand both basic reasons (to generalize) and sensible generalizations that transcend my example and that I can keep in mind for the future. the following are still slow:

 @time begin; RNG= MersenneTwister(1234); function bench1(N); [randexp(RNG, Float32) for i in 1:N]; end; bench1(MC); end;
   0.603805 seconds (10.01 M allocations: 191.413 MiB, 0.84% gc time)

@time begin; RNG= MersenneTwister(1234); [randexp(RNG, Float32) for i in 1:MC]; end;
  0.648537 seconds (10.01 M allocations: 191.369 MiB, 6.71% gc time)

above are slow. below are >5 times faster:

@time begin; function bench1(N);  RNG=MersenneTwister(1234);  [randexp(RNG, Float32) for i in 1:N]; end; bench1(MC); end;
   0.151507 seconds (12.70 k allocations: 38.847 MiB, 27.09% gc time)

@time begin; function bench1(N);  RNG=MersenneTwister(1234); for i in 1:N begin randexp(RNG, Float32) end; end; end; bench1(MC); end;
  0.083991 seconds (1.19 k allocations: 76.115 KiB)

@time begin; function bench1(N);  RNG=MersenneTwister(1234); for i in 1:N; begin randexp(RNG, Float32) end; end; end; bench1(MC); end;
  0.086294 seconds (1.19 k allocations: 76.115 KiB)

if I understand this correctly, I infer the following rule: ** julia’s outermost global scope is slow; julia is fast only inside functions **. the reason for this seems unintuitive to me. or am I wrong here?

now, the following function is also very slow,

 @time begin; RNG=MersenneTwister(1234); function bench1(N); for i in 1:N; begin randexp(RNG, Float32) end; end; end; bench1(MC); end;
  0.509664 seconds (10.00 M allocations: 152.652 MiB, 0.55% gc time)

is the rule then ** inside a function, any reference to an above-scope variable made inside the function slows it down terribly ** ? and ** this should be avoided by …???.. *** . one placeholder could be “passing all above-scope variables instead as arguments into the function.”


#6

You don’t want to be running everything in one line in all your examples. You are redefining the function every time, so @time is measuring the compile time + execution time. You only care about execution time. For example:

julia> @time begin; f(x) = 5sin(x); f(3.);end
  0.003531 seconds (438 allocations: 23.647 KiB)
0.7056000402993361

julia> @time begin; f(x) = 5sin(x); f(3.);end
  0.005273 seconds (436 allocations: 23.601 KiB)
0.7056000402993361

julia> @time f(3.)
  0.000005 seconds (5 allocations: 176 bytes)
0.7056000402993361

I think you’re running into two separate issues here:

  1. Julia is fast because it compiles your code before executing it. When you time code in the outer scope like you do above, you are mostly measuring compile time. This is a one-time cost. If you look at the examples above, you seen that @time reports much lower times when you run the function twice. This is because the first run measures compile time + execution time, but the second run is just execution time.
  2. Don’t use global (outermost) variables inside your functions, unless you annotate them with const. Without const, Julia cannot efficiently compile your function. This is explained in depth in the manual.

#7

If you’d like some reading, 7 Julia Gotchas and How to Handle Them explains the problems with global scope really well with a few simple examples.


#8

It’s not that your code is slow, it’s that your benchmarking is slow. When you do the timing in global scope with non-const variables, Julia cannot be certain of the types, and cannot compile efficient code.

This is why using BenchmarkTools is a good idea. I never do any benchmarking with @time, only with @btime from BenchmarkTools. It takes care of everything for you.


#9

thank you everybody. this was very helpful. everything…from globals to benchmarking to the 7 gotchas. regards, /iaw