How to speed up array comparisons?


#1

I need to fix a bottleneck in my code: comparing an array against a value stored in a variable.
I’ve tested the map() function, and it speeds up my code only when I write the number directly as a literal. If I store the number in a variable, I lose the performance.
Below is a simple example in Julia 0.6.2. A is just an array, and I need to compare it with a value stored in cte.

using BenchmarkTools
A = collect(1:1000) + rand(1000);
cte  = 560.5
@btime A .> 560.5
@btime A .> cte
@btime map( (l)-> l > 560.5,A)
@btime map( (l)-> l > cte,A)

On my computer, the results were:

  • 4.679 μs (21 allocations: 5.06 KiB)
  • 3.916 μs (21 allocations: 4.95 KiB)
  • 846.938 ns (2 allocations: 1.08 KiB)
  • 16.998 μs (1003 allocations: 16.72 KiB)

The third line gives the best result. How can I define the variable cte so that the fourth line has the same performance?

Thank you


#2

You can do const cte = 560.5, or wrap the call in a function which takes cte as an argument.
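A minimal sketch of both suggestions, assuming Julia 1.x (the question used 0.6.2). The names CTE, above_const and above_arg are hypothetical. In both cases the closure passed to map sees a threshold whose type the compiler knows, which is what the plain global cte lacks.

```julia
# Hypothetical names: CTE, above_const, above_arg.

# Option 1: a `const` global has a fixed type, so code that reads it infers well.
const CTE = 560.5
above_const(A) = map(l -> l > CTE, A)

# Option 2: pass the threshold as an argument; inside the function its type
# is known, so the closure `l -> l > x` is type-stable.
above_arg(A, x) = map(l -> l > x, A)

A = collect(1:1000) .+ rand(1000)
above_const(A) == above_arg(A, 560.5) == (A .> 560.5)  # true
```

Option 2 is usually the more flexible choice, since the threshold can change without redefining a constant.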


#3

It works, thank you


#4

Or, if you need something super fast:

julia> function f(x,A)
       C = Array{Bool}(undef,length(A))
       @inbounds for i in 1:length(A)
         C[i] = ifelse(A[i]>x, true, false)
       end
       C
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> A = (1:1000) + rand(1000);
julia> cte = 560.5;
julia> @btime f($cte,$A);
  256.591 ns (1 allocation: 1.06 KiB)
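Since A[i] > x already returns a Bool, the ifelse in f is redundant. The sketch below is the same loop written a little more idiomatically (the name f_idiomatic is hypothetical, and its timing has not been measured here). Using eachindex(C, A) checks that both arrays share the same indices, which is what makes @inbounds safe.

```julia
# Hypothetical name `f_idiomatic`; same algorithm as `f` above.
function f_idiomatic(x, A::AbstractVector)
    C = Vector{Bool}(undef, length(A))
    # eachindex(C, A) throws if C and A do not share the same indices,
    # so the @inbounds below cannot read or write out of bounds.
    @inbounds for i in eachindex(C, A)
        C[i] = A[i] > x    # the comparison already returns a Bool
    end
    return C
end

A = collect(1:1000) .+ rand(1000)
f_idiomatic(560.5, A) == (A .> 560.5)  # true
```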

#5

When benchmarking with BenchmarkTools, remember to always interpolate global variables (prefix them with $)!

julia> @btime map( (l)-> l > cte,A);
  25.785 μs (1003 allocations: 16.72 KiB)

julia> @btime map( (l)-> l > $cte,$A);
  737.640 ns (2 allocations: 1.09 KiB)

If you get weird benchmarking results, 98% of the time it’s because you didn’t interpolate.


#6

If array .> x is the bottleneck in your code, it’s likely you are doing something wrong. Don’t write Matlab/Numpy-style code that performs a sequence of vector operations one by one. Write a single loop that does all your processing in one pass over the array. (And put performance-critical code like this in a function.)
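To illustrate the single-pass advice, here is a minimal sketch using a hypothetical task (summing the elements above a threshold; the function names are also hypothetical). The vectorized version materializes a Bool mask and a filtered copy before summing, while the loop touches each element once and allocates nothing.

```julia
# Hypothetical task and names, for illustration only.

# Vectorized style: allocates a Bool mask, then a filtered copy, then sums it.
sum_above_vectorized(A, x) = sum(A[A .> x])

# Single-pass style: one loop over A, no temporary arrays.
function sum_above_loop(A::AbstractVector{T}, x) where {T<:Real}
    s = zero(T)
    for a in A
        if a > x
            s += a
        end
    end
    return s
end

sum_above_loop([1.0, 5.0, 3.0], 2.0)  # 8.0
```

The two results can differ in the last few bits on large inputs, because sum uses pairwise summation while the loop adds sequentially.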


#7

I have seen pretty much this exact answer to basically the same question countless times. Why the reluctance to recommend variable interpolation? To me, that seems so much more convenient than creating a function. What am I missing?


#8

Interpolation is only something that works with the BenchmarkTools macros. And writing a lot of performance critical code as one big global script is bad programming style anyway—it leads to code reuse by copy-paste and editing your code every time you need to change a parameter.


#9

Could you explain what variable interpolation is and when it makes a difference in performance?


#10

Interpolating splices the variable’s current value into the benchmarked expression as a literal, instead of referencing a non-constant global, which breaks type inference.
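The effect of a non-constant global on type inference can be checked directly. This sketch assumes Julia 1.x and uses Base.return_types, an unexported reflection function that reports the return type(s) the compiler infers; all other names are hypothetical.

```julia
g_global = 1          # non-constant global: its type may change at any time
const G_CONST = 1     # constant global: its type is fixed

uses_global() = g_global + 1
uses_const()  = G_CONST + 1
uses_arg(a)   = a + 1     # value passed in, roughly what `$` achieves

Base.return_types(uses_global, ())   # inferred: Any
Base.return_types(uses_const, ())    # inferred: Int
Base.return_types(uses_arg, (Int,))  # inferred: Int
```

When the inferred type is Any, every call has to look up the value and dispatch on its runtime type, which is the overhead that shows up in the non-interpolated timings.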


#11

It makes no difference in practice; it is a quirk of the BenchmarkTools macros.

@btime really only works in the global scope. When you benchmark a function call with @btime on non-interpolated variables, you measure both the dynamic dispatch and the time spent in the function, because @btime has to fetch your variables from the global scope, where they are untyped. This is different from benchmarking with literals. Generally, whenever your timings are in μs (or even ns), I would be wary of artifacts; such functions can really only be benchmarked properly in context (inlining, and whatever your CPU ends up doing).

# on 0.6.2
julia> using BenchmarkTools

julia> @btime +(1,2);
  2.198 ns (0 allocations: 0 bytes)

julia> x=1;y=2;
julia> @btime +(x,y);
  23.521 ns (0 allocations: 0 bytes)
julia> @btime +($x,$y)
  2.633 ns (0 allocations: 0 bytes)

and

# on 0.7
julia> using BenchmarkTools

julia> @btime +(1,2)
  0.024 ns (0 allocations: 0 bytes)

julia> x=1;y=2;
julia> @btime +(x,y)
  29.138 ns (0 allocations: 0 bytes)
julia> @btime +($x,$y)
  2.207 ns (0 allocations: 0 bytes)

#12

@foobar_lv2, I can see the difference in results, but what is variable interpolation?
What exactly is being done?


#13

If I understand correctly, when you run
@benchmark func(X)
and func and/or X are not constants in the scope you are benchmarking in (that is, it is possible to change the type of func or X without getting a warning),
then what you are actually benchmarking in each iteration is:
tic
get the current values for func and X
apply func(X)
toc

Whereas when you add the $ sign in the benchmark macro, you are not using the variable; you are using the value it had at the time of the macro invocation, which is probably what you want.
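A rough model of that difference using closures. This is a simplified analogy, not how BenchmarkTools actually implements $, and the names body_global, make_body and body_interp are hypothetical. One body re-reads the globals on every call, and the other captures their values once, at creation time.

```julia
x = 1; y = 2                  # non-constant globals, as in the thread

# Rough analogue of `@btime +(x, y)`: the body re-reads the globals on every call.
body_global() = x + y

# Rough analogue of `@btime +($x, $y)`: the values are captured once, when the
# closure is created, much like `$` splicing them in when the macro runs.
make_body(a, b) = () -> a + b
body_interp = make_body(x, y)

x = 10                        # rebind the global afterwards
body_global()                 # 12: sees the new value of x
body_interp()                 # 3: still uses the value captured earlier
```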