I need to fix a bottleneck in my code that involves comparing an array with a variable.
I’ve tested the map() function, and it speeds up my code only when I write the number directly as a literal. If I store the number in a variable, I lose the performance.
Below is a simple example in Julia 0.6.2. A is just an array, and I need to compare it with a value stored in cte.
using BenchmarkTools
A = collect(1:1000) + rand(1000);
cte = 560.5
@btime A .> 560.5
@btime A .> cte
@btime map((l) -> l > 560.5, A)
@btime map((l) -> l > cte, A)
On my computer, the results were:
4.679 μs (21 allocations: 5.06 KiB)
3.916 μs (21 allocations: 4.95 KiB)
846.938 ns (2 allocations: 1.08 KiB)
16.998 μs (1003 allocations: 16.72 KiB)
We can see that the third line has the best result. How can I define my variable cte so that the fourth line has the same performance?
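One idea would be to declare the value as a const global, since a const global has a fixed type the compiler can rely on inside the anonymous function. A minimal, untested sketch (cte2 is just a scratch name, not part of the measurements above):

const cte2 = 560.5                # a const global has a known, fixed type
@btime map((l) -> l > cte2, A)    # expectation: close to the literal version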
julia> function f(x, A)
           C = Array{Bool}(undef, length(A))    # preallocate the output array
           @inbounds for i in 1:length(A)       # one pass, bounds checks elided
               C[i] = ifelse(A[i] > x, true, false)
           end
           C
       end
f (generic function with 1 method)
julia> A = (1:1000) + rand(1000);
julia> cte = 560.5;
julia> @btime f($cte,$A);
256.591 ns (1 allocation: 1.06 KiB)
If array .> x is the bottleneck in your code, it’s likely you are doing something wrong. Don’t write Matlab/Numpy-style code that performs a sequence of vector operations one by one. Write a single loop that does all your processing in one pass over the array. (And put performance-critical code like this in a function.)
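For example, here is a hypothetical count_above (a sketch of the pattern, not code from this thread) that fuses the comparison A .> x and a subsequent reduction into a single pass, instead of materializing the temporary Bool array:

julia> function count_above(x, A)
           n = 0
           for a in A
               n += a > x        # Bool promotes to Int, so this counts the matches
           end
           n
       end
count_above (generic function with 1 method)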
I have seen pretty much this exact answer to basically the same question countless times. Why the reluctance to recommend variable interpolation? To me, that seems so much more convenient than creating a function. What am I missing?
Interpolation is something that only works with the BenchmarkTools macros. And writing a lot of performance-critical code as one big global script is bad programming style anyway: it leads to code reuse by copy-paste and to editing your code every time you need to change a parameter.
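For concreteness, the two forms look like this with the A and cte from the question (a syntax sketch only, no timings):

@btime A .> cte      # A and cte are read as untyped globals while timing
@btime $A .> $cte    # $ splices the current values in, as if they were local variables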
It makes no difference in practice; it is a quirk of the BenchmarkTools macros.
@btime only really works at global scope. But when you measure a function call via @btime, you measure both the dynamic dispatch and the time spent in the function, because @btime has to fetch your variables from the global scope (where they are untyped). This is distinct from benchmarking with literals. Generally, whenever your timings are in microseconds (or even nanoseconds), I would be wary of artifacts; such functions can really only be benchmarked properly in context (inlining and whatever your CPU ends up doing).
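The usual way to keep that artifact out of the measurement is to move the work into a function, so the untyped globals are only touched once at the call boundary. A sketch with a hypothetical wrapper g, reusing the A and cte from the question:

g(A, x) = map(l -> l > x, A)   # inside g, x is a typed local, so the closure is type-stable
@btime g($A, $cte)             # interpolation keeps the global lookup out of the timing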
If I understand correctly: when you run @benchmark func(X), and func and/or X are not constants in the scope you are benchmarking (that is, if it is possible to change the type of func or X without getting a warning), then what you are actually benchmarking in each iteration is:

tic
get the current values of func and X
apply func(X)
toc

Whereas when you add the $ sign in the benchmark macro, you are not benchmarking the variable; you are benchmarking with the value that the variable held at the time of the macro invocation, which is probably what you wanted.
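A small illustration of that distinction, with a hypothetical variable x:

x = 0.5
@btime sin(x)    # every evaluation re-reads the global binding x
@btime sin($x)   # times sin applied to the value 0.5 that x held when the macro ran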