Why is the first expression ~2.5x slower?
julia> x = rand(10^6);
julia> y = rand(10^6);
julia> @btime sum(a * b for (a, b) in zip(x, y));
4.280 ms (5 allocations: 112 bytes)
julia> @btime sum(t[1] * t[2] for t in zip(x, y));
1.585 ms (5 allocations: 112 bytes)
The generated code looks identical so this is kinda weird.
However, testing this on 0.7, there doesn’t seem to be a difference in timing anymore.
What happens if you evaluate them in a fresh session in the opposite order?
Order doesn’t matter on my machine. Using @time instead of @btime doesn’t result in this discrepancy. Interpolating the variables in the @btime version doesn’t change things. The code_llvm for the full call including the @btime macro seems to be a bit different between the two…
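For anyone who wants to cross-check the @time versus @btime observation without a benchmarking package, here is a minimal sketch using only Base. The names dot_destructured, dot_indexed, and min_time are made up for illustration and do not appear in the thread. Wrapping each expression in a function keeps x and y from being read as non-constant globals, and the minimum over repeated runs is assumed to be a rough stand-in for the minimum time that @btime reports.

```julia
# Hypothetical helper names; only Base is used here.
dot_destructured(x, y) = sum(a * b for (a, b) in zip(x, y))
dot_indexed(x, y) = sum(t[1] * t[2] for t in zip(x, y))

# Minimum wall-clock time (in seconds) over `n` runs.
# One warm-up call is made first so compilation is not included.
function min_time(f, x, y; n = 20)
    f(x, y)  # warm-up call: triggers compilation
    return minimum(@elapsed(f(x, y)) for _ in 1:n)
end

x = rand(10^6)
y = rand(10^6)

t_destructured = min_time(dot_destructured, x, y)
t_indexed = min_time(dot_indexed, x, y)

println("destructured: ", t_destructured, " s")
println("indexed:      ", t_indexed, " s")
```

Whether the two timings differ will depend on the Julia version, as the posts above suggest.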
Just to be clear, how are you running this code on nightly? With BenchmarkTools 0.0.8 or latest master on Julia 92fa0f3ace (0 days old master), I get
julia> @btime sum(a * b for (a, b) in zip(x, y));
ERROR: syntax: invalid syntax (escape (call (outerref sum) (call (top Generator) (block (null) (block (= #9 (new ##9#11)) (null) (unnecessary #9))) (call (outerref zip) (outerref x) (outerref y)))))
Stacktrace:
[1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /Users/twan/code/julia/RigidBodyDynamics/v0.7/BenchmarkTools/src/execution.jl:289
julia> @btime sum(t[1] * t[2] for t in zip(x, y));
ERROR: syntax: invalid syntax (escape (call (outerref sum) (call (top Generator) (block (null) (block (= #12 (new ##12#14)) (null) (unnecessary #12))) (call (outerref zip) (outerref x) (outerref y)))))
Stacktrace:
[1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /Users/twan/code/julia/RigidBodyDynamics/v0.7/BenchmarkTools/src/execution.jl:289
(https://github.com/JuliaCI/BenchmarkTools.jl/issues/69)
I can get things to work using
b = @benchmarkable sum(a * b for (a, b) in zip(x, y))
run(b)
in which case there is no difference between the two (while there still is a difference on 0.6 when you first use @benchmarkable and then run).
How do you see the code in the anonymous function built by the Generator constructor?
Good call; there is a difference there
julia> g1 = Base.Generator(a * b for (a, b) in zip(x, y));
julia> @code_llvm g1.f(first(g1.iter))
define double @"julia_#35_61397"([2 x double]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
%1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
%2 = load double, double* %1, align 8
%3 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
%4 = load double, double* %3, align 8
%5 = fmul double %2, %4
ret double %5
}
julia> g2 = Base.Generator(t[1] * t[2] for t in zip(x, y));
julia> @code_llvm g2.f(first(g2.iter))
define double @"julia_#37_61406"([2 x double]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
%1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
%2 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
%3 = load double, double* %1, align 8
%4 = load double, double* %2, align 8
%5 = fmul double %3, %4
ret double %5
}
Edit: but not in the code_native; I’m definitely not an expert on LLVM IR (does it just not matter that things are shuffled around in code_llvm?)
You put it in a function.
Edit: Perhaps this doesn’t work and I might have just seen the setup to the call. In that case, whoops.
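One possible reading of "put it in a function" is sketched below, with the caveat from the edit above that this may mostly show the setup to the call rather than the inner loop. The function names and the llvm_text / native_text helpers are hypothetical. The sketch assumes Julia 0.7 or later, where code_llvm and code_native live in the InteractiveUtils standard library (on 0.6 they are available from Base without the using line).

```julia
using InteractiveUtils  # provides code_llvm and code_native on Julia 0.7 and later

# Hypothetical wrappers around the two expressions from the first post.
dot_destructured(x, y) = sum(a * b for (a, b) in zip(x, y))
dot_indexed(x, y) = sum(t[1] * t[2] for t in zip(x, y))

# Argument types to compile for: two Float64 vectors, as in the thread.
argtypes = Tuple{Vector{Float64}, Vector{Float64}}

# Capture the printed IR / assembly as strings so they can be saved or diffed.
llvm_text(f) = sprint(io -> code_llvm(io, f, argtypes))
native_text(f) = sprint(io -> code_native(io, f, argtypes))

ir_destructured = llvm_text(dot_destructured)
ir_indexed = llvm_text(dot_indexed)

# Line counts give a quick first impression before reading the full output.
println(count(c -> c == '\n', ir_destructured), " lines of LLVM IR (destructured)")
println(count(c -> c == '\n', ir_indexed), " lines of LLVM IR (indexed)")
```

Comparing the two strings directly is not very informative, because the generated function names differ, so writing each to a file and diffing them by eye is probably the more practical route.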
I saw the difference with @time as well.