Performance: tuple unpacking

Why is the first expression ~2.5x slower?

julia> x = rand(10^6);

julia> y = rand(10^6);

julia> @btime sum(a * b for (a, b) in zip(x, y));
  4.280 ms (5 allocations: 112 bytes)

julia> @btime sum(t[1] * t[2] for t in zip(x, y));
  1.585 ms (5 allocations: 112 bytes)

The generated code looks identical, so this is kinda weird.

However, testing this on 0.7, there doesn’t seem to be a difference in timing anymore.

What happens if you evaluate them in a fresh session in the opposite order?

Order doesn’t matter on my machine. Using @time instead of @btime doesn’t result in this discrepancy. Interpolating the variables in the @btime version doesn’t change things either. The code_llvm for the full call, including the @btime macro, seems to be a bit different between the two…
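
(For reference, "interpolating the variables" refers to BenchmarkTools' $ syntax, which splices the arrays into the benchmark as values instead of non-constant globals, along these lines:)

@btime sum(a * b for (a, b) in zip($x, $y));
@btime sum(t[1] * t[2] for t in zip($x, $y));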

Just to be clear, how are you running this code on nightly? With BenchmarkTools 0.0.8 or latest master on Julia 92fa0f3ace (0 days old master), I get

julia> @btime sum(a * b for (a, b) in zip(x, y));
ERROR: syntax: invalid syntax (escape (call (outerref sum) (call (top Generator) (block (null) (block (= #9 (new ##9#11)) (null) (unnecessary #9))) (call (outerref zip) (outerref x) (outerref y)))))
Stacktrace:
 [1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /Users/twan/code/julia/RigidBodyDynamics/v0.7/BenchmarkTools/src/execution.jl:289

julia> @btime sum(t[1] * t[2] for t in zip(x, y));
ERROR: syntax: invalid syntax (escape (call (outerref sum) (call (top Generator) (block (null) (block (= #12 (new ##12#14)) (null) (unnecessary #12))) (call (outerref zip) (outerref x) (outerref y)))))
Stacktrace:
 [1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /Users/twan/code/julia/RigidBodyDynamics/v0.7/BenchmarkTools/src/execution.jl:289

(https://github.com/JuliaCI/BenchmarkTools.jl/issues/69)

I can get things to work using

b = @benchmarkable sum(a * b for (a, b) in zip(x, y))
run(b)

in which case there is no difference between the two (while there still is a difference on 0.6 when you first use @benchmarkable and then run).
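
Another way to sidestep the macro issue is to move the generators into named functions and benchmark the calls; a minimal sketch (the helper names are made up):

dot_unpack(x, y) = sum(a * b for (a, b) in zip(x, y))
dot_index(x, y) = sum(t[1] * t[2] for t in zip(x, y))

@btime dot_unpack($x, $y)
@btime dot_index($x, $y)

Since the generator syntax then lives inside an ordinary function rather than inside the benchmarked expression, it shouldn't trigger the escaping error above, and it also keeps the non-constant globals out of the measured code.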

How do you see the code in the anonymous function built by the Generator constructor?

Good call; there is a difference there:

julia> g1 = Base.Generator(a * b for (a, b) in zip(x, y));

julia> @code_llvm g1.f(first(g1.iter))

define double @"julia_#35_61397"([2 x double]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  %2 = load double, double* %1, align 8
  %3 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
  %4 = load double, double* %3, align 8
  %5 = fmul double %2, %4
  ret double %5
}

julia> g2 = Base.Generator(t[1] * t[2] for t in zip(x, y));

julia> @code_llvm g2.f(first(g2.iter))

define double @"julia_#37_61406"([2 x double]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  %2 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
  %3 = load double, double* %1, align 8
  %4 = load double, double* %2, align 8
  %5 = fmul double %3, %4
  ret double %5
}

Edit: there is no such difference in the code_native output, though; I’m definitely not an expert on LLVM IR (does it just not matter that things are shuffled around in code_llvm?)

You put it in a function.

Edit: Perhaps this doesn’t work and I might have just seen the setup to the call. In that case, whoops.
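
For what it’s worth, a rough sketch of the put-it-in-a-function approach (the kernel names here are made up): wrapping each generator body in a small named function lets @code_native show the compiled body itself rather than any call setup.

unpack_kernel((a, b)) = a * b    # tuple destructuring, as in the first generator
index_kernel(t) = t[1] * t[2]    # explicit indexing, as in the second

@code_native unpack_kernel((1.0, 2.0))
@code_native index_kernel((1.0, 2.0))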

I saw the difference with @time as well.