Performance: tuple unpacking


#1

Why is the first expression ~2.5x slower?

julia> x = rand(10^6);

julia> y = rand(10^6);

julia> @btime sum(a * b for (a, b) in zip(x, y));
  4.280 ms (5 allocations: 112 bytes)

julia> @btime sum(t[1] * t[2] for t in zip(x, y));
  1.585 ms (5 allocations: 112 bytes)

#2

The generated code looks identical, so this is kinda weird.

However, when I test this on 0.7, there no longer seems to be a difference in timing.


#3

What happens if you evaluate them in a fresh session in the opposite order?


#4

Order doesn’t matter on my machine. Using @time instead of @btime doesn’t reproduce the discrepancy. Interpolating the variables in the @btime version doesn’t change things either. The code_llvm for the full call, including the @btime macro, does seem to be a bit different between the two…
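For reference, the interpolation mentioned above would look something like this (a minimal sketch using BenchmarkTools’ `$` syntax):

```julia
using BenchmarkTools

x = rand(10^6);
y = rand(10^6);

# `$` splices the globals into the benchmark expression as constants,
# so the timing is not polluted by untyped-global overhead.
@btime sum(a * b for (a, b) in zip($x, $y));
```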


#5

Just to be clear, how are you running this code on nightly? With BenchmarkTools 0.0.8 or latest master on Julia 92fa0f3ace (a 0-day-old master), I get

julia> @btime sum(a * b for (a, b) in zip(x, y));
ERROR: syntax: invalid syntax (escape (call (outerref sum) (call (top Generator) (block (null) (block (= #9 (new ##9#11)) (null) (unnecessary #9))) (call (outerref zip) (outerref x) (outerref y)))))
Stacktrace:
 [1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /Users/twan/code/julia/RigidBodyDynamics/v0.7/BenchmarkTools/src/execution.jl:289

julia> @btime sum(t[1] * t[2] for t in zip(x, y));
ERROR: syntax: invalid syntax (escape (call (outerref sum) (call (top Generator) (block (null) (block (= #12 (new ##12#14)) (null) (unnecessary #12))) (call (outerref zip) (outerref x) (outerref y)))))
Stacktrace:
 [1] generate_benchmark_definition(::Module, ::Array{Symbol,1}, ::Array{Any,1}, ::Expr, ::Void, ::Void, ::BenchmarkTools.Parameters) at /Users/twan/code/julia/RigidBodyDynamics/v0.7/BenchmarkTools/src/execution.jl:289

(https://github.com/JuliaCI/BenchmarkTools.jl/issues/69)

I can get things to work using

b = @benchmarkable sum(a * b for (a, b) in zip(x, y))
run(b)

in which case there is no difference between the two (while there still is a difference on 0.6 when you first use @benchmarkable and then call run).


#6

How do you see the code in the anonymous function built by the Generator constructor?


#7

Good call; there is a difference there:

julia> g1 = Base.Generator(a * b for (a, b) in zip(x, y));

julia> @code_llvm g1.f(first(g1.iter))

define double @"julia_#35_61397"([2 x double]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  %2 = load double, double* %1, align 8
  %3 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
  %4 = load double, double* %3, align 8
  %5 = fmul double %2, %4
  ret double %5
}

julia> g2 = Base.Generator(t[1] * t[2] for t in zip(x, y));

julia> @code_llvm g2.f(first(g2.iter))

define double @"julia_#37_61406"([2 x double]* nocapture readonly dereferenceable(16)) #0 !dbg !5 {
top:
  %1 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  %2 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
  %3 = load double, double* %1, align 8
  %4 = load double, double* %2, align 8
  %5 = fmul double %3, %4
  ret double %5
}

Edit: but there is no difference in the code_native. I’m definitely not an expert on LLVM IR (does it simply not matter that the instructions are shuffled around in code_llvm?)


#8

You put it in a function.

Edit: Perhaps this doesn’t work and I might have just seen the setup to the call. In that case, whoops.
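To spell out “put it in a function” (the names here are mine, and as the edit above notes, this may only reveal the setup for the inner call rather than the generator’s anonymous function):

```julia
# Wrap the reduction in a function so the compiler specializes it for
# the argument types; @code_llvm then shows the compiled code for that
# call. Whether this exposes the generator body itself is exactly the
# caveat raised in the edit above.
kernel(x, y) = sum(a * b for (a, b) in zip(x, y))

@code_llvm kernel(rand(10^6), rand(10^6))
```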


#9

I saw the difference with @time as well.