stakaz
June 11, 2018, 9:01am
1
Hello, I have tried this code and I am really confused why the keyword version is so slow, even when I provide a type for it.
using BenchmarkTools
f1(x) = exp(x)
f2(x::Number) = exp(x)
f3(x::Float64) = exp(x)
v1(;x=2.5) = exp(x)
v2(;x::Number=2.5) = exp(x)
v3(;x::Float64=2.5) = exp(x)
@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)
@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)
This gives me the following on version 0.6.2:
54.172 ns (0 allocations: 0 bytes)
54.172 ns (0 allocations: 0 bytes)
54.172 ns (0 allocations: 0 bytes)
259.656 ns (2 allocations: 112 bytes)
430.980 ns (2 allocations: 112 bytes)
175.290 ns (1 allocation: 96 bytes)
and these are the results from 0.7.0-alpha:
2.793 ns (0 allocations: 0 bytes)
2.793 ns (0 allocations: 0 bytes)
2.793 ns (0 allocations: 0 bytes)
52.416 ns (0 allocations: 0 bytes)
52.416 ns (0 allocations: 0 bytes)
52.416 ns (0 allocations: 0 bytes)
So the new named-tuple keywords perform approximately as fast as the old keyword arguments; however, the non-keyword version is still much faster…
Can someone explain this behavior to me? Does this mean I always have to use positional arguments instead of keywords in performance-critical code?
mauro3
June 11, 2018, 10:01am
2
I think the 0.7 benchmarks for the f-functions are so fast because of the new constant-propagation feature of the compiler. If you instead do:
julia> a = 2.5
2.5
julia> f1(x) = exp(x)
f1 (generic function with 1 method)
julia> @btime f1(2.5);
1.686 ns (0 allocations: 0 bytes)
julia> @btime f1($a);
11.654 ns (0 allocations: 0 bytes)
The 11.6 ns is in line with what I get on 0.6, and it is also the same as I get for keywords on 0.7:
julia> v1(;x=2.5) = exp(x)
v1 (generic function with 1 method)
julia> @btime v1(x=$a);
11.652 ns (0 allocations: 0 bytes)
So, keywords are as fast as positional arguments in 0.7 (at least for this test). However, the constant propagation that makes f1(2.5) even faster does not seem to work for keywords.
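One way to make the comparison explicit (a sketch, not from the original post) is to wrap the constant calls in zero-argument functions and inspect the typed code; whether the constant actually folds depends on the Julia version:

```julia
f1(x) = exp(x)
v1(; x = 2.5) = exp(x)

wrap_pos() = f1(2.5)       # positional call with a literal constant
wrap_kw()  = v1(x = 2.5)   # keyword call with a literal constant

# Inspect whether the result was folded at compile time:
# @code_typed wrap_pos()   # may show the constant result directly
# @code_typed wrap_kw()    # typically still contains the keyword machinery

wrap_pos() == wrap_kw() == exp(2.5)  # both compute the same value
```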
stakaz
June 11, 2018, 10:11am
3
Ah, OK, that is a good explanation. It would be nice to know why constant propagation does not work on keyword arguments, though.
Results from 0.7-alpha on Windows 10:
@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)
@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)
1.282 ns (0 allocations: 0 bytes)
1.282 ns (0 allocations: 0 bytes)
1.282 ns (0 allocations: 0 bytes)
9.237 ns (0 allocations: 0 bytes)
9.237 ns (0 allocations: 0 bytes)
9.237 ns (0 allocations: 0 bytes)
It seems that even when we disable constant propagation, there is an overhead:
julia> VERSION
v"1.0.1-pre.0"
julia> x = 2.5
2.5
julia> @btime f1($x)
8.781 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime f2($x)
8.786 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime f3($x)
8.781 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime v1(x = $x)
13.808 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime v2(x = $x)
13.951 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime v3(x = $x)
13.945 ns (0 allocations: 0 bytes)
12.182493960703473
Yep, there is a little overhead (Julia 1.0):
julia> const b = 2.5
2.5
julia> @code_warntype v3(x=b)
Body::Float64
1 ─ %1  = (Base.getfield)(#temp#, :x)::Float64
│   %2  = (Base.slt_int)(0, 1)::Bool
└── goto #3 if not %2
2 ─ goto #4
3 ─ invoke Base.getindex(()::Tuple{}, 1::Int64)
└── $(Expr(:unreachable))
4 ─ goto #5
5 ─ goto #6
6 ─ goto #7
7 ─ nothing
│   %11 = invoke Main.exp(%1::Float64)::Float64
└── return %11
julia> @code_warntype f3(b)
Body::Float64
1 1 ─ %1 = invoke Main.exp(_2::Float64)::Float64
  └── return %1
julia> @code_native v3(x=b)
.text
; Function #v3 {
; Location: none
; Function #v3#5; {
; Location: none
pushq %rax
vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
movabsq $"reinterpret;", %rax
callq *%rax
;}
popq %rax
retq
nopw %cs:(%rax,%rax)
;}
julia> @code_native f3(b)
.text
; Function f3 {
; Location: REPL[5]:1
pushq %rax
movabsq $"reinterpret;", %rax
callq *%rax
popq %rax
retq
nop
;}
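The extra basic blocks in the keyword `@code_warntype` above come from the keyword sorter, which receives the keyword arguments packed into a NamedTuple and unpacks them before calling the body. A rough hand-written model of that lowering (simplified; the real machinery also handles defaults and unknown keywords, and differs across Julia versions):

```julia
# Hypothetical, simplified model of how a keyword call is lowered.
v3_body(x::Float64) = exp(x)               # the actual function body
v3_sorter(kw::NamedTuple) = v3_body(kw.x)  # extracts the field from the NamedTuple

v3_sorter((x = 2.5,))  # roughly corresponds to v3(x = 2.5)
```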
I am refactoring an API so I thought I would revisit this issue as keyword arguments would be very convenient.
The following compares positional and keyword arguments and NamedTuples.
using BenchmarkTools
pos(x) = exp(x)
kw(; x) = exp(x)
nt(y) = exp(y.x)
g_pos(x) = pos(x)
g_kw(x) = kw(x = x)
g_nt(x) = nt((x = x, ))
x = 2.5
@btime g_pos($x)
@btime g_kw($x)
@btime g_nt($x)
with the output
julia> @btime g_pos($x)
0.026 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime g_kw($x)
8.892 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime g_nt($x)
0.026 ns (0 allocations: 0 bytes)
12.182493960703473
julia> VERSION
v"1.2.0-DEV.17"
I am inclined to believe that something weird is going on with the benchmarking of the positional and NamedTuple arguments, since that sub-nanosecond timing is too good to believe. Any suggestions on how to do it better?
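One common way to defeat constant folding in BenchmarkTools (an aside, not from the thread) is the `Ref` trick: interpolate a `Ref` and dereference it inside the benchmarked expression, so the argument must be loaded at run time:

```julia
using BenchmarkTools

g_pos(x) = exp(x)  # stands in for the wrappers above

x = 2.5
# $(Ref(x))[] forces a runtime load, so the compiler cannot
# treat the argument as a compile-time constant:
@btime g_pos($(Ref(x))[])
```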
The following avoids the spurious benchmark and shows that all 3 versions are pretty much identical:
using BenchmarkTools
@inline op(A) = A * A
pos(A) = op(A)
kw(; A) = op(A)
nt(y) = op(y.A)
wrap_pos(A) = pos(A)
wrap_kw(A) = kw(A = A)
wrap_nt(A) = nt((A = A, ))
A = randn(5, 5)
@benchmark wrap_pos($A)
@benchmark wrap_kw($A)
@benchmark wrap_nt($A)
I usually do something like this in the hope of more accurate benchmarks (this is for your first example, without wrapping):
julia> @btime for n=1:1000; g_pos($x); end
7.328 μs (0 allocations: 0 bytes)
julia> @btime for n=1:1000; g_kw($x); end
7.112 μs (0 allocations: 0 bytes)
julia> @btime for n=1:1000; g_nt($x); end
7.108 μs (0 allocations: 0 bytes)
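Loop-based timings like these can still be skewed if the compiler hoists or removes a call whose result is unused; a sketch of a safer variant accumulates the results so every iteration has an observable effect:

```julia
using BenchmarkTools

g_pos(x) = exp(x)  # stands in for the wrappers above

# Summing the results keeps every iteration observable to the compiler.
function bench_loop(f, x, n)
    s = 0.0
    for _ in 1:n
        s += f(x)
    end
    return s
end

x = 2.5
@btime bench_loop(g_pos, $x, 1000)
```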
tkf
December 15, 2018, 4:25pm
10
I don't know enough about LLVM for this. I imagine one can prevent the compiler from doing something insanely clever by including a more expensive inner calculation, like I did above.
But perhaps a warning could be helpful. I opened an issue
https://github.com/JuliaCI/BenchmarkTools.jl/issues/130