stakaz
June 11, 2018, 9:01am
1
Hello, I have tried this code and I am really confused why the keyword version is so slow, even when I provide a type for it.
using BenchmarkTools
f1(x) = exp(x)
f2(x::Number) = exp(x)
f3(x::Float64) = exp(x)
v1(;x=2.5) = exp(x)
v2(;x::Number=2.5) = exp(x)
v3(;x::Float64=2.5) = exp(x)
@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)
@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)
This gives me the following on version 0.6.2:
54.172 ns (0 allocations: 0 bytes)
54.172 ns (0 allocations: 0 bytes)
54.172 ns (0 allocations: 0 bytes)
259.656 ns (2 allocations: 112 bytes)
430.980 ns (2 allocations: 112 bytes)
175.290 ns (1 allocation: 96 bytes)
and these are the results from 0.7.0-alpha:
2.793 ns (0 allocations: 0 bytes)
2.793 ns (0 allocations: 0 bytes)
2.793 ns (0 allocations: 0 bytes)
52.416 ns (0 allocations: 0 bytes)
52.416 ns (0 allocations: 0 bytes)
52.416 ns (0 allocations: 0 bytes)
So the new named-tuple-based keyword arguments perform approximately as fast as the old normal (positional) calls; however, the new non-keyword version is still much faster…
Can someone explain this behavior to me? Does this mean I always have to use positional arguments instead of keyword arguments for performance-critical code?
mauro3
June 11, 2018, 10:01am
2
I think the 0.7 benchmarks for the f-functions are so fast because of the new constant propagation feature of the compiler. If you do instead:
julia> a = 2.5
2.5
julia> f1(x) = exp(x)
f1 (generic function with 1 method)
julia> @btime f1(2.5);
1.686 ns (0 allocations: 0 bytes)
julia> @btime f1($a);
11.654 ns (0 allocations: 0 bytes)
The 11.6 ns is in line with what I get on 0.6, and also the same as what I get for keywords on 0.7:
julia> v1(;x=2.5) = exp(x)
v1 (generic function with 1 method)
julia> @btime v1(x=$a);
11.652 ns (0 allocations: 0 bytes)
So, keyword arguments are as fast as positional arguments in 0.7 (at least for this test). However, the constant propagation that makes f1(2.5) even faster does not seem to work for keywords.
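For reference, a minimal sketch of how to control this in a benchmark (the Ref trick is just one common BenchmarkTools idiom for defeating constant folding; the names here are illustrative):
using BenchmarkTools
f1(x) = exp(x)
a = 2.5
@btime f1(2.5)           # literal argument: the compiler may constant-fold the whole call
@btime f1($a)            # interpolated variable: the value is a genuine runtime argument
@btime f1($(Ref(a))[])   # Ref trick: another common way to prevent constant folding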
4 Likes
stakaz
June 11, 2018, 10:11am
3
Ah, OK, that is a good explanation. It would be nice to know why constant propagation does not work on keyword arguments, though.
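One way to see what the optimizer is up against (a sketch; the exact lowered form differs between Julia versions) is to inspect how a keyword call is lowered. The keyword arguments are packed into a NamedTuple and routed through an internal keyword-sorter method, which is one extra layer the compiler has to see through:
v1(; x = 2.5) = exp(x)
Meta.@lower v1(x = 2.5)   # shows the NamedTuple being built and passed to the keyword-sorter machinery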
1 Like
Results from 0.7-alpha on Windows 10:
@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)
@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)
1.282 ns (0 allocations: 0 bytes)
1.282 ns (0 allocations: 0 bytes)
1.282 ns (0 allocations: 0 bytes)
9.237 ns (0 allocations: 0 bytes)
9.237 ns (0 allocations: 0 bytes)
9.237 ns (0 allocations: 0 bytes)
It seems that even when we avoid constant propagation (by interpolating a non-constant variable), there is still an overhead:
julia> VERSION
v"1.0.1-pre.0"
julia> x = 2.5
2.5
julia> @btime f1($x)
8.781 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime f2($x)
8.786 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime f3($x)
8.781 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime v1(x = $x)
13.808 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime v2(x = $x)
13.951 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime v3(x = $x)
13.945 ns (0 allocations: 0 bytes)
12.182493960703473
Yep, there is a little overhead (Julia 1.0):
julia> const b = 2.5
2.5
julia> @code_warntype v3(x=b)
Body::Float64
1 ─ %1  = (Base.getfield)(#temp#, :x)::Float64          │╻ getindex
│   %2  = (Base.slt_int)(0, 1)::Bool                    │╻ iterate
└──       goto #3 if not %2                             │
2 ─       goto #4                                       │
3 ─       invoke Base.getindex(()::Tuple{}, 1::Int64)   │
└──       $(Expr(:unreachable))                         │
4 ─       goto #5                                       │
5 ─       goto #6                                       │╻ iterate
6 ─       goto #7                                       │
7 ─       nothing                                       │
│   %11 = invoke Main.exp(%1::Float64)::Float64         │╻ #v3#5
└──       return %11                                    │
julia> @code_warntype f3(b)
Body::Float64
1 1 ─ %1 = invoke Main.exp(_2::Float64)::Float64        │
  └──      return %1                                    │
julia> @code_native v3(x=b)
.text
; Function #v3 {
; Location: none
; Function #v3#5; {
; Location: none
pushq %rax
vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
movabsq $"reinterpret;", %rax
callq *%rax
;}
popq %rax
retq
nopw %cs:(%rax,%rax)
;}
julia> @code_native f3(b)
.text
; Function f3 {
; Location: REPL[5]:1
pushq %rax
movabsq $"reinterpret;", %rax
callq *%rax
popq %rax
retq
nop
;}
I am refactoring an API, so I thought I would revisit this issue, as keyword arguments would be very convenient.
The following compares positional arguments, keyword arguments, and NamedTuples.
using BenchmarkTools
pos(x) = exp(x)
kw(; x) = exp(x)
nt(y) = exp(y.x)
g_pos(x) = pos(x)
g_kw(x) = kw(x = x)
g_nt(x) = nt((x = x, ))
x = 2.5
@btime g_pos($x)
@btime g_kw($x)
@btime g_nt($x)
with the output
julia> @btime g_pos($x)
0.026 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime g_kw($x)
8.892 ns (0 allocations: 0 bytes)
12.182493960703473
julia> @btime g_nt($x)
0.026 ns (0 allocations: 0 bytes)
12.182493960703473
julia> VERSION
v"1.2.0-DEV.17"
I am inclined to believe that something weird is going on with the benchmarking of the positional and NamedTuple arguments, since that sub-nanosecond timing is too good to be true. Any suggestions on how to do it better?
2 Likes
The following avoids the spurious benchmark and shows that all 3 versions are pretty much identical:
using BenchmarkTools
@inline op(A) = A * A
pos(A) = op(A)
kw(; A) = op(A)
nt(y) = op(y.A)
wrap_pos(A) = pos(A)
wrap_kw(A) = kw(A = A)
wrap_nt(A) = nt((A = A, ))
A = randn(5, 5)
@benchmark wrap_pos($A)
@benchmark wrap_kw($A)
@benchmark wrap_nt($A)
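If one wants more than a visual comparison, the trials can also be judged against each other programmatically (an optional sketch; judge and median are part of the BenchmarkTools API):
t_pos = @benchmark wrap_pos($A)
t_kw  = @benchmark wrap_kw($A)
t_nt  = @benchmark wrap_nt($A)
judge(median(t_kw), median(t_pos))   # reports :invariant when the difference is within the default tolerance
judge(median(t_nt), median(t_pos))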
1 Like
I usually do something like this in the hope of more accurate benchmarks (this is for your first example, without wrapping):
julia> @btime for n=1:1000; g_pos($x); end
7.328 μs (0 allocations: 0 bytes)
julia> @btime for n=1:1000; g_kw($x); end
7.112 μs (0 allocations: 0 bytes)
julia> @btime for n=1:1000; g_nt($x); end
7.108 μs (0 allocations: 0 bytes)
2 Likes
tkf
December 15, 2018, 4:25pm
10
1 Like
I don't know enough about LLVM for this. I imagine one can prevent the compiler from doing something insanely clever by including a more expensive inner calculation, like I did above.
But perhaps a warning could be helpful. I opened an issue:
https://github.com/JuliaCI/BenchmarkTools.jl/issues/130
1 Like