Performance of typed keyword arguments

performance

#1

Hello, I tried the code below and I am confused about why the keyword versions are so slow, even when I annotate the keyword argument with a type.

using BenchmarkTools

f1(x) = exp(x)
f2(x::Number) = exp(x)
f3(x::Float64) = exp(x)

v1(;x=2.5) = exp(x)
v2(;x::Number=2.5) = exp(x)
v3(;x::Float64=2.5) = exp(x)

@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)

@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)

This gives me the following on version 0.6.2:

  54.172 ns (0 allocations: 0 bytes)
  54.172 ns (0 allocations: 0 bytes)
  54.172 ns (0 allocations: 0 bytes)

  259.656 ns (2 allocations: 112 bytes)
  430.980 ns (2 allocations: 112 bytes)
  175.290 ns (1 allocation: 96 bytes)

and these are the results on 0.7.0-alpha:

  2.793 ns (0 allocations: 0 bytes)
  2.793 ns (0 allocations: 0 bytes)
  2.793 ns (0 allocations: 0 bytes)

  52.416 ns (0 allocations: 0 bytes)
  52.416 ns (0 allocations: 0 bytes)
  52.416 ns (0 allocations: 0 bytes)

So the new named-tuple-based keyword arguments are roughly as fast as plain positional arguments were on 0.6; however, the positional version on 0.7 is still much faster…

Can someone explain this behavior to me? Does this mean I always have to use positional arguments instead of keyword arguments in performance-critical code?


#2

I think the 0.7 benchmarks for the f-functions are so fast because of the new constant propagation feature of the compiler. If you do instead:

julia> a = 2.5
2.5

julia> f1(x) = exp(x)
f1 (generic function with 1 method)

julia> @btime f1(2.5);
  1.686 ns (0 allocations: 0 bytes)

julia> @btime f1($a);
  11.654 ns (0 allocations: 0 bytes)

The 11.6ns is in line with what I get on 0.6 and also the same as I get for keywords on 0.7:

julia> v1(;x=2.5) = exp(x)
v1 (generic function with 1 method)

julia> @btime v1(x=$a);
  11.652 ns (0 allocations: 0 bytes)

So keyword arguments are as fast as positional arguments in 0.7 (at least for this test). However, the constant propagation that makes f1(2.5) even faster does not seem to apply to keyword arguments.
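
One way to check this without relying on timings is to wrap each call in a zero-argument function, so that the literal 2.5 is visible to the compiler, and then look at the optimized code. This is only a sketch: call_positional and call_keyword are made-up helper names, and whether the printed code contains a folded constant or a real call to exp depends on the Julia version, so no output is shown.

```julia
f1(x) = exp(x)
v1(; x = 2.5) = exp(x)

# Hypothetical wrappers: the literal 2.5 is visible to the compiler here,
# just as it is inside a @btime expression that does not use `$`.
call_positional() = f1(2.5)
call_keyword()    = v1(x = 2.5)

# Print the inferred return type and the optimized, typed code of each wrapper.
# If the call was folded, the body should reduce to returning a constant;
# otherwise a call or invoke of exp remains.
for g in (call_positional, call_keyword)
    ci, rettype = first(code_typed(g, Tuple{}))
    println(g, " => ", rettype)
    println(ci)
end
```

Comparing the two printed bodies on the same Julia version shows directly whether the keyword call is treated differently from the positional one.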


#3

Ah, OK, that is a good explanation. It would be nice to know why the propagation does not work for keyword arguments, though.


#4

Results from 0.7-alpha on Windows 10:

@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)

@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)

  1.282 ns (0 allocations: 0 bytes)
  1.282 ns (0 allocations: 0 bytes)
  1.282 ns (0 allocations: 0 bytes)
  9.237 ns (0 allocations: 0 bytes)
  9.237 ns (0 allocations: 0 bytes)
  9.237 ns (0 allocations: 0 bytes)

#5

It seems that even when we sidestep constant propagation by interpolating a variable with $, there is still some overhead for keyword arguments:

julia> VERSION
v"1.0.1-pre.0"

julia> x = 2.5
2.5

julia> @btime f1($x)
  8.781 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime f2($x)
  8.786 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime f3($x)
  8.781 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime v1(x = $x)
  13.808 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime v2(x = $x)
  13.951 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime v3(x = $x)
  13.945 ns (0 allocations: 0 bytes)
12.182493960703473

#6

Yep, there is a small overhead (Julia 1.0):

julia> const b = 2.5
2.5

julia> @code_warntype v3(x=b)
Body::Float64
 1 ─ %1  = (Base.getfield)(#temp#, :x)::Float64          │╻     getindex
 │   %2  = (Base.slt_int)(0, 1)::Bool                    ││╻╷╷╷  iterate
 └──       goto #3 if not %2                             │││┃│    iterate
 2 ─       goto #4                                       ││││┃     iterate
 3 ─       invoke Base.getindex(()::Tuple{}, 1::Int64)   │││││
 └──       $(Expr(:unreachable))                         │││││
 4 ┄       goto #5                                       ││││
 5 ─       goto #6                                       ││╻     iterate
 6 ─       goto #7                                       ││
 7 ─       nothing                                       │
 │   %11 = invoke Main.exp(%1::Float64)::Float64         │╻     #v3#5
 └──       return %11                                    │

julia> @code_warntype f3(b)
Body::Float64
1 1 ─ %1 = invoke Main.exp(_2::Float64)::Float64   │
  └──      return %1                               │

julia> @code_native v3(x=b)
	.text
; Function #v3 {
; Location: none
; Function #v3#5; {
; Location: none
	pushq	%rax
	vmovsd	(%rdi), %xmm0           # xmm0 = mem[0],zero
	movabsq	$"reinterpret;", %rax
	callq	*%rax
;}
	popq	%rax
	retq
	nopw	%cs:(%rax,%rax)
;}

julia> @code_native f3(b)
	.text
; Function f3 {
; Location: REPL[5]:1
	pushq	%rax
	movabsq	$"reinterpret;", %rax
	callq	*%rax
	popq	%rax
	retq
	nop
;}
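
Reading the two listings side by side, the only extra work in the keyword version appears to be fetching x out of the keyword container (the getfield on #temp# in the typed code and the vmovsd load in the native code) before calling the body function #v3#5. A hand-written analogue of that shape might look like the sketch below. The names v3_body and v3_from_kwargs are made up for illustration; this is not the code Julia actually generates.

```julia
# Hypothetical hand-written analogue of v3(; x::Float64 = 2.5).
# v3_body and v3_from_kwargs are illustrative names only.

# Positional "body" method that does the real work.
v3_body(x::Float64) = exp(x)

# Entry point that receives the keywords packed in a NamedTuple,
# pulls out `x` (or falls back to the default) and forwards it positionally.
function v3_from_kwargs(kw::NamedTuple)
    x = haskey(kw, :x) ? kw.x : 2.5
    return v3_body(x)
end

# Called with an explicit keyword value and with no keywords at all.
v3_from_kwargs((x = 1.0,))
v3_from_kwargs(NamedTuple())
```

In this picture the keyword call costs one extra field read and one extra function hop compared with calling the positional method directly, which would be consistent with the few nanoseconds of difference measured above.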