Performance of typed keyword arguments

Hello, I have tried this code and I am really confused why the keyword version is so slow even when I provide a type for it.

using BenchmarkTools

f1(x) = exp(x)
f2(x::Number) = exp(x)
f3(x::Float64) = exp(x)

v1(;x=2.5) = exp(x)
v2(;x::Number=2.5) = exp(x)
v3(;x::Float64=2.5) = exp(x)

@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)

@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)

This gives me the following on version 0.6.2:

  54.172 ns (0 allocations: 0 bytes)
  54.172 ns (0 allocations: 0 bytes)
  54.172 ns (0 allocations: 0 bytes)

  259.656 ns (2 allocations: 112 bytes)
  430.980 ns (2 allocations: 112 bytes)
  175.290 ns (1 allocation: 96 bytes)

and these are the results from the 0.7.0 alpha:

  2.793 ns (0 allocations: 0 bytes)
  2.793 ns (0 allocations: 0 bytes)
  2.793 ns (0 allocations: 0 bytes)

  52.416 ns (0 allocations: 0 bytes)
  52.416 ns (0 allocations: 0 bytes)
  52.416 ns (0 allocations: 0 bytes)

So the new NamedTuple-based keywords perform approximately as fast as the old normal keywords; however, the non-keyword version is still much faster…

Can someone explain this behavior to me? Does this mean I always have to use positional arguments instead of keywords in performance-critical code?

I think the 0.7 benchmarks for the f-functions are so fast because of the compiler's new constant-propagation feature. If you instead do:

julia> a = 2.5
2.5

julia> f1(x) = exp(x)
f1 (generic function with 1 method)

julia> @btime f1(2.5);
  1.686 ns (0 allocations: 0 bytes)

julia> @btime f1($a);
  11.654 ns (0 allocations: 0 bytes)

The 11.6 ns is in line with what I get on 0.6, and also the same as what I get for keywords on 0.7:

julia> v1(;x=2.5) = exp(x)
v1 (generic function with 1 method)

julia> @btime v1(x=$a);
  11.652 ns (0 allocations: 0 bytes)

So, keyword arguments are as fast as positional arguments on 0.7 (at least for this test). However, the constant propagation that makes f1(2.5) even faster does not seem to apply to keywords.

Ah, OK, that is a good explanation. It would be nice to know why the propagation does not work for keyword arguments, though.
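For what it's worth, my (hedged) mental model of where the keyword overhead comes from: on 0.7/1.0 a keyword call is lowered into a call of a hidden sorter function that receives the keywords packed into a NamedTuple, so there is one extra function layer between the call site and the body. A small sketch:

```julia
v1(; x = 2.5) = exp(x)

# A call like v1(x = 2.5) conceptually becomes
#     hidden_kwsorter((x = 2.5,), v1)
# i.e. the keywords travel in a NamedTuple through an extra layer.
# That layer is a plausible place for constant propagation to stop,
# even though the call itself compiles to efficient code.
nt = (x = 2.5,)                  # the keywords, as a NamedTuple
@assert v1(; nt...) == exp(2.5)  # splatting it reproduces the keyword call
```

That NamedTuple is also what you can see being iterated over in the `@code_warntype` output further down.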

Results from 0.7-alpha on Windows 10:

@btime f1(2.5)
@btime f2(2.5)
@btime f3(2.5)

@btime v1(x = 2.5)
@btime v2(x = 2.5)
@btime v3(x = 2.5)

  1.282 ns (0 allocations: 0 bytes)
  1.282 ns (0 allocations: 0 bytes)
  1.282 ns (0 allocations: 0 bytes)
  9.237 ns (0 allocations: 0 bytes)
  9.237 ns (0 allocations: 0 bytes)
  9.237 ns (0 allocations: 0 bytes)

It seems that even when we sidestep constant propagation (by interpolating a variable), there is still an overhead:

julia> VERSION
v"1.0.1-pre.0"

julia> x = 2.5
2.5

julia> @btime f1($x)
  8.781 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime f2($x)
  8.786 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime f3($x)
  8.781 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime v1(x = $x)
  13.808 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime v2(x = $x)
  13.951 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime v3(x = $x)
  13.945 ns (0 allocations: 0 bytes)
12.182493960703473

Yep, there is a little overhead (Julia 1.0):

julia> const b = 2.5
2.5

julia> @code_warntype v3(x=b)
Body::Float64
 1 ─ %1  = (Base.getfield)(#temp#, :x)::Float64   β”‚β•» getindex
 β”‚   %2  = (Base.slt_int)(0, 1)::Bool             β”‚β”‚β•»β•·β•·β•· iterate
 └──       goto #3 if not %2                      │││┃│ iterate
 2 ─       goto #4                                ││││┃ iterate
 3 ─       invoke Base.getindex(()::Tuple{}, 1::Int64)
 └──       $(Expr(:unreachable))
 4 β”„       goto #5
 5 ─       goto #6                                β”‚β”‚β•» iterate
 6 ─       goto #7
 7 ─       nothing
 β”‚   %11 = invoke Main.exp(%1::Float64)::Float64  β”‚β•» #v3#5
 └──       return %11

julia> @code_warntype f3(b)
Body::Float64
1 1 ─ %1 = invoke Main.exp(_2::Float64)::Float64  β”‚
  └──      return %1                              β”‚

julia> @code_native v3(x=b)
	.text
; Function #v3 {
; Location: none
; Function #v3#5; {
; Location: none
	pushq	%rax
	vmovsd	(%rdi), %xmm0           # xmm0 = mem[0],zero
	movabsq	$"reinterpret;", %rax
	callq	*%rax
;}
	popq	%rax
	retq
	nopw	%cs:(%rax,%rax)
;}

julia> @code_native f3(b)
	.text
; Function f3 {
; Location: REPL[5]:1
	pushq	%rax
	movabsq	$"reinterpret;", %rax
	callq	*%rax
	popq	%rax
	retq
	nop
;}

I am refactoring an API, so I thought I would revisit this issue, since keyword arguments would be very convenient.

The following compares positional arguments, keyword arguments, and NamedTuples.

using BenchmarkTools
pos(x) = exp(x)
kw(; x) = exp(x)
nt(y) = exp(y.x)
g_pos(x) = pos(x)
g_kw(x) = kw(x = x)
g_nt(x) = nt((x = x, ))
x = 2.5
@btime g_pos($x)
@btime g_kw($x)
@btime g_nt($x)

with the output

julia> @btime g_pos($x)
  0.026 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime g_kw($x)
  8.892 ns (0 allocations: 0 bytes)
12.182493960703473

julia> @btime g_nt($x)
  0.026 ns (0 allocations: 0 bytes)
12.182493960703473

julia> VERSION
v"1.2.0-DEV.17"

I am inclined to believe that something weird is going on with the benchmarking of the positional and NamedTuple versions, since a sub-nanosecond timing is too good to be true. Any suggestions on how to do this better?
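One trick I believe helps here (the usual BenchmarkTools idiom for defeating constant folding, sketched for your g_pos) is to interpolate a Ref and dereference it inside the benchmarked expression, so the value stays opaque to the compiler:

```julia
using BenchmarkTools

pos(x) = exp(x)
g_pos(x) = pos(x)

xr = Ref(2.5)
# $xr interpolates the Ref itself; the load xr[] happens inside the
# benchmark, so the compiler cannot fold the whole call away to a
# precomputed constant (the source of the 0.026 ns readings).
@btime g_pos($xr[])
```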

The following avoids the spurious benchmark and shows that all 3 versions are pretty much identical:

using BenchmarkTools

@inline op(A) = A * A

pos(A) = op(A)
kw(; A) = op(A)
nt(y) = op(y.A)
wrap_pos(A) = pos(A)
wrap_kw(A) = kw(A = A)
wrap_nt(A) = nt((A = A, ))

A = randn(5, 5)
@benchmark wrap_pos($A)
@benchmark wrap_kw($A)
@benchmark wrap_nt($A)

I usually do something like this in the hope of getting more accurate benchmarks (this is for your first example, without the wrapping):

julia> @btime for n=1:1000; g_pos($x); end
  7.328 ΞΌs (0 allocations: 0 bytes)

julia> @btime for n=1:1000; g_kw($x); end
  7.112 ΞΌs (0 allocations: 0 bytes)

julia> @btime for n=1:1000; g_nt($x); end
  7.108 ΞΌs (0 allocations: 0 bytes)
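Another hedge I sometimes use (a sketch, not guaranteed to fool every optimizer): accumulate the results inside the loop and return the sum, so a pure call cannot be deleted as dead code:

```julia
using BenchmarkTools

pos(x) = exp(x)
g_pos(x) = pos(x)

# Summing the results forces each call to actually execute;
# a loop whose body has no observable effect can otherwise be removed.
function run_many(f, x, n)
    s = 0.0
    for _ in 1:n
        s += f(x)
    end
    return s
end

x = 2.5
@btime run_many(g_pos, $x, 1000)
```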

Maybe the clobber/escape tools suggested in [RFC/WIP] Tools for measuring cycles and cpu_times and tricking out LLVM by vchuravy Β· Pull Request #92 Β· JuliaCI/BenchmarkTools.jl Β· GitHub would help for this kind of benchmark?

I don’t know enough about LLVM for this. I imagine one can prevent the compiler from doing anything insanely clever by including a more expensive inner computation, as I did above.

But perhaps a warning could be helpful. I opened an issue

https://github.com/JuliaCI/BenchmarkTools.jl/issues/130