Vararg vs Tuple{Vararg} performance

Hello!

While trying out some things in the Julia REPL to see how a[i...] = x is lowered to setindex!(a, x, i...), I came across several questions:

I am wondering why the signature of setindex! is setindex!(A::AbstractArray, X::Any, inds::Vararg{Any})
and not setindex!(A::AbstractArray, X::Any, inds::Tuple{Vararg{Any}})

I thought it was for performance reasons, but when I tried to replicate the behavior and benchmark it with BenchmarkTools.jl, the results were not exactly what I expected.

If I declare those two lambda functions:

f = (a, b::Vararg) -> b
g = (a, b::Tuple{Vararg}) -> b

And benchmark (with 100 000 samples) f(1, 2) and g(1, (2,)), then f is indeed a little faster than g, by approximately 0.5 nanoseconds in median time.

BUT if I re-run this benchmark with f(1, 2, 3) and g(1, (2, 3)), then f is much slower than g: where g takes ~15 ns in median time, f takes ~171 ns??

1: First question: is this performance gap normal? Is the first signature for setindex! preferred over the second because setindex! is practically never called with multiple indices?
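(As a side note, the lowering mentioned above can be inspected directly with Meta.@lower; the printed CodeInfo varies across Julia versions, but it contains the setindex! call. A quick sketch:)

```julia
# Lowering does not evaluate anything, so a, i, j, x need not be defined.
# The resulting CodeInfo contains a call to Base.setindex!(a, x, i, j).
lowered = Meta.@lower a[i, j] = x
println(lowered)
```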

Moreover, it seems like there isn’t an easy way to take a Tuple{Vararg} argument and extract its contents via the argument-destructuring syntax.

I can do

f = (a, b) -> b

and

f = (a, b...) -> b

and

g = (a, (b,)) -> b

but not

g = (a, (b...,)) -> b

2: Am I missing something syntactically speaking?

Anyway, sorry for the long post… here is my versioninfo() in case it helps. Thanks! :slight_smile:

julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core™ i7-8750H CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

  1. There shouldn’t be a performance difference here at all; splatting reasonably sized Tuples should not add any allocations or runtime operations. The difference from passing Tuples is just that using Vararg (or x..., equivalently) can be a bit less verbose to write and makes for a more consistent API. The difference you are observing is probably due to the way you are benchmarking this. Are you using @btime from BenchmarkTools.jl, and if so, how are you interpolating the function arguments? 0.5 ns is about 1 clock cycle, which is probably just random noise.
  2. x... should be equivalent to x::Vararg. Allowing destructuring of Tuples in function arguments, as in f(x, (a, b...)), has been proposed before, which would make your fourth example work as well, but for now this is invalid syntax. (a, (b...,)) -> b would be equivalent to (a, b::Tuple) -> b anyway. Other than that, I don’t really see anything you are missing.
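(To illustrate the benchmarking point from item 1: interpolating the non-const globals with $ makes @btime measure the call itself rather than dispatch on an untyped global. A minimal sketch; timings are machine-dependent:)

```julia
using BenchmarkTools

f = (a, b::Vararg) -> b
g = (a, b::Tuple{Vararg}) -> b

# $ interpolates the (non-const) globals into the benchmark expression,
# so the timing excludes the cost of looking up an untyped global.
@btime ($f)(1, 2, 3)
@btime ($g)(1, (2, 3))
```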
  1. Here is exactly what I’m doing, no more, no less:
julia> f = (a, b::Vararg) -> b
#3 (generic function with 1 method)

julia> g = (a, b::Tuple{Vararg}) -> b
#5 (generic function with 1 method)

julia> import BenchmarkTools

julia> BenchmarkTools.@benchmark f(1, 2, 3)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     163.693 ns (0.00% GC)
  median time:      188.505 ns (0.00% GC)
  mean time:        192.193 ns (0.28% GC)
  maximum time:     1.375 μs (83.39% GC)
  --------------
  samples:          10000
  evals/sample:     772

julia> BenchmarkTools.@benchmark g(1, (2, 3))
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     14.516 ns (0.00% GC)
  median time:      17.107 ns (0.00% GC)
  mean time:        18.977 ns (3.08% GC)
  maximum time:     947.285 ns (97.69% GC)
  --------------
  samples:          10000
  evals/sample:     998


Re-running the two benchmarks multiple times gave the same results.

  1. Ok, Thanks! I’m looking forward to this feature being implemented :smiley:
  1. The problem is that global variables are not typed, so Julia checks the types of f and g (and, I suppose, looks up a suitable method) on every call. Compare:
julia> const f = (a, b::Vararg) -> b
#7 (generic function with 1 method)

julia> const g = (a, b::Tuple{Vararg}) -> b
#9 (generic function with 1 method)

julia> @benchmark f(1,2,3)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.036 ns (0.00% GC)
  median time:      0.039 ns (0.00% GC)
  mean time:        0.039 ns (0.00% GC)
  maximum time:     0.075 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark g(1,(2,3))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.037 ns (0.00% GC)
  median time:      0.039 ns (0.00% GC)
  mean time:        0.039 ns (0.00% GC)
  maximum time:     0.090 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
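(The same effect can be seen without timing anything: wrapping the calls in functions makes the inference difference visible. A minimal sketch; the names f_global, f_const, call_global, and call_const are illustrative:)

```julia
f_global = (a, b::Vararg) -> b        # non-const global: type unknown to the compiler
const f_const = (a, b::Vararg) -> b   # const global: type known, call is fully inferred

call_global(x) = f_global(x, 2, 3)    # dynamic dispatch through an untyped global
call_const(x)  = f_const(x, 2, 3)     # static dispatch, can be inlined

# @code_warntype call_global(1)   # return type is inferred as ::Any
# @code_warntype call_const(1)    # return type is inferred concretely
```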

Wow! That explains everything! Thanks :smiley: