Why is the splat operator slow?

eliascarv · March 15, 2022, 8:40pm

I was doing some benchmarks and got these results:

julia> using BenchmarkTools

julia> f1(itr) = collect(itr)
f1 (generic function with 1 method)

julia> f2(itr) = [itr...]
f2 (generic function with 1 method)

julia> gen = (x for x in 1:1000)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:1000)

julia> @btime f1($gen);
  910.000 ns (1 allocation: 7.94 KiB)

julia> @btime f2($gen);
  80.200 μs (1988 allocations: 84.44 KiB)

Why is the splat operator slower than the collect function in this case?

mbauman · March 15, 2022, 9:02pm

The biggest key concept is that Julia specializes function calls based upon the types and number of arguments. And the number of elements in a generator (or array) is unknown by the type system. If you don’t know the number of arguments, that’s a type instability.

Splatting (small) tuples or static arrays is fast. There’s another technical difference in how the values are passed — and passing lots of arguments is similarly suboptimal.

feanor12 · March 15, 2022, 9:10pm

Is it possible to specialize or rewrite just the case of [gen...] for a type stable generator with a constant return type?

What function is actually specialized when using this statement?

Benny · March 15, 2022, 9:50pm

It looks like they do very different things, function collect(itr::Generator) #etc vs vect(X::T...) where {T} = T[ X[i] for i = 1:length(X) ] in array.jl.

It seems fairly feasible to use 1 allocation, the type seems knowable so you could just allocate an array for that type, iterate the generator, and push elements onto the array. So I’m more interested in why this doesn’t happen for Base.vect. I can accept that it’s less efficient to make the tuple X and to index X[i] instead of iterate, but I don’t know why there are ~2k allocations.

EDIT: I thought it might have something to do with how Vararg is not specialized automatically; Base.vect does have a type parameter but not for the size N shown in the performance tips, so I tried myvect(X::Vararg{T, N}) where {T,N} = T[ X[i] for i = 1:N ]. Because myvect does specialize over the size N, it compiles (slower, more allocations) on the first call for each generator with a different size, but the 2nd run onwards has the same performance and ~2k allocations as Base.vect. Base.vect is far preferable because it doesn’t have to compile for each size. I also tried splatcollect(X::T...) where T = collect(X) and it performs the same, so the splatting is definitely the root.

Topic		Replies	Views
Why is there a massive difference between collect and [range...]? New to Julia performance	7	610	November 19, 2022
Generator slow when passed as argument Performance question , performance , generator , map	6	487	December 20, 2022
Better way to improve a function with splat operator used in the input Performance	15	620	April 1, 2023
Splatting arguments causes ~30x slow down Performance development	10	6283	November 1, 2018
Question about allocations and generators Performance question , memory-allocation , generator	2	748	June 11, 2021

Why is the splat operator slow?

Related topics