Why is the splat operator slow?

I was doing some benchmarks and got these results:

julia> using BenchmarkTools

julia> f1(itr) = collect(itr)
f1 (generic function with 1 method)

julia> f2(itr) = [itr...]
f2 (generic function with 1 method)

julia> gen = (x for x in 1:1000)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:1000)

julia> @btime f1($gen);
  910.000 ns (1 allocation: 7.94 KiB)

julia> @btime f2($gen);
  80.200 μs (1988 allocations: 84.44 KiB)

Why is the splat operator slower than the collect function in this case?

2 Likes

The biggest key concept is that Julia specializes function calls based upon the types and number of arguments. And the number of elements in a generator (or array) is unknown by the type system. If you don’t know the number of arguments, that’s a type instability.

Splatting (small) tuples or static arrays is fast. There’s another technical difference in how the values are passed — and passing lots of arguments is similarly suboptimal.

5 Likes

Is it possible to specialize or rewrite just the case of [gen...] for a type stable generator with a constant return type?

What function is actually specialized when using this statement?

1 Like

It looks like they do very different things, function collect(itr::Generator) #etc vs vect(X::T...) where {T} = T[ X[i] for i = 1:length(X) ] in array.jl.

It seems fairly feasible to use 1 allocation, the type seems knowable so you could just allocate an array for that type, iterate the generator, and push elements onto the array. So I’m more interested in why this doesn’t happen for Base.vect. I can accept that it’s less efficient to make the tuple X and to index X[i] instead of iterate, but I don’t know why there are ~2k allocations.

EDIT: I thought it might have something to do with how Vararg is not specialized automatically; Base.vect does have a type parameter but not for the size N shown in the performance tips, so I tried myvect(X::Vararg{T, N}) where {T,N} = T[ X[i] for i = 1:N ]. Because myvect does specialize over the size N, it compiles (slower, more allocations) on the first call for each generator with a different size, but the 2nd run onwards has the same performance and ~2k allocations as Base.vect. Base.vect is far preferable because it doesn’t have to compile for each size. I also tried splatcollect(X::T...) where T = collect(X) and it performs the same, so the splatting is definitely the root.

2 Likes