Broadcasting `setindex!` over a tuple of arrays with splatted indices is slow

To demonstrate the problem, create a tuple of two matrices, one with Float64 entries and the other with Int64 entries:

julia> VERSION

julia> mats = (rand(3,3), rand(Int64,3,3));

Also create a tuple of Float64 and Int64, which we will set as the (1,1) entries of the above-created two matrices:

julia> vals = (0.0, 0);

Now, let’s set the entries by broadcasting setindex! over the tuples. The performance is pretty good with only one allocation:

julia> using BenchmarkTools

julia> @btime setindex!.($mats, $vals, 1, 1);
  10.381 ns (1 allocation: 32 bytes)

However, if I pass the indices as a splatted tuple, the performance suddenly degrades significantly, with 10 allocations:

julia> @btime setindex!.($mats, $vals, (1,1)...);
  394.860 ns (10 allocations: 288 bytes)

Why is this happening?

Note 1. This performance degradation does not happen if mats is a tuple of matrices of the same eltype:

julia> mats = (rand(3,3), rand(3,3));

julia> vals = (0.0, 0.0);

julia> @btime setindex!.($mats, $vals, (1,1)...);
  10.921 ns (1 allocation: 32 bytes)

Because the performance degradation only happens when mats is a tuple of matrices with different eltypes, I guess this problem has the same origin as this issue. But then I’m not sure why the splat matters here.

Note 2. The situation is not much different in Julia 0.7.


This is because the base broadcast implementation for combinations of heterogeneous tuples and scalars is type-unstable:

julia> @code_warntype broadcast(+, (1.,1), 1)
  end::Tuple{Union{Float64, Int64},Union{Float64, Int64}}

The implementation with two tuples of the same length is easier — that’s just map which has a carefully constructed implementation to remain type-stable:

julia> @code_warntype broadcast(+, (1.,1), (1,1))

It’s currently hard to iteratively construct tuples of heterogeneous types in a way that inference can follow.
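As a quick check of the claim about map, here is a minimal sketch using `Test.@inferred`, which throws an error when the inferred return type does not match the actual one. It assumes Julia 0.7 or later (on 0.6 the module is `Base.Test`); the variable names `a`, `b`, and `r` are only for illustration.

```julia
using Test  # on Julia 0.6 this would be `using Base.Test`

a = (1.0, 1)   # heterogeneous tuple: one Float64, one Int
b = (1, 1)     # second tuple of the same length

# map over two tuples of equal length. @inferred raises an error if
# inference cannot determine the concrete return type of the call.
r = @inferred map(+, a, b)

println(r)
println(typeof(r))
```

Running the same check on `broadcast(+, a, 1)` is one way to see whether the mixed tuple-and-scalar case is inferred on the Julia version you are using.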

Here’s how I got here: when debugging these sorts of things, I often find it helpful to use little function wrappers. Sometimes BenchmarkTools interacts with global scope in a way that I don’t expect. That’s not the case here, but the wrappers are still helpful for seeing why the two calls behave differently:

julia> f(mats, vals) = setindex!.(mats, vals, 1, 1)
       g(mats, vals) = setindex!.(mats, vals, (1,1)...)
g (generic function with 1 method)

julia> @btime f($mats, $vals);
  7.185 ns (1 allocation: 32 bytes)

julia> @btime g($mats, $vals);
  293.312 ns (6 allocations: 176 bytes)

So now you can also do @code_warntype on these guys to see that g(mats,vals)::Tuple{Union{…},Union{…}} while f is a type-stable Tuple{Array{Float64,2},Array{Int64,2}}.
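If the full @code_warntype output is too noisy, a compact alternative is to ask for the inferred return types directly. This is only a sketch: `Base.return_types` is an unexported helper, the names `argtypes`, `rt_f`, and `rt_g` are made up for this example, and what gets printed depends on the Julia version.

```julia
mats = (rand(3, 3), rand(Int64, 3, 3))
vals = (0.0, 0)

f(mats, vals) = setindex!.(mats, vals, 1, 1)
g(mats, vals) = setindex!.(mats, vals, (1, 1)...)

# Argument types of the call we want to inspect.
argtypes = Tuple{typeof(mats), typeof(vals)}

# Base.return_types gives one inferred return type per matching method;
# f and g each have a single method, so each vector has one entry.
rt_f = Base.return_types(f, argtypes)
rt_g = Base.return_types(g, argtypes)

println("f: ", rt_f)
println("g: ", rt_g)
```

A concrete tuple type in the output means the call is inferred; a tuple with Union entries is the instability described above.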

The splatting is a red herring: the inference is different not because of the splatting, but because setindex!.(a, b, 1, 1) actually lowers to broadcast((a,b)->setindex!(a, b, 1, 1), a, b) — the numeric literals become a part of the function! Try:

julia> h(mats, vals, x, y) = setindex!.(mats, vals, x, y)
h (generic function with 1 method)

julia> @btime h($mats, $vals, 1, 1);
  298.037 ns (6 allocations: 176 bytes)

So now the difference isn’t in splatting, but rather it’s which arguments effectively get passed to broadcast.
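To make that concrete, here is a sketch that writes out by hand what f and h roughly amount to under the lowering described above, plus a map-based variant. The names `f_explicit`, `h_explicit`, and `h_map` are hypothetical, and the explicit forms only mirror the lowering on the Julia version discussed here; from 0.7 on, dot calls go through `Base.Broadcast.broadcasted` and `materialize` instead, so timings may differ.

```julia
mats = (rand(3, 3), rand(Int64, 3, 3))
vals = (0.0, 0)

# Roughly what f does: the literal indices live inside the closure,
# so broadcast only ever sees the two tuples.
f_explicit(mats, vals) = broadcast((a, b) -> setindex!(a, b, 1, 1), mats, vals)

# Roughly what h does: the indices are passed to broadcast as two
# extra scalar arguments alongside the tuples.
h_explicit(mats, vals, x, y) = broadcast(setindex!, mats, vals, x, y)

# Hypothetical alternative for runtime indices: capture x and y in a
# closure and map over the two tuples, which is the two-tuple case
# that the answer describes as type-stable.
h_map(mats, vals, x, y) = map((m, v) -> setindex!(m, v, x, y), mats, vals)

f_explicit(mats, vals)        # sets entry (1, 1) of each matrix
h_explicit(mats, vals, 2, 2)  # sets entry (2, 2) of each matrix
h_map(mats, vals, 3, 3)       # sets entry (3, 3) of each matrix
```

Benchmarking `h_map` against `h` with @btime should show whether the closure form avoids the extra allocations on your setup; that result is not claimed here.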