Why are for loops slower than broadcast with splatted arguments?

I think I figured out why it’s so slow - I wasn’t being careful with tuples and arrays.

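In the original splatted loop code, I'm splatting the X_entry, Y_entry arguments (Xi and Yi below), which are Arrays. Since an Array's length isn't part of its type, (I guess) the compiler can't tell how many arguments foo receives until runtime, so the call can't be specialized the way a Tuple splat can.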

function foo_loop_splat!(N,X,Y,Xi,Yi)
    for i = 1:N
        for fld in eachindex(X)
            Xi[fld] = X[fld][i]
            Yi[fld] = Y[fld][i]
        end
        foo(Xi...,Yi...)
    end
end
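
To make the Array vs. Tuple difference concrete, here is a minimal sketch. The function `bar` and the values are hypothetical and only for illustration; they are not from my actual code. It only shows what information lives in the type, it doesn't measure anything.

```julia
v = [1.0, 2.0]     # Vector{Float64}: the length is a runtime property
t = (1.0, 2.0)     # Tuple{Float64, Float64}: the length is part of the type

# Hypothetical two-argument function, only for illustration.
bar(a, b) = a + b

# Both calls return the same value; what differs is what the compiler can know.
# For bar(t...), the argument count can be read off typeof(t) at compile time.
# For bar(v...), the argument count is only known once v exists at runtime.
@assert bar(v...) == bar(t...) == 3.0
@assert typeof(t) == Tuple{Float64, Float64}
@assert typeof(v) == Vector{Float64}
```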

An old post by @rdeits (Array of tuples, #7) noted that tuples have much less overhead than arrays. If I replace the for loop with the (simpler) tuple-based version:

function foo_loop_splat!(N,X,Y)
    for i = 1:N
        foo(getindex.(X,i)...,getindex.(Y,i)...)
    end
end

then the non-splatted and splatted for loops give nearly identical runtimes:

@btime foo_loop($N,$a,$b,$c,$d)
@btime foo_loop_splat!($N,$X,$Y)
  1.049 ms (0 allocations: 0 bytes)
  1.093 ms (0 allocations: 0 bytes)
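
For anyone who wants to try this, here is a self-contained sketch of the two splatted variants side by side. The four-argument `foo`, the function names, the accumulated sum, and the random inputs are all assumptions made up for this example (my real `foo` is not shown here); the sum is only there so the two versions can be checked against each other. I'm assuming X and Y are Tuples of Vectors in the tuple version, so that `getindex.(X, i)` broadcasts over a Tuple and returns a Tuple.

```julia
# Hypothetical stand-in for foo; any function of four scalars would do.
foo(a, b, c, d) = a * b + c * d

# Array-based version: copy entries into scratch Vectors, then splat them.
function sum_loop_array_splat(N, X, Y, Xi, Yi)
    s = 0.0
    for i = 1:N
        for fld in eachindex(X)
            Xi[fld] = X[fld][i]
            Yi[fld] = Y[fld][i]
        end
        s += foo(Xi..., Yi...)   # lengths of Xi, Yi are not known from their types
    end
    return s
end

# Tuple-based version: getindex.(X, i) on a Tuple returns a Tuple,
# so the number of splatted arguments is known from the types.
function sum_loop_tuple_splat(N, X, Y)
    s = 0.0
    for i = 1:N
        s += foo(getindex.(X, i)..., getindex.(Y, i)...)
    end
    return s
end

N = 1000
a, b, c, d = rand(N), rand(N), rand(N), rand(N)

Xv, Yv = [a, b], [c, d]        # Vectors of Vectors for the array version
Xi, Yi = zeros(2), zeros(2)    # scratch Vectors
X, Y = (a, b), (c, d)          # Tuples of Vectors for the tuple version

# Both variants should compute the same result.
@assert sum_loop_array_splat(N, Xv, Yv, Xi, Yi) ≈ sum_loop_tuple_splat(N, X, Y)
```

Timing the two with `@btime` (interpolating the arguments with `$`, as above) should show whether the gap appears on your machine; I haven't included numbers for this sketch since they depend on `foo`.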