I think I figured out why it’s so slow - I wasn’t being careful with tuples and arrays.
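In the original splatted loop code, I'm splatting the `Xi` and `Yi` arguments, which are Arrays. Splatting (I guess) has to turn each Array into a tuple of arguments, and since an Array's length isn't part of its type, the number of arguments passed to `foo` isn't known at compile time.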
    function foo_loop_splat!(N,X,Y,Xi,Yi)
        for i = 1:N
            for fld in eachindex(X)
                Xi[fld] = X[fld][i]
                Yi[fld] = Y[fld][i]
            end
            foo(Xi...,Yi...)
        end
    end
An old post by @rdeits (post #7 in the "Array of tuples" thread) noted that tuples have much less overhead than arrays. If I replace the for loop with the (simpler) tuple-based version
    function foo_loop_splat!(N,X,Y)
        for i = 1:N
            foo(getindex.(X,i)...,getindex.(Y,i)...)
        end
    end
then the non-splatted and splatted for loops give nearly identical runtimes:
    @btime foo_loop($N,$a,$b,$c,$d)
    @btime foo_loop_splat!($N,$X,$Y)
    1.049 ms (0 allocations: 0 bytes)
    1.093 ms (0 allocations: 0 bytes)
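As a rough sketch of why the tuple version behaves differently: broadcasting `getindex` over a Tuple of arrays returns a Tuple, whose length is part of its type, while broadcasting over a Vector of arrays returns a Vector. The `foo` below is a hypothetical stand-in (it just sums four numbers), the loop returns an accumulated value only so there is something to compare, and the names `X_tuple`, `X_vec`, etc. are made up for illustration.

```julia
# Hypothetical stand-in for `foo`; the real function is not shown in this post.
foo(a, b, c, d) = a + b + c + d

# Tuple-based splatted loop, accumulating a result so the two variants can be compared.
function foo_loop_splat(N, X, Y)
    acc = 0.0
    for i = 1:N
        acc += foo(getindex.(X, i)..., getindex.(Y, i)...)
    end
    return acc
end

N = 1000
a, b, c, d = rand(N), rand(N), rand(N), rand(N)

X_tuple, Y_tuple = (a, b), (c, d)   # tuples of vectors: length is encoded in the type
X_vec, Y_vec = [a, b], [c, d]       # vectors of vectors: length only known at run time

# Broadcasting over a Tuple yields a Tuple; over a Vector it yields a Vector.
@show typeof(getindex.(X_tuple, 1))
@show typeof(getindex.(X_vec, 1))

# Both variants compute the same value; only the compile-time information differs.
@show foo_loop_splat(N, X_tuple, Y_tuple) ≈ foo_loop_splat(N, X_vec, Y_vec)
```

Running `@btime` on both calls (with BenchmarkTools loaded) should show the difference; I would expect the Vector variant to allocate on every iteration, but I haven't included timings for it here.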