Normal vs broadcasted slice assignment

This must have been discussed a dozen times but I couldn’t find a thread about this precise issue:

using BenchmarkTools

f1(v, x) = v[1:length(x)] = x
f2(v, x) = v[1:length(x)] .= x

julia> @btime f1($(rand(1000)), $(rand(100)));
  10.268 ns (0 allocations: 0 bytes)

julia> @btime f2($(rand(1000)), $(rand(100)));
  25.415 ns (0 allocations: 0 bytes)

Is this expected? Does it have to do with unaliasing? I wonder what exactly is causing the slowdown, since there are 0 allocations in both cases.

No, I think this is just the difference between a highly specialized memcpy that hits the Vector’s memory directly and a hand-written for loop that works with all abstract arrays.

These both should turn into memcpy. IMO this is unexpected.
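
For comparison, here is a minimal sketch of a third variant that asks for the contiguous copy explicitly through the five-argument copyto!. The name f3 is hypothetical and not from the thread, and no timing is claimed for it; it is only meant as a baseline to benchmark against f1 and f2.

```julia
# Hypothetical baseline (f3 is an illustrative name, not from the thread):
# copy length(x) elements from x, starting at index 1, into v starting at index 1.
f3(v, x) = copyto!(v, 1, x, 1, length(x))

v = zeros(1000)
x = rand(100)
f3(v, x)   # overwrites v[1:100], leaves the rest of v untouched
```

It can be timed the same way as the others, with @btime f3($(rand(1000)), $(rand(100))).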

So the difference is that f2 turns into a copyto!(view(v, 1:100), x) while f1 turns into a setindex!. So the problem is just that we don’t have an optimized method for copying an Array into a view of another Array.
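
To see which calls are involved, the two assignment forms can be written out by hand. This is a sketch of roughly what the syntax lowers to, not the exact compiler output (the real lowering of .= goes through Base.dotview rather than a plain view); Meta.@lower can be used to inspect the actual form on a given Julia version.

```julia
x = collect(1.0:4.0)
n = length(x)

# Plain slice assignment: `v[1:n] = x` is syntax for a setindex! call.
v = zeros(10)
setindex!(v, x, 1:n)

# Broadcasted assignment: `w[1:n] .= x` writes into a view of w
# through the broadcast machinery.
w = zeros(10)
dest = view(w, 1:n)   # assumption: stands in for Base.dotview(w, 1:n)
Base.materialize!(dest, Base.broadcasted(identity, x))
```

Both paths leave the same data in the destination; they differ only in which method ends up doing the copy.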

Another difference is that f2 returns a view of v, while f1 returns x. Changing both functions to return nothing improves the performance of f2 somewhat, although not enough to make up the difference.
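
A small sketch of the return-value difference and of the variants that return nothing. The names f1n and f2n are hypothetical, introduced only for illustration.

```julia
f1(v, x) = v[1:length(x)] = x
f2(v, x) = v[1:length(x)] .= x

# Hypothetical variants that discard the result of the assignment.
f1n(v, x) = (v[1:length(x)] = x; nothing)
f2n(v, x) = (v[1:length(x)] .= x; nothing)

v = rand(1000)
x = rand(100)

r1 = f1(v, x)   # an assignment expression evaluates to its right-hand side
r2 = f2(v, x)   # a broadcasted assignment evaluates to the destination view
```

Here r1 is x itself and r2 is a SubArray wrapping v, so f2 has to construct and return a view object that f1 never creates.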

Elsewhere, Chris Elrod had suggested that the differences arise from non-temporal stores, and that LoopVectorization provides a Julia equivalent.
